You may already be asking: “why not just power the stack using TimescaleDB?” The Timescale License would restrict our use of features such as compression, incremental materialized views, and bottomless storage. With these missing, we felt that what remained would not provide an adequate basis for our customers’ time-series needs. Therefore, we decided to build our own PostgreSQL-licensed extension.
Have used the free version of TimescaleDB before to shard a 500 million observation time series database. Worked as a drop-in without much hassle. Would have expected some benchmarks and comparisons in the post. I will watch this for sure...
500 million is very little, however. A regular table with a covering index would probably be fine for many use cases with this number of points.
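A minimal sketch of the "regular table with a covering index" idea. It uses SQLite through Python's `sqlite3` module as a stand-in so it runs without a server. The table, column, and index names are hypothetical, and the data is synthetic. In PostgreSQL the equivalent would typically be an index on `(series_id, ts)` with `INCLUDE (value)`, or simply an index over all three columns.

```python
import sqlite3

# In-memory database as a stand-in for a real Postgres instance (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations ("
    " series_id INTEGER NOT NULL,"
    " ts INTEGER NOT NULL,"
    " value REAL NOT NULL)"
)

# The index holds every column the query reads, so the query can be
# answered from the index alone without touching the table rows.
conn.execute(
    "CREATE INDEX idx_obs_cover ON observations (series_id, ts, value)"
)

# Synthetic data: 10 series with 1000 points each.
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?)",
    [(s, t, float(s * t)) for s in range(10) for t in range(1000)],
)

query = (
    "SELECT ts, value FROM observations"
    " WHERE series_id = ? AND ts BETWEEN ? AND ?"
    " ORDER BY ts"
)
params = (3, 100, 199)

# The last column of each EXPLAIN QUERY PLAN row is the readable plan step.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
plan_detail = " ".join(row[3] for row in plan)
rows = conn.execute(query, params).fetchall()

print(plan_detail)
print(len(rows), "rows returned")
```

The plan text shows whether the range scan for one series was served by the index alone, which is the property that makes this layout workable for time-range reads.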
Indeed. The financial time series I was working with had over 100 million new points _per day_. For anything serious TimescaleDB is essentially not open source. Well done tembo.io crew -- will definitely give this a whirl.
What do you mean by "for anything serious it isn't open source"? I didn't see any red flags in the apache variant of timescale, just constant pleading to try their hosted option.
https://github.com/timescale/timescaledb/blob/main/LICENSE-A...
Compression and other features use the non-Apache license:
https://github.com/timescale/timescaledb/tree/main/tsl
And as I understand that license, you are allowed to use Timescale for anything that doesn’t involve offering Timescale itself as a service. If you were using Timescale to process lots of time series transactions in your backend, it doesn’t seem to me like that would break the license.
(Which is to say that if, like Tembo, you’re offering Postgres as a service you do indeed have a problem. But for other use, should be fine)
The license doesn't allow you to "give access to, directly or indirectly (e.g., via a wrapper) to [SQL]".
Legally, what's a wrapper? Is a REST API a wrapper?
I imagine that legally it would need a lawsuit to set a precedent, and if a license owner sets an overreaching precedent for what a wrapper is, they risk losing customer trust and having companies avoid them like the plague.
e.g. timescaledb going after a tsdb as a service company offering tsdb behind a graphql wrapper vs timescaledb going after a financial company offering timeseries data collection and viewing.
I think a good border test would be, would timescaledb allow you to offer a metrics and logging service? technically you're offering timeseries database functionality, but it's in a constrained domain, and very clearly a different product, but still effectively CRUDing timeseries data.
That’s the internal use restriction. There is also the restriction more relevant to the use cases I’m talking about on Value Added Products which is “the customer is prohibited, either contractually or technically, from defining, redefining, or modifying the database schema or other structural aspects of database objects”.
Which is, basically, saying that you can do anything that doesn’t give your customers the ability to redefine and modify the database schema, as long as you are creating a product that adds value on top of Timescale. Is any of this 100% clear? Not any more than legalese generally is, and of course it's probably wise to talk to a lawyer if you’re concerned about it. Timescale has made their intent with the license clear in the past with blog posts and such, though.
The tricky thing with these licenses (BSL, SSPL, etc.) is that you can use them freely for internal stuff, but suddenly, if you make your product public (assuming it uses, e.g., TimescaleDB), things can get muddy. Everyone wants the flexibility to either open-source or commercialize a successful internal product in the future.
The problem is that, even if your app is not a mere frontend for TimescaleDB/Mongo/Redis, you can get sued, and you'll have to spend unnecessary time and money proving things in court. No one wants this, especially a startup owner whose money and time are tight. Also, even if your startup/company uses some of these techs, potential company buyers will be very wary of the purchase if they know they'll have to deal with this later.
I would assume TimescaleDB only sues if you have money. In that case you can also afford a commercial license. If you hit it big, just contact them, tell them there was a problem with having the correct license earlier, and say you want to fix the situation.
There is a 0% chance Timescale would sue a mom’n’pop operation for breaking their license.
If you have 100 million points per day, it’s likely you can afford to pay for any commercial license.
Why would the number of data points correlate to budget? Perhaps there’s a chance if the business scales with paying users, but that’s unlikely to be true in finance.
At that number of observations, I would assume depth-of-market data, so probably an HFT use case. HFT is notoriously expensive to try to compete in.
Or IoT data, which is notoriously hard to make money on.
I think you’re not talking about the same thing. There are two expressions related to time series data: “high churn” and “active time series”.
500 million active time series is extremely huge.
It does not have anything to do with number of data points.
Good time series databases can scale to 1M-10M writes per second without a hiccup.
I suppose it depends on what is meant by an "observation". Is that an entire time series for a single property or a single point? Nevertheless, the number of points absolutely matters.
A regular Postgres database can give you 50-100K inserts per second and can scale to at least 1B rows with 100K+ individual series without much difficulty. If you know you will need less (or much less) than this, my suggestion is to use a regular table with a covering index. If you need more, use ClickHouse.
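For scale, a quick back-of-the-envelope check that combines figures quoted in this thread: 100 million new points per day, 50-100K inserts per second, and 1B rows. It assumes ingest is spread evenly over 24 hours, which real market or IoT feeds usually are not, so treat the results as rough averages rather than a capacity plan.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the thread.
points_per_day = 100_000_000        # "over 100 million new points, per day"
postgres_inserts_low = 50_000       # lower bound of "50-100K inserts per second"
comfortable_rows = 1_000_000_000    # "at least 1B rows"

# Average ingest rate, assuming perfectly even arrival (assumption).
avg_rate = points_per_day / SECONDS_PER_DAY

# How far the quoted insert rate sits above that average.
headroom_low = postgres_inserts_low / avg_rate

# Days until the quoted row count is reached with no retention or rollups.
days_to_fill = comfortable_rows / points_per_day

print(f"average ingest: {avg_rate:,.0f} points/s")
print(f"insert headroom at 50K/s: {headroom_low:.1f}x")
print(f"days to reach 1B rows: {days_to_fill:.0f}")
```

Under these assumptions the average write rate sits well below the quoted insert throughput, while the row count reaches the quoted 1B mark within days, so at that volume the pressure comes from stored data size rather than from insert speed.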
Databases are a tough business. You're just waiting for open source to eat your lunch.