pg_timeseries: Open-source time-series extension for PostgreSQL

riedel
17 replies
21h20m

You may already be asking: “why not just power the stack using TimescaleDB?” The Timescale License would restrict our use of features such as compression, incremental materialized views, and bottomless storage. With these missing, we felt that what remained would not provide an adequate basis for our customers’ time-series needs. Therefore, we decided to build our own PostgreSQL-licensed extension.

I have been using the free version of TimescaleDB to shard a 500-million-observation time-series database. It worked drop-in without much hassle. I would have expected some benchmarks and comparisons in the post, though. I will for sure watch this...

osigurdson
15 replies
17h7m

500 million is very little, however. A regular table with a covering index would probably be fine for many use cases at this number of points.

vegabook
12 replies
17h3m

Indeed. The financial time series I was working with had over 100 million new points, _per day_. For anything serious, TimescaleDB is essentially not open source. Well done, tembo.io crew -- will definitely give this a whirl.

NineStarPoint
5 replies
8h34m

And as I understand that license, you are allowed to use Timescale for anything that doesn’t involve offering Timescale itself as a service. If you were using Timescale to process lots of time series transactions in your backend, it doesn’t seem to me like that would break the license.

(Which is to say that if, like Tembo, you're offering Postgres as a service, you do indeed have a problem. But for other uses, it should be fine.)

wasmitnetzen
2 replies
8h4m

The license doesn't allow you to "give access to, directly or indirectly (e.g., via a wrapper) to [SQL]".

Legally, what's a wrapper? Is a REST API a wrapper?

belk
0 replies
6h54m

I imagine it would legally need a lawsuit to set a precedent, and if a license owner sets an overreaching precedent for what a wrapper is, they risk losing customer trust and companies avoiding them like the plague.

E.g., TimescaleDB going after a TSDB-as-a-service company offering TSDB behind a GraphQL wrapper vs. TimescaleDB going after a financial company offering time-series data collection and viewing.

I think a good border test would be: would TimescaleDB allow you to offer a metrics and logging service? Technically you're offering time-series database functionality, but it's in a constrained domain and very clearly a different product, yet still effectively CRUDing time-series data.

NineStarPoint
0 replies
5h39m

That’s the internal-use restriction. There is also the restriction more relevant to the use cases I’m talking about, on Value Added Products, which is “the customer is prohibited, either contractually or technically, from defining, redefining, or modifying the database schema or other structural aspects of database objects”.

Which is, basically, saying that you can do anything that doesn’t give your customers the ability to redefine or modify the database schema, as long as you are creating a product that adds value on top of Timescale. Is any of this 100% clear? Not any more than legalese generally is, and of course it's probably wise to talk to a lawyer if you’re concerned about it. Timescale has made their intent with the license clear in the past with blog posts and such, though.

dig1
1 replies
8h8m

The tricky thing with these licenses (BSL, SSPL, etc.) is that you can use them freely for internal stuff, but suddenly, if you make your product public (assuming it uses, e.g., TimescaleDB), things can get muddy. Everyone wants the flexibility to either open-source or commercialize a successful internal product in the future.

The problem is that, even if your app is not a mere frontend for TimescaleDB/Mongo/Redis, you can get sued, and you'll have to spend unnecessary time and money proving things in court. No one wants this, especially a startup owner whose money and time are tight. Also, even if your startup/company uses some of these techs, potential company buyers will be very wary of the purchase if they know they'll have to deal with this later.

miohtama
0 replies
6h33m

I would assume TimescaleDB only sues if you have money. In that case you can also afford a commercial license. If you hit it big, just contact them and tell them there was a problem with having a correct license earlier and that you want to fix the situation.

There is a 0% chance Timescale would sue a mom’n’pop operation for breaking their license.

miohtama
3 replies
6h34m

If you have 100 million points per day, it’s likely you can afford to pay for any commercial license.

pgwhalen
2 replies
6h19m

Why would the number of data points correlate to budget? Perhaps there’s a chance if the business scales with paying users, but that’s unlikely to be true in finance.

walr000s
1 replies
6h2m

At that number of observations, I would assume depth-of-market data, so probably an HFT use case. HFT is notoriously expensive to try to compete in.

moogly
0 replies
4h51m

Or IoT data, which is notoriously hard to make money on.

vjerancrnjak
1 replies
16h30m

I think you’re not talking about the same thing. There are two expressions related to time-series data: “high churn” and “active time series”.

500 million active time series is extremely huge.

It does not have anything to do with number of data points.

Good time series databases can scale to 1M-10M writes per second without a hiccup.

osigurdson
0 replies
14h16m

I suppose it depends on what is meant by an "observation". Is that an entire time series for a single property, or a single point? Nevertheless, the number of points absolutely matters.

A regular Postgres database can give you 50-100K inserts per second and can scale to at least 1B rows with 100K+ individual series without much difficulty. If you know you will need less (or much less) than this, my suggestion is to use a regular table with a covering index. If you need more, use ClickHouse.
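
For illustration, that setup needs nothing beyond stock PostgreSQL (a minimal sketch; the table and column names are invented):

    -- plain table, no extension required
    CREATE TABLE readings (
        series_id BIGINT           NOT NULL,
        ts        TIMESTAMPTZ      NOT NULL,
        value     DOUBLE PRECISION NOT NULL
    );

    -- covering index: (series, time) range scans can be answered by
    -- index-only scans, since INCLUDE carries the value along
    CREATE INDEX readings_series_ts_idx
        ON readings (series_id, ts) INCLUDE (value);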

rapsey
0 replies
15h22m

Databases are a tough business. You're just waiting for open source to eat your lunch.

MuffinFlavored
10 replies
19h16m

Dumb question: why can't I just insert a bunch of rows with a timestamp column and indices? Where does that fall short? At a certain # of rows or something?

What does this let me do that can't be achieved with "regular PostgreSQL without the extension"?

gonzo41
6 replies
18h54m

Time-based partitioning.

MuffinFlavored
5 replies
17h53m

    CREATE TABLE logs (
        id SERIAL PRIMARY KEY,
        log_time TIMESTAMP NOT NULL,
        message TEXT
    ) PARTITION BY RANGE (log_time);
Why won't this work on stock PostgreSQL?

kbolino
1 replies
3h44m

I think what's meant here is windowing (partitioning the query), not partitioning the table per se. Though even with this strategy, you must manually create new partitions all the time.
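
With stock declarative partitioning, that maintenance looks like this for every new period (a minimal sketch for a logs table partitioned by log_time):

    -- each range has to exist before rows for it arrive
    CREATE TABLE logs_2024_05 PARTITION OF logs
        FOR VALUES FROM ('2024-05-01') TO ('2024-06-01');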

This also isn't typical time-series data, which generally stores numbers. Supposing you had a column "value INTEGER" as well, how do you do something like the following (pseudo-SQL)?

    SELECT AVG(value) AS avg_value FROM logs GROUP BY INTERVAL '5m'
Which should output rows like the following, even if the data were reported much more frequently than every 5 minutes:

    log_time             | avg_value
    2024-05-20T00:00:00Z | 10.3
    2024-05-20T00:05:00Z | 7.8
    2024-05-20T00:10:00Z | 16.1

ants_a
0 replies
41m

    SELECT date_bin(interval '5 minutes', log_time, TIMESTAMP '2001-01-01') AS log_time,
           AVG(value) AS avg_value
    FROM logs
    GROUP BY 1;

gonzo41
1 replies
16h49m

Read the docs; it's not saying that won't work. This extension, like Timescale, just makes some things more ergonomic.

darkstar_16
0 replies
9h37m

And it provides helper functions to deal with common time-series tasks.
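
For example (a sketch going by the project README; enable_ts_table() is the helper documented there, and the table itself is invented):

    -- declare a partitioned table, then hand partition management
    -- over to pg_timeseries
    CREATE TABLE sensor_reading (
        sensor_id    BIGINT NOT NULL,
        reading_time TIMESTAMPTZ NOT NULL,
        value        DOUBLE PRECISION
    ) PARTITION BY RANGE (reading_time);

    SELECT enable_ts_table('sensor_reading');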

jmaker
0 replies
12h16m

That won’t work as written: your timestamp isn’t part of your primary key.
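
Stock PostgreSQL requires the primary key of a partitioned table to include the partition key, so something like this would be accepted (a minimal sketch):

    CREATE TABLE logs (
        id       SERIAL,
        log_time TIMESTAMP NOT NULL,
        message  TEXT,
        PRIMARY KEY (id, log_time)  -- partition column must be included
    ) PARTITION BY RANGE (log_time);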

skibbityboop
0 replies
17h52m

I'm with you; I need to read up more on where time series could benefit. At work we have a PostgreSQL instance with around 27 billion rows in a single partitioned table, partitioned by week. It goes back to January of 2017 and just contains tons of data coming in from sensors. It's not "fast", but also not ridiculously slow to say, e.g., "Give me everything for sensor 29380 in March of 2019".
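
A query like that (a sketch; the table and column names are guesses) only touches the weekly partitions that overlap the requested month:

    SELECT *
    FROM sensor_data
    WHERE sensor_id = 29380
      AND recorded_at >= '2019-03-01'
      AND recorded_at <  '2019-04-01';  -- pruning skips all other weeks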

I guess it depends on your needs, but I do think I need to investigate time series more to see if it'd help us.

ishikawa
0 replies
5h33m

There are several good articles explaining this, especially on the Timescale blog, but in short: without time partitioning, with just an index, at some point the performance of reads and writes degrades exponentially.

citizen_friend
0 replies
17h46m

It will work just fine.

nitinreddy88
9 replies
19h16m

Most time-series queries (almost all of them) are aggregate queries. Why not leverage or build a top-notch columnar store for this?

Everything seems to be there, so why is there no first-class product like ClickHouse on PG?

paulryanrogers
6 replies
18h56m

Citus, Persona, TimescaleDB?

nitinreddy88
3 replies
18h52m

Looking at the comparison in ClickBench, they are almost pathetic in terms of performance. They can't even handle sub-second aggregation queries over 10M records. Compare that to even DuckDB reading from Parquet files.

nikita
2 replies
13h31m

Postgres is missing a proper columnstore implementation. It's a big gap, and it's not easy to build.

One solution could be integrating DuckDB in a similar way to pgvector. You would need to map DuckDB storage to Postgres storage and reuse the DuckDB query processor. I believe that's the fastest way to get Postgres to have competitive columnstores.

tarasglek
0 replies
10h1m

This sounds interesting. I don't see DuckDB as a supported extension or mentioned anywhere in your code yet ;)

Is this foreshadowing?

anentropic
0 replies
9h38m

Hydra?

bloopernova
0 replies
17h33m

That was very "Klaatu, Barada, Nikto".

applied_heat
0 replies
14h2m

VictoriaMetrics as well; they say it's based on structures similar to those used in ClickHouse.

netik
1 replies
18h45m

The gold standard for this is Druid at very large scale, or ClickHouse. ClickHouse has a lot of problems as far as modifying/scaling shards after the fact, while Druid handles this with ease (with the penalty of not being able to update after the fact).

anentropic
0 replies
9h37m

Doris?

gxyt6gfy5t
4 replies
21h27m

How is it different from TimescaleDB?

logrot
1 replies
21h25m

You may already be asking: “why not just power the stack using TimescaleDB?” The Timescale License would restrict our use of features such as compression, incremental materialized views, and bottomless storage. With these missing, we felt that what remained would not provide an adequate basis for our customers’ time-series needs. Therefore, we decided to build our own PostgreSQL-licensed extension.

samaysharma
0 replies
21h19m

It's much newer, for one, so it's behind on features. But we're working on adding new ones based on customer demand.

We want to build it with a PostgreSQL license using existing community extensions as much as possible and build custom stuff (still permissively licensed) only when necessary.

vantiro
0 replies
21h22m

TimescaleDB's license is more like Redis' new license?

hosh
0 replies
21h17m

The use of PostgreSQL licensing might mean we'll see this available on AWS RDS and other managed PostgreSQL providers.

suyash
3 replies
12h5m

Interesting. How does it compare to a proper (open-source) time-series database like InfluxDB, other than being "Postgres-like"?

RedShift1
1 replies
9h28m

InfluxDB is a "proper" time series database?

sofixa
0 replies
6h19m

It is, in the sense that it's a purpose-built time-series database. Once upon a time it was definitely among the best too, but IMO their sales, tech, and overall strategies have been just massive clusterfucks. They also have the best metrics agent, by far.

They're on their third serious rewrite with breaking changes for users. They pulled features from the open source version (clustering) to have a higher moat for the Enterprise version. They deleted customer data permanently in their hosted version after sending two emails to billing addresses. Their marketing and product naming also sucks:

InfluxDB, InfluxDB Cloud Serverless, InfluxDB Cloud Dedicated, InfluxDB Clustered

tucnak
0 replies
11h37m

It's not Postgres-like, it _is_ Postgres.

matthewmueller
2 replies
13h5m

Would love to use this with RDS!

rywalker
0 replies
3h16m

Tembo CEO here - we are targeting feature parity for Tembo Cloud w/ RDS as soon as possible. Would love to have you give Tembo a try sometime and give us feedback :)

Tembo Cloud is a standard SaaS offering, and our new Tembo Self Hosted (https://tembo.io/docs/product/software/tembo-self-hosted/ove...) allows you to run the same software that powers our SaaS, but in your own K8s cluster.

anentropic
0 replies
9h36m

Same here, but I'm not holding my breath, since all these neat Postgres extensions compete with other AWS DBs like Redshift, Timestream, etc.

wdb
1 replies
7h51m

Would this be a good extension for storing load balancer log entries (status, response body, headers, etc.)?

I think a columnar store would be more efficient than a normal row-based database for this? Load balancer log entries could be considered something similar to analytics events.

samaysharma
0 replies
2h40m

Yes. Columnar is integrated with pg_timeseries already.
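
If I'm reading the announcement right, the columnar side comes via Hydra's columnar access method, so a columnar table looks something like this (a sketch assuming Hydra's documented USING columnar syntax; the table itself is invented):

    -- column-oriented storage suits append-heavy, scan-heavy logs
    CREATE TABLE lb_log (
        ts      TIMESTAMPTZ NOT NULL,
        status  INT,
        headers JSONB
    ) USING columnar;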

plainOldText
1 replies
19h58m

Your site is very well designed and easy to read btw, and the app UI looks great from the demo photos. I might try it!

samaysharma
0 replies
19h18m

Thank you!

vivzkestrel
0 replies
15h39m

Benchmarks with respect to QuestDB, TimescaleDB?

vantiro
0 replies
21h28m

PostgreSQL licensed, good move!

valenterry
0 replies
6h49m

It's about time that Postgres (and other databases) added native append-only tables. That alone doesn't make it a time-series database, but it would probably help with standardization and all the logic/access around it.
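
In the meantime, append-only can be approximated at the privilege level (a minimal sketch in plain SQL; the role and table are invented):

    CREATE TABLE events (
        ts      TIMESTAMPTZ NOT NULL DEFAULT now(),
        payload JSONB
    );

    -- writers may add and read rows, but never rewrite history
    REVOKE ALL ON events FROM app_writer;
    GRANT INSERT, SELECT ON events TO app_writer;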

ramoneguru
0 replies
20h29m

How does this stack up against something like what QuestDB offers?

nhourcard
0 replies
7h15m

Interesting release. It feels like the time-series database landscape is evolving toward:

a) columnar stores built from scratch, converging on open formats such as Parquet & Arrow: InfluxDB 3.0, QuestDB

b) time-series capabilities added on top of Postgres: Timescale, pg_timeseries

c) platforms focused on observability around the Prometheus ecosystem: Grafana, VictoriaMetrics, Chronosphere

mnahkies
0 replies
6h50m

Looking at their roadmap, the killer feature for me would be incremental materialised views:

Incremental view maintenance — define views which stay up-to-date with incoming data without the performance hit of a REFRESH

I wonder if they plan to incorporate something like https://github.com/sraoss/pg_ivm or write their own implementation.

(Although I'm hopeful that one day we'll see IVM land in Postgres core.)
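
For reference, pg_ivm's entry point looks like this (a sketch going by its README; the view name and query are invented):

    -- create an incrementally maintained materialized view;
    -- pg_ivm keeps it current as base-table rows change
    SELECT create_immv(
        'reading_stats',
        'SELECT sensor_id, COUNT(*), AVG(value)
           FROM sensor_reading GROUP BY sensor_id'
    );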

mathfailure
0 replies
20h47m

Thank you for posting it: I followed the links and found out about trunk and https://pgt.dev/

PeterZaitsev
0 replies
19h19m

Great to see this kind of innovation. PostgreSQL is interesting: while the "core" was always open source under a very permissive license, there have been many proprietary and source-available extensions, ranging from replication to time-series support.

Now we see those proprietary extensions being disrupted by proper open source!