I'm curious why we seem convinced that this is a task that is possible or something worthy of investigation.
I've worked on language models since 2018; even then it was obvious why language was a useful and transferable task. I do not at all feel the same way about general univariate time series that could have any underlying process.
Time series data are inherently context sensitive, unlike natural languages, which follow predictable grammar patterns. The patterns in time series data vary based on context. For example, flight data often show seasonal trends, while electric signals depend on the type of sensor used. There's also data that appear random, like stock data, though firms like Rentech manage to consistently find underlying alphas. Training on multivariate time series data would be challenging, but I don't see why not for specific applications.
Is Rentech the only group that genuinely manages to predict stock price? Seems like the very observation that it’s still possible would be enough motivation for other groups to catch up over such a long period.
Also, the first realistic approximation of Solomonoff induction we achieve is going to be interesting because it will destroy the stock market.
Agreed, if stock prices were predictable by some technical means, they would be quickly driven to unpredictability by people trading on those technical indicators.
This is that old finance chestnut. Two finance professors are walking down the hall and one of them spots a twenty dollar bill. He goes to pick it up, but the other professor stops him and says, "No, don't bother. If there were twenty dollars there, someone would have already picked it up."
Yes, people arbitrage away these anomalies, and make billions doing it.
Rentech does not seem to be able to predict the stock market for their customers...
"Jim Simons' Renaissance Technologies suffers $11 billion of client withdrawals in 7 months" - https://markets.businessinsider.com/news/stocks/jim-simons-r...
Maybe that would be a good thing. I wouldn't mourn the destruction of the stock market as it's just a giant wealth-gap increasing casino. Trading has nothing to do with underlying value.
There's a huge industry around time series forecasting used for all kinds of things like engineering, finance, climate science, etc. and many of the modern ones incorporate some kind of machine learning because they deal with very high dimensional data. Given the very surprising success of LLMs in non-language fields, it seems reasonable that people would work on this.
Task specific time series models, not time series “foundation models” - we are discussing different things.
I don't think we are. The premise of this is that the foundation model can learn some kind of baseline ability to reason about forecasting that is generalizable across different domains (each of which needs fine-tuning). I don't know if it will find anything, but LLMs totally surprised us, and this kind of thing seems totally worthy of investigation.
Foundational time series models have been around since 2019 and show performance competitive with task-specific models.
https://arxiv.org/abs/1905.10437
Fundamentally, the pre-trained model would need to learn a "world model" to predict well in distinct domains. This should be possible, setting aside compute requirements and the exact architecture.
After all, the physical world (down to the subatomic level) is governed by physical laws. Ilya Sutskever from OpenAI stated that next-token prediction might be enough to learn a world model (see [1]). That would imply that a model learns a "world model" indirectly, which is even more unrealistic than learning the world model directly through pre-training on time-series data.
[1] https://www.youtube.com/watch?v=YEUclZdj_Sc
But the data generating process could be literally anything. We are not constrained by physics in any real sense if we are predicting financial markets, occurrences of a certain build error, or termite behavior.
Sure, there are limits. Not everything is predictable, not even in physics. But that is also not the point of such a model. The goal is to forecast across a broad range of use cases that do have underlying laws. Similar to LLMs, they could also be fine-tuned.
"predicting the next token well means that you understand the underlying reality that led to the creation of that token"
People on the AI-hype side of things tend to believe this, but I really fundamentally don't.
It's become a philosophical debate at this point (what does it mean to "understand" something, etc.)
There was a paper written a while back that proved mathematically how you can correlate any time series with any other time series, thus vaporizing any perception of value gained by correlating time series (at least for those people who read the paper). Just wanted to share.
I would like to read more. Feels sort of like an expression of certain “universal truths” like the 80/20 rule or golden ratio
The only other timeseries paper I am aware of is TimeGPT
https://news.ycombinator.com/item?id=37874891
What does that mean "you can correlate"? That phrase is meaningless.
There is potential for integrating ML with time series data in industrial applications (things like smelters, reactors, etc.), where you have a continuous stream of time series measurements from things like gauges and thermocouples. If you can detect (and respond to) changing circumstances faster than the humans in the control room reacting to trends or alarms, then there are potentially big efficiency gains...
Operator guidance is often based on heuristics: when metric A exceeds value X for Y seconds, take action Z. Or rates of change: if the signal is changing at a rate of more than X, etc.
So in these areas there exists potential for an ML solution, especially if it's capable of learning (i.e. the last response overshot by X, so trim the next response appropriately).
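To make the heuristic concrete, here's a minimal sketch of the kind of rules such operator guidance encodes. All names, thresholds, and rule choices here are hypothetical, just to illustrate the "exceeds X for Y seconds" and "rate of change" patterns:

```python
def check_rules(samples, threshold, hold_seconds, sample_period, max_rate):
    """Scan a window of evenly spaced gauge samples and return triggered alarms.

    Rule 1: value stays above `threshold` continuously for `hold_seconds`.
    Rule 2: rate of change between consecutive samples exceeds `max_rate`
            (in units per second).
    """
    alarms = []
    needed = int(hold_seconds / sample_period)  # consecutive samples required
    consecutive_over = 0
    for i, v in enumerate(samples):
        consecutive_over = consecutive_over + 1 if v > threshold else 0
        if consecutive_over == needed:  # fire once when the hold time is met
            alarms.append(("sustained_high", i))
        if i > 0 and abs(v - samples[i - 1]) / sample_period > max_rate:
            alarms.append(("rate_of_change", i))
    return alarms
```

An ML replacement would have to beat exactly this kind of cheap, transparent baseline to justify itself.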
Every time I've actually tried something like this, it has not outperformed statistical process control.
It's not just that control charts are great signal detectors; managing processes like that also takes a certain statistical literacy, which one gets from applying SPC faithfully for a while and does not get from tossing ML at the problem and crossing one's fingers.
There are clear counterexamples to your experience, most notably in maintaining plasma stability in tokamak reactors: https://www.nature.com/articles/s41586-021-04301-9
task specific model
The things that we are typically interested in have very clear patterns. In a way, if we find that there are no patterns, we don't even try to do any forecasting.
"The Unreasonable Effectiveness of Mathematics in the Natural Sciences" [1] hints that there might be some value here.
[1] https://en.m.wikipedia.org/wiki/The_Unreasonable_Effectivene...
Exactly. For example, I think the use of this model is in cases where you expect user counts to follow some timing pattern, and you want to be alerted if there is a spike.
But you wouldn't want this model for file upload storage usage, which only increases; there you would put alerts on max values, not on patterns/periodic values.
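The pattern-based alert doesn't even need a learned model to get started. A hypothetical sketch: compare the current reading against past readings at the same phase of the cycle (e.g. the same hour of day, so `period=24` for hourly data), and flag it if it sits far outside that distribution:

```python
import statistics

def spike_alert(history, current, period, k=3.0):
    """Flag `current` if it deviates strongly from past values at the same
    phase of the cycle. `history` holds evenly spaced past samples; `current`
    is the next sample after `history`."""
    phase = len(history) % period          # phase the current sample falls on
    same_phase = history[phase::period]    # past samples at that phase
    mean = statistics.mean(same_phase)
    sigma = statistics.stdev(same_phase)
    return abs(current - mean) > k * max(sigma, 1e-9)
```

For the monotonically growing storage case, none of this applies; a plain max-value threshold is the right tool.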
Why do you think language is so special?
There's an extensive body of literature across numerous domains that demonstrates the benefits of Multi-Task Learning (MTL). Actually, I have a whole folder of research papers on this topic; here's one of the earliest references on hand that I feel captures the idea succinctly in the context of modern ML:
"MTL improves generalization by leveraging the domain-specific information contained in the training signals of related tasks" [Caruana, 1998]
I see repetition and structure everywhere in life. To me it's not far fetched that a model trained on daily or yearly trends could leverage that information in the context of e.g. biological signals which are influenced by circadian rhythm etc.
Disclaimer: my background is in ML & bio-signals, I work with time series too much.
For those who haven't read it, Rich Caruana's thesis on multi-task learning is beautifully written (the cited 1998 paper here). It's amazing to see how far the field has come, and, at the same time, how advanced the thinking was in the 90s too.
Watch this talk from Albert Gu:
Efficiently Modeling Long Sequences with Structured State Spaces
https://www.youtube.com/watch?v=luCBXCErkCs
They made one of the best time series models and it later became one of the best language models too (Mamba).
I have already watched that talk and know Albert Gu. His work is not about a “foundational” time series model but rather a task specific one.
+1 for “any underlying process”. It would be interesting to know what use case they had in mind.
Not really. It's true it would usually need more context than a single-series dataset, but you can predict broadly accurate-ish bandwidth usage trends just using simple statistical extrapolation; we've been doing that since the early 90s. If you give a model your subscriber numbers and usage data as time series, it should be able to tell you quite accurately how much electricity/bandwidth/gas you'll be using, or the road traffic or metro passenger levels at station Z..., at 4pm on January 4th 2026.
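"Simple statistical extrapolation" here can be as basic as an ordinary least-squares trend line, which is roughly the 90s-era approach being described (this is a generic sketch, not any specific operator's method):

```python
def linear_extrapolate(values, steps_ahead):
    """Fit an OLS trend line to evenly spaced samples and extrapolate
    `steps_ahead` beyond the last observation."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    slope = sum((x - x_mean) * (y - y_mean)
                for x, y in enumerate(values))
    slope /= sum((x - x_mean) ** 2 for x in range(n))
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)
```

Layer a seasonal adjustment on top and you cover a surprising share of capacity-planning forecasts.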
I think there are some generalizable notions of multiscale periodicity that could get embedded into some kind of latent space.
As you say, without knowing anything about the underlying process, we can't predict generally. Some other comments point to contexts in which we do know something about the underlying process. For instance, I don't think finance is a domain where you can apply this kind of stuff.
Well... if you look at a language in a certain way, it is just a way to put bits in a certain order. If you forget about the 'language' part, it kinda makes sense to try, because why shouldn't it work?
Why not? There are plenty of time series that have underlying patterns, which means you can do better than a total guess even without any knowledge of what you are predicting.
Think about something like traffic patterns. You probably won't predict higher traffic on game days, but predicting rush hour is going to be pretty trivial.
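The "rush hour is trivial" point corresponds to a seasonal-mean baseline: forecast each phase of the cycle as the average of that phase over past cycles (for hourly traffic you'd use `period=24`; the period choice is the only assumption):

```python
import statistics

def seasonal_mean_forecast(history, period):
    """Forecast one full future period as the per-phase mean of past periods.

    history: evenly spaced past samples covering whole periods.
    Returns a list of `period` forecasts, one per phase.
    """
    return [statistics.mean(history[p::period]) for p in range(period)]
```

Game days are exactly the residual such a baseline misses, which is where extra context (a calendar of events) would have to come in.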