There exists the Universal (Function) Approximation Theorem for neural networks, which states that they can approximate any function to a desired level of accuracy [0].
However, there does not exist a theorem stating that those approximations can be learned (or how).
[0] https://en.m.wikipedia.org/wiki/Universal_approximation_theo...
FYI, there are actually many algorithms, predating neural networks, that have been proven to be universal function approximators. Neural networks are certainly not the only ones, nor the first, to do so. Quite a few of the alternatives are actually much more appropriate than a neural network for many cases.
What other algorithms can do this, and in which situations would they be more useful than neural networks?
Newton's method approximates square roots. It's useful if you want to approximate something like that without pulling in the computational power required by a neural network.
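For concreteness, a minimal Python sketch of that idea (my own illustration, not from the comment): Newton's method applied to f(x) = x^2 - a converges to sqrt(a) for a >= 0.

  # Newton's method for square roots: iterate x <- (x + a/x) / 2
  def newton_sqrt(a, tol=1e-12):
      x = a if a > 1 else 1.0        # crude starting guess (assumes a >= 0)
      while abs(x * x - a) > tol:
          x = (x + a / x) / 2        # Newton step for f(x) = x^2 - a
      return x

  print(newton_sqrt(2.0))            # ~1.4142135623730951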
Newton's method relates to universal function approximation in the same way a natural oil seep relates to a modern IC engine...
By definition, that's not a "universal" function approximator.
I think the problem to solve is more like: given a set of inputs and outputs, find a function that gives the expected output for each input [1]. This is like Newton's method, but one level up ;-). One can find such a tool in Squeak or Pharo Smalltalk, IIRC (a toy numeric version is sketched after the link).
[1] https://stackoverflow.com/questions/1539286/create-a-functio...
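As a toy illustration of that problem statement (my own sketch in Python/NumPy, not the Squeak/Pharo tool): given sampled inputs and outputs, fit an interpolating polynomial that reproduces each expected output.

  import numpy as np

  xs = np.array([1.0, 2.0, 3.0, 4.0])
  ys = np.array([1.0, 4.0, 9.0, 16.0])           # the hidden rule here is y = x^2

  coeffs = np.polyfit(xs, ys, deg=len(xs) - 1)   # exact fit through all sample points
  f = np.poly1d(coeffs)
  print(f(5.0))                                  # extrapolates to ~25 for this example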
The Taylor series dates to 1715; the Fourier series dates to the 1820s.
Both are universal function approximators and both can be learned via gradient descent.
When the function you want to learn actually is polynomial or periodic (respectively), these are better than neural networks; a quick gradient-descent fit of a Fourier series is sketched below.
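To make the "learned via gradient descent" part concrete, here is a minimal sketch (my own, in Python/NumPy; the target f(x) = |x| and the 8 harmonics are arbitrary choices): the coefficients of a truncated Fourier series are fit by plain gradient descent on squared error.

  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.uniform(-np.pi, np.pi, size=256)     # sampled inputs
  y = np.abs(x)                                # target: f(x) = |x| on [-pi, pi]

  K = 8                                        # number of harmonics (arbitrary choice)
  k = np.arange(K + 1)
  a = np.zeros(K + 1)                          # cosine coefficients (a[0] is the constant term)
  b = np.zeros(K + 1)                          # sine coefficients (b[0] is unused)
  C = np.cos(np.outer(x, k))                   # cosine feature matrix
  S = np.sin(np.outer(x, k))                   # sine feature matrix

  lr = 0.01
  for _ in range(5000):
      err = C @ a + S @ b - y                  # residuals of the current fit
      a -= lr * 2 * C.T @ err / len(x)         # gradient of mean squared error w.r.t. a
      b -= lr * 2 * S.T @ err / len(x)         # gradient of mean squared error w.r.t. b

  grid = np.linspace(-3, 3, 200)
  approx = np.cos(np.outer(grid, k)) @ a + np.sin(np.outer(grid, k)) @ b
  print("max abs error on [-3, 3]:", np.abs(approx - np.abs(grid)).max())

The same loop with monomial features 1, x, x^2, ... gives the polynomial analogue.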
For your interest, Taylor series are not universal function approximators. The Taylor series around 0 for
f(x) = e^(-1/x^2) if x != 0 else 0
is identically zero (every derivative is 0 at 0), but the function is clearly not identically zero. So the radius of convergence of this Taylor series is infinite, yet it equals the approximated function at only one point (a quick symbolic check is sketched below).
I'm sure there are some conditions you can put on f to make the Taylor Series a UFA but it's been quite a while since I did any real analysis so I have forgotten!
Doesn't detract from the overall point though that there are UFAs that are not neural nets. I should say that I don't know what the precise definition of a UFA really is, but I assume you have to have more than equality at one point.
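A quick symbolic check of that counterexample (my own sketch with SymPy, not from the comment): the low-order derivatives of e^(-1/x^2) all tend to 0 at the origin, while the function itself is nonzero away from it.

  import sympy as sp

  x = sp.symbols('x')
  f = sp.exp(-1 / x**2)                      # the x != 0 branch of the function

  for n in range(1, 6):
      # each derivative is (a polynomial in 1/x) * e^(-1/x^2), whose limit at 0 is 0
      print(n, sp.limit(sp.diff(f, x, n), x, 0))

  print(sp.limit(f, x, 0))                   # the function itself also tends to 0 at 0
  print(f.subs(x, sp.Rational(1, 2)))        # but equals e^(-4) != 0 away from the origin

So every Taylor coefficient at 0 vanishes even though the function does not.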
I'm pretty sure the UFA theorems for neural networks wouldn't apply to that function either: https://en.wikipedia.org/wiki/Universal_approximation_theore...
Generally, they assume the function to be approximated is continuous.
Taylor series work on differentiable intervals. You specifically chose a function and interval where this is not true. Of course it will not be a good approximation.
https://www.quora.com/Is-there-any-universal-function-approx...
This area is covered more generally by non-parametric statistics. There are many other methods for non-parametrically estimating functions (ones that satisfy some regularity conditions). Tree-based methods are one such family, and the consensus still seems to be that they perform better than neural networks on tabular data. For example, see the paper below and the toy sketch after it:
https://arxiv.org/abs/2106.03253
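A toy illustration of a tree-based non-parametric fit (my own sketch using scikit-learn's RandomForestRegressor on synthetic "tabular" data; not from the paper):

  import numpy as np
  from sklearn.ensemble import RandomForestRegressor

  rng = np.random.default_rng(0)
  X = rng.uniform(-3, 3, size=(2000, 2))                 # two tabular-style features
  y = np.sin(X[:, 0]) + 0.5 * np.sign(X[:, 1])           # smooth part plus a discontinuity
  y += rng.normal(0, 0.1, size=2000)                     # observation noise

  model = RandomForestRegressor(n_estimators=200, random_state=0)
  model.fit(X[:1500], y[:1500])                          # train on the first 1500 rows

  pred = model.predict(X[1500:])
  print("held-out RMSE:", np.sqrt(np.mean((pred - y[1500:]) ** 2)))

The trees handle the step in the second feature without any smoothness assumption, which is part of why they remain competitive on tabular data.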
Makes you wonder what is meant by learning...
Learning is using observations to create or update a model that makes predictions more accurate than chance. At some point the model ends up generalizing beyond the training domain.
They can approximate only continuous functions; more specifically, any continuous function on a compact subset of ℝⁿ can be approximated to an arbitrary level of accuracy, given sufficiently many neurons.
Not any function, though. There are restrictions on the types of functions the "universal" approximation theorem applies to. Interestingly, the theorem is about a network with a single hidden layer. In practice, that does not work as well as having many layers.
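For a concrete picture of the single-hidden-layer case, here is a minimal sketch (my own, in Python/NumPy; the width, learning rate, and target are arbitrary choices): one hidden layer of tanh units fit by gradient descent to a continuous function on a compact interval.

  import numpy as np

  rng = np.random.default_rng(0)
  x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)   # compact subset of R
  y = np.sin(x)                                        # continuous target

  H = 32                                               # hidden width (arbitrary)
  W1 = rng.normal(0, 1, (1, H)); b1 = np.zeros(H)
  W2 = rng.normal(0, 0.1, (H, 1)); b2 = np.zeros(1)

  lr = 0.05
  for _ in range(20000):
      h = np.tanh(x @ W1 + b1)                         # hidden activations
      pred = h @ W2 + b2                               # network output
      grad_pred = 2 * (pred - y) / len(x)              # d(MSE)/d(pred)
      grad_W2 = h.T @ grad_pred;  grad_b2 = grad_pred.sum(0)
      grad_h = grad_pred @ W2.T * (1 - h**2)           # backprop through tanh
      grad_W1 = x.T @ grad_h;     grad_b1 = grad_h.sum(0)
      W1 -= lr * grad_W1; b1 -= lr * grad_b1
      W2 -= lr * grad_W2; b2 -= lr * grad_b2

  print("max abs error:", np.abs(np.tanh(x @ W1 + b1) @ W2 + b2 - y).max())

The theorem says enough such units can drive that error arbitrarily low for any continuous target on the interval; it says nothing about whether a particular gradient-descent run will find them.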
People throw that proof around all the time, but all it does is show that a neural net is equivalent to a lookup table, and a lookup table with enough memory can approximate any function. It's miles away from explaining how real-world, useful neural nets like conv-nets, transformers, LSTMs, etc. actually work.