Fitting an elephant with four non-zero parameters

maitola
49 replies
1d

I love the ironic side of the article. Perhaps they should add the reason for it, from Fermi and von Neumann. When you are building a model of reality in physics, if something doesn’t fit the experiments, you can’t just add a parameter (or more), vary it, and fit the data. The model should have zero parameters, ideally, or the fewest possible, or, at an even deeper level, the parameters should emerge naturally from some simple assumptions. With 4 parameters you don’t know whether you are really capturing a true aspect of reality or just fitting the data of some experiment.

jampekka
37 replies
23h59m

This was mentioned in the first paragraph of the paper. The paper is mostly humoristic.

That said, the wisdom of the quip has been widely lost in many fields. In many fields data is "modeled" with huge regression models with dozens of parameters or even neural networks with billions of parameters.
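A minimal sketch of the quip, assuming a made-up "reality" (a sine curve), 12 noisy samples, and polynomial models standing in for whatever regression is being used. None of these choices come from the paper. Because the models are nested, the error on the fitted points can only go down as parameters are added, while the error on held-out points is under no such obligation.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_errors(n_params_list, seed=0):
    """Return (n_params, train_mse, held_out_mse) for polynomial fits of growing size."""
    rng = np.random.default_rng(seed)
    truth = lambda x: np.sin(2.0 * x)            # hypothetical underlying law
    x_train = np.linspace(-1.0, 1.0, 12)         # 12 noisy "measurements"
    x_test = np.linspace(-0.95, 0.95, 200)       # noise-free held-out points
    y_train = truth(x_train) + rng.normal(0.0, 0.2, x_train.size)
    y_test = truth(x_test)

    rows = []
    for k in n_params_list:
        # A polynomial of degree k - 1 has k free coefficients.
        model = Polynomial.fit(x_train, y_train, deg=k - 1)
        train_mse = float(np.mean((model(x_train) - y_train) ** 2))
        test_mse = float(np.mean((model(x_test) - y_test) ** 2))
        rows.append((k, train_mse, test_mse))
    return rows


results = fit_errors([1, 2, 4, 8, 12])
for k, train_mse, test_mse in results:
    print(f"{k:2d} parameters: train MSE {train_mse:.4f}, held-out MSE {test_mse:.4f}")
```

With 12 parameters and 12 points the fit passes through every sample, noise included, which says nothing about whether the model captured the sine curve.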

In 1953, Enrico Fermi criticized Dyson’s model by quoting Johnny von Neumann: “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”[1]. This quote is intended to tell Dyson that while his model may appear complex and precise, merely increasing the number of parameters to fit the data does not necessarily imply that the model has real physical significance.

karmakaze
35 replies
21h32m

That's how I feel about dark matter. Oh this galaxy is slower than this other similar one. The first one must have less dark matter then.

What can't be fit by declaring the amount of dark matter that must be present fits the data? It's unfalsifiable, just because we haven't found it, doesn't mean it doesn't exist. Even worse than string/M-theory which at least has math.

GuB-42
15 replies
18h23m

The dark matter theory is falsifiable. Sure we can't see dark matter (it doesn't interact electromagnetically), but we can see its effects, and it has to follow the laws of physics as we understand them today.

It is actually a satisfying theory with regard to Occam's razor. We don't have to change our laws of physics to explain the abnormal rotation of galaxies, we just need "stuff" that we can't see and that nevertheless interacts gravitationally. When we have stuff like neutrinos, it is not that far-fetched. In fact, though unlikely given our current understanding of physics, dark matter could be neutrinos.

If it turns out that the invisible stuff we call dark matter doesn't follow the laws of physics as we know them, then the dark matter theory is falsified and we need a new one (or at least some tweaks). And that may actually be the case, as a recent paper claims that gravitational lensing doesn't match the predictions of the dark matter theory.

The main competitor to dark matter is modified gravity, which calls for no new stuff, but changes the equations for gravity. For Occam's razor, adding some random term to an equation is not really better than adding some invisible but well characterized stuff, especially when we consider that the equation in question is extremely well tested. It is, of course, also falsifiable.

The problem right now is not that these theories are unfalsifiable, it is that they are already pretty much falsified in their current form (dark matter less than modified gravity), and some rework is needed.

nyssos
5 replies
16h31m

Sure we can't see dark matter (it doesn't interact electromagnetically), but we can see its effects

Even this is granting too much: "seeing it" and "seeing its effects" are the same thing. No one has ever "directly seen", in the sense that internet DM skepticism demands, anything other than a photon.

adrian_b
4 replies
13h11m

"Seeing" is indeed a poorly chosen word.

The problem with dark matter is that there does not exist any second relationship from which to verify its existence, like in the case of normal matter, which takes part in a variety of interactions that lead to measurable effects, which can be compared.

The amount and the location of dark matter is computed from the gravitational forces that explain the observed movements of the bodies, but there are no additional relationships with any other data, which could corroborate the computed distribution of dark matter. That is what some people mean by "seeing".

nyssos
1 replies
12h56m

All major DM candidates also have multiple interactions: that's the WI in WIMP, for instance. In fact I don't know that anyone is seriously proposing that dark matter is just bare mass with no other properties - aside from the practical problems, that would be a pretty radical departure from the last century of particle physics.

jampekka
0 replies
9h51m

No interactions have been found, despite a lot of resources put into the search. So currently all dark matter particle theories apart from "non-interacting" have been falsified. And non-interacting theories are probably unfalsifiable.

Radical departure may well be needed, for other reasons too.

karmakaze
1 replies
4h5m

The problem with dark matter is that there does not exist any second relationship from which to verify its existence.

This is exactly it! Dark matter is strictly defined by its effects. The only 'theory' part is a belief that it's caused by a yet-to-be-found particle that's distributed to fit observations. Take all the gravitational anomalies that we can't explain with ordinary matter, then arbitrarily distribute an imaginary 'particle' that solves them: that's DM.

The problem is that the language used to talk about DM is wrong. It's not that DM doesn't interact with EM, or the presence of DM is causing the galaxies to rotate faster than by observed mass. These are all putting the cart before the horse. What we have is unexplained gravitational effects being attributed to a hypothetical particle. If we discovered a new unexplained gravitational property, we would merely add that to the list of DM's attributes rather than say "oh then it can't be DM".

nyssos
0 replies
2h40m

Dark matter is strictly defined by its effects

All physical entities are defined by their effects! Suppose we found axions and they had the right mass to be dark matter. Would that mean we now "really knew" what dark matter was, in your sense? No, it would just push the defining effects further back - because all an axion is is a quantum of the (strong CP-violation term promoted to a field).

Just like the electromagnetic field is the one that acts on charged particles in such and such a way, and a particle is charged if the electromagnetic field acts on it in that way. There's no deeper essence, no intuitive "substance" with some sort of intrinsic nature. All physical properties are relational.

adrian_b
3 replies
13h25m

If we add an arbitrary amount of dark matter everywhere, to match the observed motions of the celestial bodies, that adds an infinity of parameters, and not even an enumerable one.

This obviously can match almost anything and it has extremely low predictive power (many future observations may differ from predictions, which can be accounted by some dark matter whose distribution was previously unknown), so it is a much worse explanation than a modified theory of gravity that would have only a finite number of additional parameters.

adgjlsfhk1
1 replies
11h59m

The reason this isn't true is that, by the hypothesis of dark matter, it follows gravity but not electromagnetism. As such, it only fits distributions recoverable from evolving gravity. E.g. if we require a certain distribution today, it fixes the distribution at all other points in time, and we can use light speed delay to look into the past to verify whether the distributions have evolved according to gravity.

Retric
0 replies
5h39m

All observations of individual galaxies occur at a specific point in time. We can’t use light speed delay to see the evolution of individual galaxies, only completely different galaxies at some other point in time. As such, each galaxy gets its own value for the amount of dark matter.

At minimum this is a ~200 billion parameter model, and more if you’re looking at smaller structures.

smallnamespace
0 replies
13h11m

to match the observed motions of the celestial bodies

The point is that even with current observational data there's no reasonable distribution of dark matter that correctly explains all evidence that we have.

Your intuition that "if I have an infinite number of degrees of freedom anything at all can be fit" is leading you astray here.

andrewflnr
2 replies
14h17m

For Occam's razor, adding some random term to an equation is not really better than adding some invisible but well characterized stuff...

You're being too kind. It's worse. Especially when (in my understanding anyway) that added term doesn't even explain all the things dark matter does.

adrian_b
1 replies
13h20m

Adding any finite number of parameters is strictly better than adding an infinity of parameters (i.e. an arbitrary distribution of dark matter chosen to match the observations).

Dylan16807
0 replies
12h56m

The distribution has to be consistent forward and backwards in time. It's a lot less arbitrary than you're implying, and adding a hundred parameters (or similar finite number) to gravity is not better.

beeforpork
1 replies
11h5m

'dark matter' is not a theory, it is the name of an observational problem.

There are many theories to explain dark matter observations. MOND is not a competitor with 'dark matter', because MOND is a theory and it tries to explain some aspects (spiral galaxy rotation) of what is observed as the dark matter problem, which consists of many more observations. There is no competition here. There are other theories to explain dark matter, like dark matter particle theories involving neutrinos or whatever, and these may be called competitors, but dark matter itself is not a theory, but a problem statement.

XorNot
0 replies
9h38m

Yes and no...MOND's core proposition is that dark matter doesn't exist, and instead modified gravity does.

Whereas you can have many proposals for what dark matter is, provided it is capable of being almost entirely only gravitationally interacting, and there's enough of it.

MOND has had the problem that, depending on which MOND you're talking about, it still doesn't explain all the dark matter (so now you're pulling free parameters on top of free parameters).

rocqua
5 replies
18h54m

I used to think this, but dark matter does make useful predictions, that are hard to explain otherwise.

This is partially because there are two ways to detect dark matter. The first is gravitational lensing. The second is the rotational speed of galaxies. There are some galaxies that need less dark matter to explain their rotational speed. We can then cross-check whether those galaxies cause less gravitational lensing.

Besides that, the gravitational lensing of galaxies being stronger than the bright matter in the galaxies can justify is hard to explain without dark matter.
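A back-of-the-envelope sketch of that cross-check, under loudly stated assumptions: a spherical mass distribution for the rotation estimate, a point-mass lens for the lensing estimate, and illustrative (hypothetical) speeds and distances. Real analyses use extended mass profiles and cosmological distances. The point is only that the same mass enters two independent formulas.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg
KPC = 3.086e19       # kiloparsec, m


def enclosed_mass_from_rotation(v_circ, radius):
    """Mass inside `radius` implied by a circular speed: M = v^2 r / G (spherical approximation)."""
    return v_circ ** 2 * radius / G


def einstein_radius(mass, d_lens, d_source, d_lens_source):
    """Einstein angle in radians for a point-mass lens: sqrt(4GM/c^2 * D_ls / (D_l * D_s))."""
    return math.sqrt(4.0 * G * mass / C ** 2 * d_lens_source / (d_lens * d_source))


# Illustrative input: 220 km/s at 8 kpc, roughly the Sun's orbit in the Milky Way.
m_dynamical = enclosed_mass_from_rotation(220e3, 8 * KPC)
print(f"mass implied by rotation: {m_dynamical / M_SUN:.2e} solar masses")  # roughly 9e10

# Hypothetical visible mass and distances, chosen only to show the comparison.
m_visible = 0.5 * m_dynamical
d_l, d_s = 1.0e25, 2.0e25            # metres; D_ls = D_s - D_l assumes a static flat space
theta_total = einstein_radius(m_dynamical, d_l, d_s, d_s - d_l)
theta_visible = einstein_radius(m_visible, d_l, d_s, d_s - d_l)
print(f"predicted lensing angle ratio (total / visible): {theta_total / theta_visible:.3f}")
```

If the mass inferred from rotation is right, the lensing angle has to come out at the larger value without any further tuning, which is what makes the single extra parameter testable.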

jampekka
4 replies
18h48m

The problem with dark matter is that there's no (working) theory on how the dark matter is distributed. It's really easy to "explain" gravitational effects if you can postulate extra mass ad-hoc to fit the observations.

rocqua
1 replies
18h34m

If there are two different types of observations, and one parameter can explain both, that is pretty strong evidence. Put differently, dark matter is falsifiable, and experiments have tried to falsify it without success.

Besides, the idea 'not all mass can be seen optically' is not that surprising. The many theories on what that mass might be are all speculation, but they are treated as such.

XorNot
0 replies
9h29m

It's worth noting that one dark matter explanation is just: it's cold matter we just can't see through telescopes. Or black holes without accretion disks.

Both of these are pretty much ruled out though: you can't plausibly add enough brown dwarfs, and if it's black holes then you should see more lensing events towards nearby stars given how many you'd need.

But they're both concrete predictions which are falsifiable (or boundable such that they can't be the dominant contributors).

kaibee
1 replies
15h13m

I dunno if this is the correct way of thinking about it, but I just imagine it as a particle that has mass but does not interact with other particles (except at big-bang like energy levels?). So essentially a galaxy would be full of these particles zipping around never colliding with anything. And over time, some/most of these particles would have stable orbits (as the ones in unstable orbits would have flown off by now) around the galactic core. And to an observer, it would look like a gravitational tractor ahead of the rest of the physical mass of the galaxy (which is slower because it is affected by things like friction and collisions?). And so you'd see galaxies where the arms are spinning faster than they should be?

nyssos
0 replies
12h54m

I dunno if this is the correct way of thinking about it, but I just imagine it as a particle that has mass but does not interact with other particles (except at big-bang like energy levels?).

Not even anything that extreme. What's ruled out is interaction via electromagnetism (or if you want to get really nit-picky, electromagnetic interaction with a strength above some extremely low threshold).

edflsafoiewq
5 replies
20h17m

It's easy to say "Epicycles! Epicycles!", but people are going to continue using their epicycles until a Copernicus comes along.

jampekka
2 replies
18h40m

There will be no Copernicus if everybody just studies epicycles. E.g. there are massive resources put into the desperate WIMP hunt that could be used for finding new theories.

Dylan16807
1 replies
12h53m

I don't see how those resources are fungible with each other.

jampekka
0 replies
6h41m

Research funding is very competitive and scarce.

fiddlerwoaroof
1 replies
12h55m

Well, the funny thing is Copernicus posits just about as many epicycles in his theory as previous geocentric theories. Only Kepler’s discovery of the equal area law and elliptical orbits successfully banishes epicycles.

aeneasmackenzie
0 replies
1h48m

The history of these discoveries is fascinating and shows that Kuhn’s scientific revolutions idea is wrong, but it’s always rounded off to “Copernicus and Galileo”, and that doesn’t even get them right.

nyssos
3 replies
17h27m

What can't be fit by declaring the amount of dark matter that must be present fits the data?

Tons of things - just like there are tons of things that can't be fit by declaring the amount of electromagnetically-interacting matter that must be present fits the data.

You can fit anything you like by positing new and more complicated laws of physics, but that's not what's going on here. Dark matter is ordinary mass gravitating in an ordinary way: the observed gravitational lensing needs to match up with the rotation curves needs to match up with the velocity distributions of galaxies in clusters; you don't strictly need large scale homogeneity and isotropy but you really really want it, etc. Lambda-CDM doesn't handle everything perfectly (which in itself demonstrates that it's not mindless overfitting) but neither does anything else.

XorNot
1 replies
9h33m

You also have to do other things like not break General Relativity.

Which MOND does: it creates huge problems fitting into GR.

Whereas dark matter as just regular mass that interacts poorly by other means does not.

jampekka
0 replies
6h42m

There are modified gravity theories that are compatible with, or extensions of, GR, e.g. the f(R) gravity theories.

Probably nobody believes MOND as such is some fundamental theory; rather, as a "theory" it's sort of a stepping stone. Also, MOND is often used interchangeably (and confusingly) with modified gravity theories in general.

karmakaze
0 replies
1h43m

Dark matter is ordinary mass gravitating in an ordinary way: the observed gravitational lensing needs to match up with the rotation curves needs to match up with the velocity distributions of galaxies in clusters

Those are all the same thing, the shape of spacetime. The only thing DM adds is a backstory that this shaping comes from hypothetical undiscovered particles with properties that match observations.

andrewflnr
2 replies
14h20m

Dark matter is constrained by, among other things, dynamical simulations. For instance, here's an example of reproducing real world observations, that previously didn't have great explanations, using simulations with dark matter: https://www.youtube.com/live/8rok8E_tz8k?si=Q7vmQYpZr_6K7--m. And that's not even getting into the cosmology that has to (and mostly does) fit together.

karmakaze
1 replies
3h53m

Interesting that you should link that video. Its title card says "Angela Collier". Here's a more recent video by the physicist[0].

Re: where it says "using simulations with dark matter", we can't simulate DM because it doesn't have any properties beyond our observations. All we do is distribute amounts of it to match observations. It could be "Dyson spheres with EM shields" and the results would be the same.

[0] https://www.youtube.com/watch?v=PbmJkMhmrVI

andrewflnr
0 replies
1h40m

Yes, and I think that video is stupid. She doesn't use the term that way in her own talk, and neither does any scientist I've ever heard. I think she's trying to make some abstract point about science in general and muddying the water in the process. Her takes on terminology are often bad IMO.

That doesn't take away the fact that when you work with the slightly more specific theory of "particle dark matter" it produces real results. And I believe there's a lot more work over the years in similar areas. It doesn't get talked about because it's not sexy, so people who only follow cosmology when there's drama don't hear about it. That was just the example at the top of my mind because I'd seen it recently, and the result is really quite spectacular. Did you watch it through?

Calavar
0 replies
18h4m

> In 1953, Enrico Fermi criticized Dyson’s model by quoting Johnny von Neumann: “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”

For those who are interested, you can watch Freeman Dyson recount this conversation in his own words in an interview: https://youtu.be/hV41QEKiMlM

edflsafoiewq
5 replies
20h39m

Isn't the form of an equation really just another sort of parameter?

tobias2014
1 replies
19h39m

This is why I think that modeling elementary physics is nothing else than fitting data. We might end up with something that we perceive as "simple", or not. But in any case all the fitting has been hidden in the process of ruling out models. It's just that a lot of the fitting process is (implicitly) being done by theorists; we come up with new models that are then falsified.

For example, how many parameters does the Standard Model have? It's not clear what you count as a parameter. Do you count the group structure, the other mathematical structure that has been "fitted" through decades of comparisons with experiments?

slashdave
0 replies
2h47m

You are using the word "fitting" rather loosely. We usually "fit" models of fixed functional form and a fixed number of parameters.

You are also glossing over centuries of precedent that predate high-energy physics, namely quantum field theory, special relativity, and foundational principles such as conservation of energy and momentum.

rocqua
0 replies
18h51m

It tends to be a parameter that can be derived from reasoning and assumptions. This contrasts with free parameters, where you say "and we have no idea what this value should be, so we'll measure it".

qarl
0 replies
20h30m

Yes, it is.

Which makes the only truly zero parameter system the collection of all systems, in all forms.

dilawar
1 replies
23h28m

Hmm..

Hodgkin and Huxley did ground-breaking work on the squid's giant axon and modelled neural activity. They had multiple parameters extracted from 'curve fitting' of recorded potentials and injected currents, which were much later mapped to sodium channels. Similarly, another process was mapped to potassium channels.

I wouldn't worry too much about having multiple parameters -- even four, when three just can't explain the model.

nyssos
0 replies
22h33m

Neuron anatomy is the product of hundreds of millions of years of brute contingency. There are reasons why it can't be certain ways (organisms that were that way [would have] died or failed to reproduce) but no reason whatsoever why it had to be exactly this way. It didn't, there are plenty of other ways that nerves could have worked, this is just the way they actually do.

The physics equivalent is something like eternal inflation as an explanation for apparent fine-tuning - except that even if it's correct it's still absolutely nowhere near as complex or as contingent as biology.

ahazred8ta
1 replies
19h8m

Notably done for the first time irl in "Least square fitting of an elephant", James Wei (1975) Chemtech

will1am
0 replies
22h30m

The balance between empirical data fitting and genuine understanding of the underlying reality

elijahbenizzy
31 replies
1d

This is humorous (and well-written), but I think it's more than that.

I'm always making the joke (observation) that ML (AI) is just curve-fitting. Whether "just curve-fitting" is enough to produce something "intelligent" is, IMO, currently unanswered, largely due to differing viewpoints on the meaning of "intelligent".

In this case they're demonstrating some very clean, easy-to-understand curve-fitting, but it's really the same process -- come up with a target, optimize over a loss function, and hope that it generalizes. (This one, obviously, does not. But the elephant is cute.)

This raises the question von Neumann was asking -- why have so many parameters? Ironically (or maybe just interestingly), we've done a lot with a ton of parameters recently, answering it with "well, with a lot of parameters you can do cool things".

visarga
19 replies
1d

Whether "just curve fitting" is enough to produce something "intelligent" is, IMO, currently unanswered

Continual "curve fitting" to the real world can create intelligence. What is missing is not something inside the model. It's missing a mechanism to explore, search and expand its experience.

Our current crop of LLMs ride on human experience; they have largely not participated in creating their own experiences. That's why people call it imitation learning or parroting. But once models become more agentic they can start creating useful experiences on their own. AlphaZero did it.

soist
14 replies
1d

AlphaZero did not create any experiences. AlphaZero was software written by people to play board games and that's all it ever did.

visarga
8 replies
22h5m

AZ trained in self-play mode for millions of games, over multiple generations of a player pool.

soist
7 replies
21h42m

I am familiar with the literature on reinforcement learning.

pharrington
6 replies
20h48m

They're saying the board games AlphaZero played with itself are experiences.

soist
5 replies
18h22m

And I am saying they are confused because they are attributing personal characteristics to computers and software. By spelling out what computers are doing it becomes very obvious that there is nothing that can be aware of any experiences in computers as it is all simply a sequence of arithmetic operations. If you can explain which sequence of arithmetic operations corresponds to "experiences" in computers then you might be less confused than all the people who keep claiming computers can think and feel.

nyssos
4 replies
16h26m

By spelling out what computers are doing it becomes very obvious that there is nothing that can be aware of any experiences in computers as it is all simply a sequence of arithmetic operations.

By spelling out what brains are doing it becomes very obvious that it's all simply a sequence of chemical reactions - and yet here we are, having experiences. Software will never have a human experience - but neither will a chimp, or an octopus, or a Zeta-Reticulan.

Mammalian neurons are not the only possible substrate for intelligence; if they're the only possible substrate for consciousness, then the fact that we're conscious is an inexplicable miracle.

soist
1 replies
16h19m

This is a common retort. You can read my other comments if you want to understand why you're not really addressing my points because I have already addressed how reductionism does not apply to living organisms but it does apply to computers.

Dylan16807
0 replies
12h37m

The comments where you demand an instruction set for the brain, or else you'll dismiss any argument saying its actions can be computed? Even after people explained that lots of computers don't even have instruction sets?

And where you decide to assume that non-computable physics happens in the brain based on no evidence?

What a waste of time. You "addressed" it in a completely meaningless way.

godelski
1 replies
15h8m

If an algorithmic process is an experience and a collection of experiences is intelligence then we get some pretty wild conclusions that I don't think most people would be attempting to claim as it'd make them sound like a lunatic (or a hippy).

Consider the (algorithmic) mechanical process of screwing a screw into a board. This screw has an "experience" and therefore intelligence. So... The screw is intelligent? Very low intelligence, but intelligent according to this definition.

But we have an even bigger problem. There's the metaset of experiences, that's the collection of several screws (or the screw, board, and screwdriver together). So we now have a meta intelligence! And we have several because there's the different operations on these sets to perform.

You might be okay with this, or maybe you're saying it needs memory. If the latter, you hopefully quickly realize this means a classic computer is intelligent, but due to the many ways information can be stored it does not solve our above conundrum.

So we must then come to the conclusion that all things AND any set of things have intelligence. Which kinda makes the whole discussion meaningless. Or, we must need a more refined definition of intelligence which more closely reflects what people actually are trying to convey when they use this word.

nyssos
0 replies
13h1m

If an algorithmic process is an experience and a collection of experiences is intelligence

Neither, what I'm saying is that the observable correlates of experience are the observable correlates of intelligence - saying that "humans are X therefore humans are Y, software is X but software is not Y" is special pleading. The most defensible positions here are illusionism about consciousness altogether (humans aren't Y) or a sort of soft panpsychism (X really does imply Y). Personally I favor the latter. Some sort of threshold model where the lights turn on at a certain point seems pretty sketchy to me, but I guess isn't ruled out. But GP, as I understand them, is claiming that biology doesn't even supervene on physics, which is a wild claim.

Or, we must need a more refined definition of intelligence which more closely reflects what people actually are trying to convey when they use this word.

Well that's the thing, I don't think people are trying to convey any particular thing. I think they're trying to find some line - any line - which allows them to write off non-animal complex systems as philosophically uninteresting. Same deal as people a hundred years ago trying to find a way to strictly separate humans from nonhuman animals.

rocqua
4 replies
18h47m

Are you launching into a semantic argument about the word 'experience'? If so, it might help to state what essential properties alphago was missing that makes it 'not having an experience'.

Otherwise this can quickly devolve into the common useless semantic discussion.

soist
3 replies
18h22m

Just making sure no one is confused by common computationalist sophistry and how they attribute personal characteristics to computers and software. People can have and can create experiences, computers can only execute their programmed instructions.

HeatrayEnjoyer
2 replies
16h39m

On what priors are you making that statement?

soist
1 replies
16h27m

Rephrase your question. I don't know what you're asking.

nertirs
0 replies
11h2m

I think he meant to ask, what is the difference between an experience and a predefined instruction?

kgeist
1 replies
12h29m

It's missing a mechanism to explore, search and expand its experience.

Can't we create an agent system which can search the internet and choose what data to train itself with?

Xcode23
0 replies
11h12m

You need to define what the utility function of the agent is so it can know what to actually use to train itself. If we knew that, this whole debate about human intelligence in computers would either be solved already or well on its way to being solved.

godelski
0 replies
15h37m

Continual "curve fitting" to the real world can create intelligence.

I'm going to need a citation on this bold claim. And by that I mean in the same vein as what Carl Sagan would say

  Extraordinary claims require extraordinary evidence

elijahbenizzy
0 replies
22h42m

There are a whole bunch of assumptions here. But sure, if you view the world as a closed system, then you have a decision as a function of inputs:

1. The world around you
2. The experiences within you (really, the past view of the world around you)
3. Innateness of you (sure, this could be 2 but I think it's also something else)
4. The experience you find + the way you change yourself to impact (1), (2), and (3)

If you think of intelligence as all of these, then you're making the assumption that all that's required for (2), (3), and (4) is "agentic systems", which I think skips a few steps (as the author of an agent framework myself...). All this is to say that "what makes intelligence" is largely unsolved, and nobody really knows, because we actually don't understand this ourselves.

luplex
6 replies
1d

I mean the devil is in the details. In Reinforcement Learning, the target moves! In deep learning, you often do things like early stopping to prevent too much optimization.

soist
5 replies
1d

There is no such thing as too much optimization. Early stopping is to prevent overfitting to the training set. It's a trick just like most advances in deep learning because the underlying mathematics is fundamentally not suited for creating intelligent agents.

rocqua
4 replies
18h44m

Is overfitting different from 'too much optimization'? Optimization still needs a value that is optimized. Overfitting is the result of too much optimization for not quite the right value (i.e. training error when you want to reduce prediction error).

soist
3 replies
18h21m

What value is being optimized and how do you know it is too much or not enough?

godelski
2 replies
14h28m

I think the miscommunication is due to the proxy nature of our modeling. From one perspective, yes you're right, because it just comes down to your optimization function and objectives. But if we're in the context where we recognize the practical usage of our model relies on it being an inexact representation (proxy), then certainly there is too much optimization. I mean most of what we try to model in ML is intractable.

In fact, that entire notion of early stopping is due to this. We use a validation set as a pseudo test set to inject information into our optimization process without leaking information from the test set (which is why you shouldn't choose parameters based on test results. That is spoilage. It doesn't matter if it's the status quo, it's spoilage).

But we also need to consider that a lack of divergence between train/val does not mean there isn't overfitting. Divergence implies overfitting but the inverse statement is not true. I state this because it's both relevant here and an extremely common mistake.
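A small sketch of early stopping as described above, assuming a toy setup that is entirely hypothetical: noisy samples of a sine curve, a degree-9 polynomial model, plain gradient descent, and a `patience` rule for deciding when validation error has stopped improving. The validation split steers the stopping point; no test set is touched.

```python
import numpy as np


def train_with_early_stopping(x_tr, y_tr, x_val, y_val, degree=9,
                              lr=0.05, max_steps=20000, patience=200):
    """Gradient descent on polynomial features; return the weights with the lowest validation MSE."""
    feats = lambda x: np.vander(x, degree + 1, increasing=True)
    a_tr, a_val = feats(x_tr), feats(x_val)
    w = np.zeros(degree + 1)
    best_w, best_val, best_step = w.copy(), np.inf, 0
    for step in range(max_steps):
        # Gradient of the training MSE, the proxy objective being optimized.
        grad = 2.0 * a_tr.T @ (a_tr @ w - y_tr) / len(y_tr)
        w -= lr * grad
        val_mse = float(np.mean((a_val @ w - y_val) ** 2))
        if val_mse < best_val:
            best_w, best_val, best_step = w.copy(), val_mse, step
        elif step - best_step >= patience:
            break  # validation error has not improved for `patience` steps
    return best_w, best_val, best_step


rng = np.random.default_rng(0)
x_all = rng.uniform(-1.0, 1.0, 30)
y_all = np.sin(3.0 * x_all) + rng.normal(0.0, 0.3, 30)
best_w, best_val, best_step = train_with_early_stopping(
    x_all[:15], y_all[:15], x_all[15:], y_all[15:])
print(f"best validation MSE {best_val:.4f} reached at step {best_step}")
```

Note that this only guards against the overfitting that shows up as train/validation divergence. As the comment says, a validation curve that keeps tracking the training curve does not prove the model generalizes.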

soist
1 replies
14h17m

Most practitioners seem to understand that what they are doing is creating executable models and they don't confuse the model based on numeric observations with the actual reality. This is why I very much do not like all the AI hype and how statistical models were rebranded as artificial "intelligence" because the people who are not aware of what the words mean get very confused and start thinking they are nothing more than computers executing algorithms to fit numerical data to some unspecified cognitive model.

godelski
0 replies
13h32m

Most practitioners seem to understand that what they are doing is creating executable models and they don't confuse the model based on numeric observations with the actual reality.

I think you're being too optimistic, and I'm a pretty optimistic person. Maybe it is because I work in ML, but I've had to explain to a large number of people this concept. This doesn't matter if it is academia or industry. It is true for both management and coworkers. As far as I can tell, people seem very happy to operate under the assumption that benchmark results are strong indicators of real world performance __without__ the need to consider assumptions of your metrics or data. I've even proven this to a team at a trillion dollar company where I showed a model with lower test set performance had more than double the performance on actual customer data. Response was "cool, but we're training a much larger model on more data, so we're going to use that because it is a bit better than yours." My point was that the problem still exists in that bigger model with more data, but that increased params and data do a better job at hiding the underlying (and solvable!) issues.

In other words, in my experience people are happy to be Freeman Dyson in the conversation Calavar linked[0] and very upset to hear Fermi's critique: being able to fit data doesn't mean shit without either a clear model or a rigorous mathematical basis. Much of data science is happy to just curve fit. But why shouldn't they? You advance your career in the same way, by bureaucrats who understand the context of metrics even less.

I've just experienced too many people who cannot distinguish empirical results from causal models. And a lot of people who passionately insist there is no difference.

[0] https://news.ycombinator.com/item?id=40964328

maitola
2 replies
23h59m

In the case of AI, the more parameters, the better! In physics it's the opposite.

will1am
0 replies
22h26m

A dichotomy between these fields

elijahbenizzy
0 replies
22h31m

One of the hardest parts of training models is avoiding overfitting, so "more parameters are better" should be more like "more parameters are better given you're using those parameters in the right way, which can get hard and complicated".

Also LLMs just straight up do overfit, which makes them function as a database, but a really bad one. So while more parameters might just be better, that feels like a cop-out to the real problem. TBD what scaling issues we hit in the future.

will1am
0 replies
22h27m

Your humorous observation captures a fundamental truth to some extent

redox99
1 replies
19h9m

That's like saying your entire hard drive is a single number.

zellyn
0 replies
2h58m

From the paper:

Paintadosi [4] argues that one parameter is always enough. He constructed a function that, through a single parameter, can depict any shape. However, in essence, this work is a form of encoding, mapping the shape into a real number with precision extending to hundreds or even thousands of decimal places. For our problem, this is meaningless, although the paper’s theme is that “parameter counting” fails as a measure of model complexity.

dweinus
0 replies
22h41m

"This single parameter model provides a large improvement over the prior state of the art in fitting an elephant"

Lol

danbruc
0 replies
7h31m

The number of parameters is just the wrong metric, it should be the amount of information contained in the parameter values, their entropy, Kolmogorov complexity or something along that line.
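
A toy sketch of why the count is the wrong metric. This is plain digit concatenation, not the construction from the one-parameter paper, and the four values are hypothetical. Several low-precision parameters are packed into one big number and recovered exactly, and the information content is the same either way.

```python
import math


def pack(values, digits=3):
    """Concatenate integers in [0, 10**digits) into one big integer."""
    base = 10 ** digits
    packed = 0
    for v in values:
        if not 0 <= v < base:
            raise ValueError("value does not fit in the given number of digits")
        packed = packed * base + v
    return packed


def unpack(packed, count, digits=3):
    """Invert pack(): peel off `count` fixed-width values, last one first."""
    base = 10 ** digits
    out = []
    for _ in range(count):
        packed, v = divmod(packed, base)
        out.append(v)
    return out[::-1]


params = [50, 18, 12, 14]      # four hypothetical 3-digit parameters
single = pack(params)          # one "parameter" holding all of them
recovered = unpack(single, len(params))

# Information needed to specify the values, in bits, both ways.
bits_separate = len(params) * math.log2(10 ** 3)
bits_packed = math.log2(10 ** (3 * len(params)))
print(single, recovered, bits_separate, bits_packed)
```

Going from four parameters to one changed nothing about how much had to be specified; it only moved the digits around.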

Nition
0 replies
22h15m

Nice. This is like how you can achieve unlimited compression by storing your data in a filename instead of in the file.

dheera
6 replies
1d2h

I wish there was more humor on arXiv.

If I could make a discovery in my own time without using company resources I would absolutely publish it in the most humorous way possible.

mananaysiempre
1 replies
23h46m

Joke titles and/or author lists are also quite popular, e.g. the Greenberg, Greenberger, Greenbergest paper[1], a paper with a cat coauthor whose title I can’t seem to recall (but I’m sure there’s more than one I’ve encountered), or even the venerable, unfortunate in its joke but foundational in its substance Alpher, Bethe, Gamow paper[2]. Somewhat closer to home, I think computer scientist Conor McBride[3] is the champion of paper titles (entries include “Elimination with a motive”, “The gentle art of levitation”, “I am not a number: I am a free variable”, “Clowns to the left of me, jokers to the right”, and “Doo bee doo bee doo”) and sometimes code in papers:

  letmeB this (F you) | you == me = B this
                      | otherwise = F you
  letmeB this (B that)            = B that
  letmeB this (App fun arg)       = letmeB this fun `App` letmeB this arg

(Yes, this is working code; yes, it’s crystal clear in the context of the paper.)

[1] https://arxiv.org/abs/hep-ph/9306225

[2] https://en.wikipedia.org/wiki/Alpher%E2%80%93Bethe%E2%80%93G...

[3] http://strictlypositive.org/

gjm11
0 replies
19h31m

paper with a cat coauthor whose title I can't seem to recall

You probably have in mind https://en.wikipedia.org/wiki/F._D._C._Willard (coauthor of multiple papers, sole author of at least one).

azeemba
0 replies
1d1h

Consider posting this as a new post! It seems like a fun list to read through

paulpauper
0 replies
15h48m

There is. It is called the General Math section. What is funnier than a 2-page proof of the Riemann Hypothesis?

aqme28
3 replies
22h27m

It only satisfies a weaker condition, i.e., using four non-zero parameters instead of four parameters.

Why would that be a harder problem? In the case that you get a zero parameter, you could inflate it by some epsilon and the solution would basically be the same.

Sesse__
1 replies
20h55m

They also, effectively, fit information in the indexes of the parameters. I.e., _which_ of the parameters are nonzero carries real information.

In a sense, they have done their fitting using nine parameters, of which five are zero.
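
A rough back-of-the-envelope for how much the choice of indexes carries, taking the nine-slots-four-non-zero picture above as given and assuming every pattern is equally likely (my simplification):

```python
import math

n_slots = 9      # total coefficient slots, per the comment above
n_nonzero = 4    # how many of them are allowed to be non-zero

# Number of distinct ways to choose which slots are non-zero.
patterns = math.comb(n_slots, n_nonzero)

# Bits needed to say which pattern was chosen, if all are equally likely.
index_bits = math.log2(patterns)

print(patterns, round(index_bits, 2))
```

So picking the pattern is worth a little under 7 bits on top of whatever precision the four values themselves have.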

aqme28
0 replies
18h48m

I didn’t read enough to catch that. How the heck did they justify that?

nyssos
0 replies
16h15m

In the case that you get a zero parameter, you could inflate it by some epsilon and the solution would basically be the same.

Not everything is continuous. Add an epsilon worth of torsion to GR and you don't get almost-GR, you get a qualitatively different theory in which potentially arbitrarily large violations of the equivalence principle are possible.

xpe
2 replies
23h26m

One take away: Don’t count parameters. Count bits.

Scene_Cast2
1 replies
23h21m

Better yet, count entropy.

xpe
0 replies
21h16m

Why “better”? Entropy in the information theoretic sense is usually quantified in bits.

Steuard
2 replies
1d

Sadly, the constant term (the average r_0) is never specified in the paper (it seems to be something in the neighborhood of 180?): getting that right is necessary to produce the image, and I can't see any way not to consider it a fifth necessary parameter. So I don't think they've genuinely accomplished their goal.

(Seriously, though, this was a lot of fun!)

rsfern
1 replies
1d

They say in the text that it’s the average value of the data points they fit to. I think whether to count it as a parameter depends on whether you consider standardization to be part of the model or not

Steuard
0 replies
21h45m

I see your point, that it's really just an overall normalization for the size rather than anything to do with the shape. I can accept that, and I'll grant them the "four non-zero parameters" claim.

Though in that case, I would have liked for them to make it explicit. Maybe normalize it to "1", and scale the other parameters appropriately. (Because as it stands, I don't think you can reproduce their figure from their paper.)

parker-3461
1 replies
1d2h

Thanks for linking these. I was not very familiar with these works/discussions from the past, but they really helped establish the context. Very grateful that these videos are readily available.

EdwardCoffin
0 replies
1d1h

I listened to the whole series with Dyson some time in the past year. It was well worth it. I also listened to the series with Murray Gell-Mann [1] and Hans Bethe [2]. All time well worth spending, and I've been thinking of downloading all the bits, concatenating them into audio files, and putting them on my phone for listening to when out on walks (I'm pretty sure the videos do not add anything essential: it's just a video of the interviewee talking - no visual aids).

[1] https://www.youtube.com/playlist?list=PLVV0r6CmEsFxKFx-0lsQD...

[2] https://www.youtube.com/watch?v=LvgLyzTEmJk&list=PLVV0r6CmEs...

xpe
1 replies
23h23m

Another take away (not directly stated in the article but implied): Counting the information content of a model is more than just the parameters; the structure of the model itself conveys information.

will1am
0 replies
22h25m

I think this is an often underappreciated insight

boywitharupee
1 replies
1d1h

what's the purpose of this? is it one of those 'fun' problems to solve?

pietroppeter
0 replies
5h41m

Love how they misspelled Piantadosi as Paintadosi :)

lupire
0 replies
1d

IIUC:

A real-parameter (r(theta) = sum(r_k cos(k theta))) Fourier series can only draw a "wiggly circle" figure with one point on each radial ray from the origin.

A complex parameter (z(theta) = sum(z_k e^(i k theta))) can draw more squiggly figures (epicycles) -- the pen can backtrack as the drawing arm rotates, as each parameter can move a point somewhere on a small circle around the point computed from the previous parameter (and recursively).

Obligatory 3B1B https://m.youtube.com/watch?v=r6sGWTCMz2k

Since a complex parameter is 2 real parameters, we should compare the best 4-cosine curve to the best 2-complex-exponential curve.
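
A small sketch of the difference, reading the complex series as z(theta) = sum(z_k e^(i k theta)). The coefficients are made up and have nothing to do with the paper's elephant.

```python
import cmath
import math


def real_series(r, theta):
    """r(theta) = sum_k r[k] * cos(k * theta): one radius per ray."""
    return sum(r_k * math.cos(k * theta) for k, r_k in enumerate(r))


def complex_series(z, theta):
    """z(theta) = sum_k z[k] * exp(i * k * theta): a point in the plane."""
    return sum(z_k * cmath.exp(1j * k * theta) for k, z_k in enumerate(z))


# Real series: the pen sits on the ray at angle theta, at distance r(theta).
r = [1.0, 0.5]                       # hypothetical coefficients
radius_at_zero = real_series(r, 0.0)

# Complex series: a big circle (k=1) plus a smaller, faster epicycle (k=5).
z = [0, 1, 0, 0, 0, 0.6]             # hypothetical coefficients
before = complex_series(z, math.pi / 4 - 0.01)
after = complex_series(z, math.pi / 4 + 0.01)

# The drawing arm's parameter theta increased, but the pen's polar angle
# decreased: the curve backtracks, which the real series cannot do.
angle_before = cmath.phase(before)
angle_after = cmath.phase(after)
print(radius_at_zero, angle_before, angle_after)
```

With the real series the pen's polar angle is always theta itself (as long as r stays positive), so it can never double back like this.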

lazamar
0 replies
1d2h

Lol. Loved it.

This was a lovely passage from Dyson’s Web of Stories interview, and it struck a chord with me, like it clearly did with the authors too.

It happened when Dyson took the preliminary results of his work on the Pseudoscalar theory of Pions to Fermi and Fermi very quickly dismissed the whole thing. It was a shock to Dyson but freed him from wasting more time on it.

Fermi: When one does a theoretical calculation, either you have a clear physical model in mind or a rigorous mathematical basis. You have neither. How many free parameters did you use for your fitting?

Dyson: 4

Fermi: You know, Johnny von Neumann always used to say ‘with four parameters I can fit an elephant; and with five I can make him wiggle his trunk’.

classified
0 replies
6h42m

What is that horizontal bar above r0 in the last equation?

bee_rider
0 replies
23h42m

Ya know, in academic writing I tend to struggle with making it sound nice and formal. I try not to use the super-stilted academic style, but it is still always a struggle to walk the line between too loose and too jargony.

Maybe this sort of thing would be a really good tradition. Everyone must write a very silly article with some mathematical arguments in it. Then, we can all go forward with the comfort of knowing that we aren’t really at risk of breaking new ground in appearing unserious.

It is well written and very understandable!