What Is Entropy?

glial
45 replies
22h15m

I felt like I finally understood Shannon entropy when I realized that it's a subjective quantity -- a property of the observer, not the observed.

The entropy of a variable X is the amount of information required to drive the observer's uncertainty about the value of X to zero. As a corollary, your uncertainty and mine about the value of the same variable X could be different. This is trivially true, as we could each have received different information about X. H(X) should be H_{observer}(X), or even better, H_{observer, time}(X).

As clear as Shannon's work is in other respects, he glosses over this.
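
To make that concrete, here's a toy sketch (a hypothetical coin example of my own, not anything from Shannon): two observers describing the same variable attach different distributions to it, and so compute different entropies.

    import math

    def H(dist):
        # Shannon entropy, in bits, of a discrete distribution {outcome: probability}
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    # The same coin flip X, described by two observers:
    # observer A has no information, observer B peeked and saw heads.
    H_A = H({"heads": 0.5, "tails": 0.5})   # 1.0 bit of uncertainty remaining
    H_B = H({"heads": 1.0, "tails": 0.0})   # 0.0 bits -- nothing left to learn
    print(H_A, H_B)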

rachofsunshine
25 replies
21h29m

This doesn't really make entropy itself observer dependent. (Shannon) entropy is a property of a distribution. It's just that when you're measuring different observers' beliefs, you're looking at different distributions (which can have different entropies the same way they can have different means, variances, etc).

davidmnoll
14 replies
20h16m

Right but in chemistry class the way it’s taught via Gibbs free energy etc. makes it seem as if it’s an intrinsic property.

canjobear
10 replies
18h14m

Entropy in physics is usually the Shannon entropy of the probability distribution over system microstates given known temperature and pressure. If the system is in equilibrium then this is objective.

kergonath
9 replies
12h1m

Entropy in Physics is usually either the Boltzmann or the Gibbs entropy, and both Boltzmann and Gibbs were dead before Shannon was born.

enugu
8 replies
11h33m

That's not a problem, as the GP's post is trying to state a mathematical relation, not a historical attribution. Often newer concepts shed light on older ones. As Baez's article says, Gibbs entropy is Shannon's entropy of an associated distribution (multiplied by the constant k).
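
A quick numerical sketch of that relation (my own illustration, assuming the Shannon entropy is taken in nats, i.e. with the natural log, and an arbitrary made-up distribution):

    import math

    k_B = 1.380649e-23                            # Boltzmann constant, J/K
    p = [0.5, 0.25, 0.25]                         # some distribution over microstates
    H_nats = -sum(q * math.log(q) for q in p)     # Shannon entropy in nats
    S_gibbs = k_B * H_nats                        # Gibbs entropy S = -k sum p ln p
    print(H_nats, S_gibbs)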

kergonath
7 replies
11h10m

It is a problem because all three come with baggage. Almost none of the things discussed in this thread are valid when discussing actual physical entropy, even though the equations are superficially similar. And then there are lots of people being confidently wrong because they assume that it's just one concept. It really is not.

enugu
6 replies
10h15m

Don't see how the connection is superficial. Even the classical macroscopic definition of entropy as ΔS = ∫ dQ/T can be derived from the information theory perspective, as Baez shows in the article (using entropy maximizing distributions and Lagrange multipliers). If you have a more specific critique, it would be good to discuss.

im3w1l
5 replies
4h50m

In classical physics there is no real objective randomness. Particles have a defined position and momentum and those evolve deterministically. If you somehow learned these then the shannon entropy is zero. If entropy is zero then all kinds of things break down.

So now you are forced to consider e.g. temperature an impossibility without quantum-derived randomness, even though temperature does not really seem to be a quantum thing.

kgwgk
3 replies
3h55m

Particles have a defined position and momentum

Which we don’t know precisely. Entropy is about not knowing.

If you somehow learned these then the shannon entropy is zero.

Minus infinity. Entropy in classical statistical mechanics is proportional to the logarithm of the volume in phase space. (You need an appropriate extension of Shannon’s entropy to continuous distributions.)

So now you are forced to consider e.g. temperature an impossibility without quantum-derived randomness

Or you may study statistical mechanics :-)

kergonath
2 replies
3h25m

Which we don’t know precisely. Entropy is about not knowing.

No, it is not about not knowing. This is an instance of how the intuition from Shannon's entropy does not translate to statistical Physics.

It is about the number of possible microstates, which is completely different. In Physics, entropy is a property of a bit of matter, it is not related to the observer or their knowledge. We can measure the enthalpy change of a material sample and work out its entropy without knowing a thing about its structure.

Minus infinity. Entropy in classical statistical mechanics is proportional to the logarithm of the volume in phase space.

No, 0. In this case, there is a single state with p=1 and S = - k Σ p ln(p) = 0.

This is the same if you consider the phase space because then it is reduced to a single point (you need a bit of distribution theory to prove it rigorously but it is somewhat intuitive).

The probability p of a microstate is always between 0 and 1, therefore p ln(p) is never positive and S is never negative.

You get the same using Boltzmann’s approach, in which case Ω = 1 and S = k ln(Ω) is also 0.

(You need an appropriate extension of Shannon’s entropy to continuous distributions.)

Gibbs’ entropy.

Or you may study statistical mechanics

Indeed.

nyssos
0 replies
2h11m

In Physics, entropy is a property of a bit of matter, it is not related to the observer or their knowledge. We can measure the enthalpy change of a material sample and work out its entropy without knowing a thing about its structure.

Enthalpy is also dependent on your choice of state variables, which is in turn dictated by which observables you want to make predictions about: whether two microstates are distinguishable, and thus whether they are part of the same macrostate, depends on the tools you have for distinguishing them.

kgwgk
0 replies
3h16m

possible microstates

Conditional on the known macrostate. Because we don’t know the precise microstate - only which microstates are possible.

If your reasoning is that « experimental entropy can be measured so it’s not about that » then it’s not about macrostates and microstates either!

enugu
0 replies
2h5m

If entropy is zero then all kinds of things break down.

Entropy is a macroscopic variable and if you allow microscopic information, strange things can happen! One can move from a high entropy macrostate to a low entropy macrostate by choosing the initial microstate carefully. But this is not a reliable process which you can reproduce experimentally, i.e. it is not a thermodynamic process.

A thermodynamic process P is something which takes a macrostate A to a macrostate B, independent of which microstate a0, a1, a2... in A you started off with. If the process depends on the microstate, then it wouldn't be something we would recognize, as we are looking from the macro perspective.

waveBidder
2 replies
18h23m

that's actually the normal view; saying that info and stat mech entropy are the same is the outlier position, most popularized by Jaynes.

kmeisthax
1 replies
14h59m

If information-theoretical and statistical mechanics entropies are NOT the same (or at least, deeply connected) then what stops us from having a little guy[0] sort all the particles in a gas to extract more energy from them?

[0] https://en.wikipedia.org/wiki/Maxwell%27s_demon

xdavidliu
0 replies
3h54m

Sounds like a non-sequitur to me; what are you implying about the Maxwell's demon thought experiment vs the comparison between Shannon and stat-mech entropy?

mitthrowaway2
8 replies
20h59m

Entropy is a property of a distribution, but since math does sometimes get applied, we also attach distributions to things (eg. the entropy of a random number generator, the entropy of a gas...). Then when we talk about the entropy of those things, those entropies are indeed subjective, because different subjects will attach different probability distributions to that system depending on their information about that system.

canjobear
5 replies
18h4m

Some probability distributions are objective. The probability that my random number generator gives me a certain number is given by a certain formula. Describing it with another distribution would be wrong.

Another example, if you have an electron in a superposition of half spin-up and half spin-down, then the probability to measure up is objectively 50%.

Another example, GPT-2 is a probability distribution on sequences of integers. You can download this probability distribution. It doesn't represent anyone's beliefs. The distribution has a certain entropy. That entropy is an objective property of the distribution.

mitthrowaway2
2 replies
15h49m

Of those, the quantum superposition is the only one that has a chance at being considered objective, and it's still only "objective" in the sense that (as far as we know) your description provided as much information as anyone can possibly have about it, so nobody can have a more-informed opinion and all subjects agree.

The others are both partial-information problems which are very sensitive to knowing certain hidden-state information. Your random number generator gives you a number that you didn't expect, and for which a formula describes your best guess based on available incomplete information, but the computer program that generated it knew which one to choose and would not have picked any other. Anyone who knew the hidden state of the RNG would also have assigned a different probability to that number being chosen.

cubefox
0 replies
6h22m

A more plausible way to argue for objectiveness is to say that some probability distributions are objectively more rational than others given the same information. E.g. when seeing a symmetrical die it would be irrational to give 5 a higher probability than the others. Or it seems irrational to believe that the sun will explode tomorrow.

canjobear
0 replies
2h49m

You might have some probability distribution in your head for what will come out of GPT-2 on your machine at a certain time, based on your knowledge of the random seed. But that is not the GPT-2 probability distribution, which is objectively defined by model weights that you can download, and which does not correspond to anyone’s beliefs.

financltravsty
1 replies
5h41m

The probability distribution is subjective for both parts -- because it, once again, depends on the observer observing the events in order to build a probability distribution.

E.g. your random number generator generates 1, 5, 7, 8, 3 when you run it. It generates 4, 8, 8, 2, 5 when I run it. I.e. we have received different information about the random number generator to build our subjective probability distributions. The level of entropy of our probability distributions is high because we have so little information to be certain about the representativeness of our distribution sample.

If we continue running our random number generator for a while, we will gather more information, thus reducing entropy, and our probability distributions will both start converging towards an objective "truth." If we ran our random number generators for a theoretically infinite amount of time, we would have reduced entropy to 0 and would have a perfect and objective probability distribution.

But this is impossible.

canjobear
0 replies
2h53m

Would you say that all claims about the world are subjective, because they have to be based on someone’s observations?

For example my cat weighs 13 pounds. That seems objective, in the sense that if two people disagree, only one can be right. But the claim is based on my observations. I think your logic leads us to deny that anything is objective.

stergios
1 replies
19h51m

"Entropy is a property of matter that measures the degree of randomization or disorder at the microscopic level", at least when considering the second law.

mitthrowaway2
0 replies
19h31m

Right, but the very interesting thing is it turns out that what's random to me might not be random to you! And the reason that "microscopic" is included is because that's a shorthand for "information you probably don't have about a system, because your eyes aren't that good, or even if they are, your brain ignored the fine details anyway."

IIAOPSW
0 replies
15h17m

Yeah but distributions are just the accounting tools to keep track of your entropy. If you are missing one bit of information about a system, your understanding of the system is some distribution with one bit of entropy. Like the original comment said, the entropy is the number of bits needed to fill in the unknowns and bring the uncertainty down to zero. Your coin flips may be unknown in advance to you, and thus you model it as a 50/50 distribution, but in a deterministic universe the bits were present all along.

canjobear
9 replies
17h17m

What's often lost in the discussions about whether entropy is subjective or objective is that, if you dig a little deeper, information theory gives you powerful tools for relating the objective and the subjective.

Consider cross entropy of two distributions H[p, q] = -Σ p_i log q_i. For example maybe p is the real frequency distribution over outcomes from rolling some dice, and q is your belief distribution. You can see the p_i as representing the objective probabilities (sampled by actually rolling the dice) and the q_i as your subjective probabilities. The cross entropy is measuring something like how surprised you are on average when you observe an outcome.

The interesting thing is that H[p, p] <= H[p, q], which means that if your belief distribution is wrong, your cross entropy will be higher than it would be if you had the right beliefs, q=p. This is guaranteed by the concavity of the logarithm. This gives you a way to compare beliefs: whichever q gets the lowest H[p,q] is closer to the truth.

You can even break cross entropy into two parts, corresponding to two kinds of uncertainty: H[p, q] = H[p] + D[p||q]. The first term is the entropy of p and it is the aleatoric uncertainty, the inherent randomness in the phenomenon you are trying to model. The second term is the KL divergence and it tells you how much additional uncertainty you have as the result of having wrong beliefs, which you could call epistemic uncertainty.
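
A toy numerical check of those identities, with made-up p and q:

    import math

    p = [0.5, 0.3, 0.2]      # "true" distribution (e.g. frequencies from the dice)
    q = [0.4, 0.4, 0.2]      # belief distribution

    def cross_entropy(a, b):
        return -sum(ai * math.log2(bi) for ai, bi in zip(a, b))

    H_p  = cross_entropy(p, p)                                   # entropy H[p]
    H_pq = cross_entropy(p, q)                                   # cross entropy H[p, q]
    D_pq = sum(ai * math.log2(ai / bi) for ai, bi in zip(p, q))  # KL divergence D[p||q]

    print(H_pq >= H_p)                  # True: wrong beliefs cost you
    print(abs(H_pq - (H_p + D_pq)))     # ~0: H[p, q] = H[p] + D[p||q]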

bubblyworld
8 replies
12h23m

Thanks, that's an interesting perspective. It also highlights one of the weak points in the concept, I think, which is that this is only a tool for updating beliefs to the extent that the underlying probability space ("ontology" in this analogy) can actually "model" the phenomenon correctly!

It doesn't seem to shed much light on when or how you could update the underlying probability space itself (or when to change your ontology in the belief setting).

_hark
2 replies
4h59m

I think what you're getting at is the construction of the sample space - the space of outcomes over which we define the probability measure (e.g. {H,T} for a coin, or {1,2,3,4,5,6} for a die).

Let's consider two possibilities:

1. Our sample space is "incomplete"

2. Our sample space is too "coarse"

Let's discuss 1 first. Imagine I have a special die that has a hidden binary state which I can control, which forces the die to come up either even or odd. If your sample space is only which side faces up, and I randomize the hidden state appropriately, it appears like a normal die. If your sample space is enlarged to include the hidden state, the entropy of each roll is reduced by one bit. You will not be able to distinguish between a truly random die and a die with a hidden state if your sample space is incomplete. Is this the point you were making?

On 2: Now let's imagine I can only observe whether the die comes up even or odd. This is a coarse-graining of the sample space (we get strictly less information - or, we only get some "macro" information). Of course, a coarse-grained sample space is necessarily an incomplete one! We can imagine comparing the outcomes from a normal die, to one which with equal probability rolls an even or odd number, except it cycles through the microstates deterministically e.g. equal chance of {odd, even}, but given that outcome, always goes to next in sequence {(1->3->5), (2->4->6)}.

Incomplete or coarse sample spaces can indeed prevent us from inferring the underlying dynamics. Many processes can have the same apparent entropy on our sample space from radically different underlying processes.
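
A toy calculation of case 1, in bits (my own numbers, assuming a uniform hidden parity bit and three equally likely faces given that parity):

    import math

    # Sample space = face only: the roll looks uniform over 6 faces.
    H_face_only = math.log2(6)            # ~2.585 bits per roll

    # Enlarged sample space = (hidden parity, face): given the parity,
    # only 3 faces remain possible.
    H_given_parity = math.log2(3)         # ~1.585 bits per roll

    print(H_face_only - H_given_parity)   # 1.0 bit, as claimed above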

bubblyworld
1 replies
3h23m

Right, this is exactly what I'm getting at - learning a distribution over a fixed sample space can be done with Bayesian methods, or entropy-based methods like the OP suggested, but I'm wondering if there are methods that can automatically adjust the sample space as well.

For well-defined mathematical problems like dice rolling and fixed classical mechanics scenarios and such, you don't need this I guess, but for any real-world problem I imagine half the problem is figuring out a good sample space to begin with. This kind of thing must have been studied already, I just don't know what to look for!

There are some analogies to algorithms like NEAT, which automatically evolves a neural network architecture while training. But that's obviously a very different context.

_hark
0 replies
29m

We could discuss completeness of the sample space, and we can also discuss completeness of the hypothesis space.

In Solomonoff Induction, which purports to be a theory of universal inductive inference, the "complete hypothesis space" consists of all computable programs (note that all current theories of physics are computable, so this hypothesis space is very general). Then induction is performed by keeping all programs consistent with the observations, weighted by two terms: the program's prior likelihood, and the probability that program assigns to the observations (the programs can be deterministic and assign probability 1).

The "prior likelihood" in Solomonoff Induction is the program's complexity (well, 2^(-Complexity), where the complexity is the length of the shortest representation of that program.

Altogether, the procedure looks like: maintain a belief which is a mixture of all programs consistent with the observations, weighted by their complexity and the likelihood they assign to the data. Of course, this procedure is still limited by the sample/observation space!

That's our best formal theory of induction in a nutshell.

canjobear
1 replies
2h31m

This kind of thinking will lead you to ideas like algorithmic probability, where distributions are defined using universal Turing machines that could model anything.

bubblyworld
0 replies
1h17m

Amazing! I had actually heard about solomonoff induction before but my brain didn't make the connection. Thanks for the shortcut =)

bsmith
1 replies
10h58m

Couldn't you just add a control (PID/Kalman filter/etc) to converge on the stability of some local "most" truth?

bubblyworld
0 replies
8h22m

Could you elaborate? To be honest I have no idea what that means.

tel
0 replies
1h2m

You can sort of do this over a suitably large (or infinite) family of models all mixed, but from an epistemological POV that’s pretty unsatisfying.

From a practical POV it’s pretty useful and common (if you allow it to describe non- and semi-parametric models too).

dist-epoch
1 replies
21h27m

Trivial example: if you know the seed of a pseudo-random number generator, a sequence generated by it has very low entropy.

But if you don't know the seed, the entropy is very high.

rustcleaner
0 replies
20h20m

Theoretically, it's still only the entropy of the seed-space + time-space it could have been running in, right?

JumpCrisscross
1 replies
21h32m

it's a subjective quantity -- a property of the observer, not the observed

Shannon's entropy is a property of the source-channel-receiver system.

glial
0 replies
20h25m

Can you explain this in more detail?

Entropy is calculated as a function of a probability distribution over possible messages or symbols. The sender might have a distribution P over possible symbols, and the receiver might have another distribution Q over possible symbols. Then the "true" distribution over possible symbols might be another distribution yet, call it R. The mismatch between these is what leads to various inefficiencies in coding, decoding, etc [1]. But both P and Q are beliefs about R -- that is, they are properties of observers.

[1] https://en.wikipedia.org/wiki/Kullback–Leibler_divergence#Co...

kragen
0 replies
5h16m

shannon entropy is subjective for bayesians and objective for frequentists

IIAOPSW
0 replies
15h25m

To shorten this for you with my own (identical) understanding: "entropy is just the name for the bits you don't have".

Entropy + Information = Total bits in a complete description.

CamperBob2
0 replies
14h26m

It's an objective quantity, but you have to be very precise in stating what the quantity describes.

Unbroken egg? Low entropy. There's only one way the egg can exist in an unbroken state, and that's it. You could represent the state of the egg with a single bit.

Broken egg? High entropy. There are an arbitrarily-large number of ways that the pieces of a broken egg could land.

A list of the locations and orientations of each piece of the broken egg, sorted by latitude, longitude, and compass bearing? Low entropy again; for any given instance of a broken egg, there's only one way that list can be written.

Zip up the list you made? High entropy again; the data in the .zip file is effectively random, and cannot be compressed significantly further. Until you unzip it again...

Likewise, if you had to transmit the (uncompressed) list over a bandwidth-limited channel. The person receiving the data can make no assumptions about its contents, so it might as well be random even though it has structure. Its entropy is effectively high again.

prof-dr-ir
25 replies
22h8m

If I would write a book with that title then I would get to the point a bit faster, probably as follows.

Entropy is just a number you can associate with a probability distribution. If the distribution is discrete, so you have a set p_i, i = 1..n, which are each positive and sum to 1, then the definition is:

S = - sum_i p_i log( p_i )

Mathematically we say that entropy is a real-valued function on the space of probability distributions. (Elementary exercises: show that S >= 0 and it is maximized on the uniform distribution.)

That is it. I think there is little need for all the mystery.
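
In code, for anyone who wants to check the exercises numerically (a tiny sketch, using the natural log):

    import math

    def S(p):
        # entropy of a discrete distribution p_1..p_n
        return -sum(pi * math.log(pi) for pi in p if pi > 0)

    print(S([0.25] * 4))            # uniform on 4 outcomes: log(4) ~ 1.386, the maximum
    print(S([0.7, 0.1, 0.1, 0.1]))  # less uniform: smaller
    print(S([1.0, 0.0, 0.0, 0.0]))  # a certain outcome: 0, the minimum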

bubblyworld
6 replies
12h29m

Thanks for defining it rigorously. I think people are getting offended on John Baez's behalf because his book obviously covers a lot more - like why does this particular number seem to be so useful in so many different contexts? How could you have motivated it a priori? Etcetera, although I suspect you know all this already.

But I think you're right that a clear focus on the maths is useful for dispelling misconceptions about entropy.

kgwgk
5 replies
11h50m

Misconceptions about entropy are misconceptions about physics. You can't dispel them by focusing on the maths and ignoring the physics entirely - especially if you just write an equation without any conceptual discussion, not even mathematical.

bubblyworld
4 replies
8h25m

I didn't say to only focus on the mathematics. Obviously wherever you apply the concept (and it's applied to much more than physics) there will be other sources of confusion. But just knowing that entropy is a property of a distribution, not a state, already helps clarify your thinking.

For instance, you know that the question "what is the entropy of a broken egg?" is actually meaningless, because you haven't specified a distribution (or a set of micro/macro states in the stat mech formulation).

kgwgk
3 replies
6h8m

Ok, I don’t think we disagree. But knowing that entropy is a property of a distribution given by that equation is far from “being it” as a definition of the concept of entropy in physics.

Anyway, it seems that - like many others - I just misunderstood the “little need for all the mystery” remark.

prof-dr-ir
1 replies
5h5m

is far from “being it” as a definition of the concept of entropy in physics.

I simply do not understand why you say this. Entropy in physics is defined using exactly the same equation. The only thing I need to add is the choice of probability distribution (i.e. the choice of ensemble).

I really do not see a better "definition of the concept of entropy in physics".

(For quantum systems one can nitpick a bit about density matrices, but in my view that is merely a technicality on how to extend probability distributions to Hilbert spaces.)

kgwgk
0 replies
3h59m

I’d say that the concept of entropy “in physics” is about (even better: starts with) the choice of a probability distribution. Without that you have just a number associated with each probability distribution - distributions without any physical meaning so those numbers won’t have any physical meaning either.

But that’s fine, I accept that you may think that it’s just a little detail.

(Quantum mechanics has no mystery either.

ih/2pi dA/dt = AH - HA

That’s it. The only thing one needs to add is a choice of operators.)

bubblyworld
0 replies
5h16m

Right, I see what you're saying. I agree that there is a lot of subtlety in the way entropy is actually used in practice.

mitthrowaway2
3 replies
20h56m

So the only thing you need to know about entropy is that it's a real-valued number you can associate with a probability distribution? And that's it? I disagree. There are several numbers that can be associated with probability distribution, and entropy is an especially useful one, but to understand why entropy is useful, or why you'd use that function instead of a different one, you'd need to know a few more things than just what you've written here.

prof-dr-ir
0 replies
9h9m

Of course that is not my statement. See all my other replies to identical misinterpretations of my comment.

Maxatar
0 replies
18h43m

Exactly, saying that's all there is to know about entropy is like saying all you need to know about chess are the rules and all you need to know about programming is the syntax/semantics.

Knowing the plain definition or the rules is nothing but a superficial understanding of the subject. Knowing how to use the rules to actually do something meaningful, having a strategy, that's where meaningful knowledge lies.

FabHK
0 replies
12h18m

In particular, the expectation (or variance) of a real-valued random variable can also be seen as "a real-valued number you can associate with a probability distribution".

Thus, GP's statement is basically: "entropy is like expectation, but different".

kgwgk
3 replies
22h5m

That covers one and a half of the twelve points he discusses.

prof-dr-ir
2 replies
21h56m

Correct! And it took me just one paragraph, not the 18 pages of meandering (and I think confusing) text that it takes the author of the pdf to introduce the same idea.

kgwgk
1 replies
21h41m

You didn’t introduce any idea. You said it’s “just a number” and wrote down a formula without any explanation or justification.

I concede that it was much shorter though. Well done!

bdjsiqoocwk
0 replies
20h14m

Haha, you reminded me of that idea in software engineering that "it's easy to make an algorithm faster if you accept that at times it might output the wrong result; in fact you can make it infinitely fast"

rachofsunshine
2 replies
21h37m

The problem is that this doesn't get at many of the intuitive properties of entropy.

A different explanation (based on macro- and micro-states) makes it intuitively obvious why entropy is non-decreasing with time or, with a little more depth, what entropy has to do with temperature.

prof-dr-ir
0 replies
21h6m

The above evidently only suffices as a definition, not as an entire course. My point was just that I don't think any other introduction beats this one, especially for a book with the given title.

In particular it has always been my starting point whenever I introduce (the entropy of) macro- and micro-states in my statistical physics course.

mjw_byrne
0 replies
17h56m

That doesn't strike me as a problem. Definitions are often highly abstract and counterintuitive, with much study required to understand at an intuitive level what motivates them. Rigour and intuition are often competing concerns, and I think definitions should favour the former. The definition of compactness in topology, or indeed just the definition of a topological space, are examples of this - at face value, they're bizarre. You have to muck around a fair bit to understand why they cut so brilliantly to the heart of the thing.

nabla9
1 replies
21h21m

Everyone who sees that formula can immediately see that it leads to the principle of maximum entropy.

Just like everyone seeing Maxwell's equations can immediately see that you can derive the speed of light classically.

Oh dear. The joy of explaining the little you know.

prof-dr-ir
0 replies
20h41m

As of this moment there are six other top-level comments which each try to define entropy, and frankly they are all wrong, circular, or incomplete. Clearly the very definition of entropy is confusing, and the definition is what my comment provides.

I never said that all the other properties of entropy are now immediately visible. Instead I think it is the only universal starting point of any reasonable discussion or course on the subject.

And lastly I am frankly getting discouraged by all the dismissive responses. So this will be my last comment for the day, and I will leave you in the careful hands of, say, the six other people who are obviously so extremely knowledgeable about this topic. /s

mensetmanusman
1 replies
19h7m

Don’t forget it’s the only measure of the arrow of time.

kgwgk
0 replies
8h55m

One could also say that it’s just a consequence of the passage of time (as in getting away from a boundary condition). The decay of radioactive atoms is also a measure of the arrow of time - of course we can say that’s the same thing.

CP violation may (or may not) be more relevant regarding the arrow of time.

senderista
0 replies
19h34m

Many students will want to know where the minus sign comes from. I like to write the formula instead as S = sum_i p_i log( 1 / p_i ), where (1 / p_i) is the "surprise" (i.e., the expected number of trials until the first success) associated with a given outcome (or symbol), and we average it over all outcomes (i.e., weight it by the probability of the outcome). We take the log of the "surprise" because entropy is an extensive quantity, so we want it to be additive.

klysm
0 replies
13h13m

The definition by itself without intuition of application is of little use

kaashif
0 replies
17h17m

That definition is on page 18, I agree it could've been reached a bit faster but a lot of the preceding material is motivation, puzzles, and examples.

This definition isn't the end goal, the physics things are.

Jun8
25 replies
22h56m

A well known anecdote reported by Shannon:

"My greatest concern was what to call it. I thought of calling it 'information,' but the word was overly used, so I decided to call it 'uncertainty.' When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, 'You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.'"

See the answers to this MathOverflow SE question (https://mathoverflow.net/questions/403036/john-von-neumanns-...) for references on the discussion whether Shannon's entropy is the same as the one from thermodynamics.

BigParm
24 replies
22h27m

Von Neumann was the king of kings

tonetegeatinst
17 replies
18h40m

It's odd... as someone interested but not fully into the sciences I see his name pop up everywhere.

bee_rider
11 replies
18h4m

He was really brilliant, made contributions all over the place in the math/physics/tech field, and had a sort of wild and quirky personality that people love telling stories about.

A funny quote about him from Edward “a guy with multiple equations named after him” Teller:

Edward Teller observed "von Neumann would carry on a conversation with my 3-year-old son, and the two of them would talk as equals, and I sometimes wondered if he used the same principle when he talked to the rest of us."

strogonoff
10 replies
10h9m

Are there many von-Neumann-like multidisciplinaries nowadays? It feels like unless one is razor sharp fully into one field one is not to be treated seriously by those who made careers in it (and who have the last word on it).

bee_rider
4 replies
6h20m

I think there are none. The world has gotten too complicated for that. It was early days in quantum physics, information theory, and computer science. I don’t think it is early days in anything that consequential anymore.

adrianN
3 replies
5h32m

It’s the early days in a lot of fields, but they tend to be fiendishly difficult like molecular biology or neuroscience.

ricksunny
0 replies
2h58m

More than that, as professionals' career paths in fields develop, the organisations they work for specialize, becoming less amenable to the generalist. ('Why should we hire this mathematician who is also an expert in legal research? Their attention is probably divided, and meanwhile we have a 100% mathematician in the candidate pool fresh from an expensive dedicated PhD program with a growing family to feed.')

I'm obviously using the archetype of Leibniz here as an example but pick your favorite polymath.

bee_rider
0 replies
2h42m

Are they fiendishly difficult or do we just need a von Neumann to come along and do what he did for quantum mechanics to them?

Salgat
0 replies
2h24m

Centuries ago, the limitation of most knowledge was the difficulty in discovery; once known, it was accessible to most scholars. Take Calculus, which is taught in every high school in America. The problem is, we're getting to a point where new fields are built on such extreme requirements, that even the known knowledge is extremely hard for talented university students to learn, let alone what is required to discover and advance that field. Until we are able to augment human intelligence, the days of the polymath advancing multiple fields are mostly over. I would also argue that the standards for peer reviewed whitepapers and obtaining PhDs has significantly dropped (due to the incentive structure to spam as many papers as possible), which is only hurting the advancement of knowledge.

i_am_proteus
3 replies
9h0m

There have only ever been a very small number of thinkers as publicly accomplished as von Neumann. One other who comes to mind is Carl F. Gauss.

strogonoff
0 replies
6h21m

Is it fair to say that the number of publicly accomplished multidisciplinaries alive at a particular moment is not rising as might be expected, in proportion to the total number of suitably educated people?

passion__desire
0 replies
1h27m

Genius Edward Teller Describes 1950s Genius John Von Neumann

https://youtu.be/Oh31I1F2vds?t=189 Describes Von Neumann's final days struggle when he couldn't think. Thinking, an activity which he loved the most.

djd3
0 replies
2h1m

Euler.

JVM was one of the smartest ever, but Euler was there centuries before and shows up in so many places.

If I had a Time Machine I'd love to get those two together for a stiff drink and a banter.

lachlan_gray
0 replies
16m

IMO they do exist, but the popular attitude that it's not possible anymore is the issue, not a lack of genius. If everyone has a built in assumption that it can't happen anymore, then we will naturally prune away social pathways that enable it.

farias0
2 replies
18h6m

I've seen many people arguing he's the most intelligent person that ever lived

wrycoder
1 replies
17h32m

Some say Hungarians are actually aliens.

zeristor
3 replies
9h40m

I was hoping the Wikipedia might explain why this might have been.

bglazer
1 replies
2h33m

Emil Kirkegaard is a self-described white nationalist eugenicist who thinks the age of consent is too high. I wouldn't trust anything he has to say.

YeGoblynQueenne
0 replies
1h36m

No need for ad hominems. This suffices to place doubt on the article's premises (and therefore any conclusion):

> This hasn’t been strictly shown mathematically, but I think it is true.

illuminant
19 replies
23h21m

Entropy is the distribution of potential over negative potential.

This could be said "the distribution of what ever may be over the surface area of where it may be."

This is erroneously taught in conventional information theory as "the number of configurations in a system" or the available information that has yet to be retrieved. Entropy includes the unforeseen and the out-of-scope.

Entropy is merely the predisposition to flow from high to low pressure (potential). That is it. Information is a form of potential.

Philosophically what are entropy's guarantees?

- That there will always be a super-scope, which may interfere in ways unanticipated;

- everything decays; the only mystery is when and how.

mwbajor
10 replies
22h55m

All definitions of entropy stem from one central, universal definition: Entropy is the amount of energy unable to be used for useful work. Or better put grammatically: entropy describes the effect that not all energy consumed can be used for work.

ajkjk
7 replies
22h54m

There's a good case to be made that the information-theoretic definition of entropy is the most fundamental one, and the version that shows up in physics is just that concept as applied to physics.

galaxyLogic
4 replies
21h32m

That would mean that information-theory is not part of physics, right? So, Information Theory and Entropy, are part of metaphysics?

ajkjk
3 replies
19h49m

Well it's part of math, which physics is already based on.

Whereas metaphysics is, imo, "stuff that's made up and doesn't matter". Probably not the most standard take.

galaxyLogic
2 replies
17h55m

I'm wondering, isn't Information Theory as much part of physics as Thermodynamics is?

kgwgk
0 replies
12h58m

Would you say that Geometry is as much a part of physics as Optics is?

ajkjk
0 replies
15h52m

Not really. Information theory applies to anything probability applies to, including many situations that aren't "physics" per se. For instance it has a lot to do with algorithms and data as well. I think of it as being at the level of geometry and calculus.

rimunroe
0 replies
22h46m

My favorite course I took as part of my physics degree was statistical mechanics. It leaned way closer to information theory than I would have expected going in, but in retrospect should have been obvious.

Unrelated: my favorite bit from any physics book is probably still the introduction of the first chapter of "States of Matter" by David Goodstein: "Ludwig Boltzmann, who spent much of his life studying statistical mechanics, died in 1906, by his own hand. Paul Ehrenfest, carrying on the work, died similarly in 1933. Now it is our turn to study statistical mechanics."

imtringued
0 replies
19h39m

Yeah, people seemingly misunderstand that the entropy applied to thermodynamics is simply an aggregate statistic that summarizes the complex state of the thermodynamic system as a single real number.

The fact that entropy always rises etc, has nothing to do with the statistical concept of entropy itself. It simply is an easier way to express the physics concept that individual atoms spread out their kinetic energy across a large volume.

ziofill
0 replies
22h23m

I think what you describe is the application of entropy in the thermodynamic setting, which doesn't apply to "all definitions".

mitthrowaway2
0 replies
21h48m

This definition is far from universal.

ziofill
4 replies
22h54m

Entropy includes the unforseen, and out of scope.

Mmh, no it doesn't. You need to define your state space, otherwise it's an undefined quantity.

illuminant
2 replies
21h27m

You are referring to the conceptual device you believe bongs to you and your equations. Entropy creates attraction and repulsion, even causing working bias. We rely upon it for our system functions.

Undefined is uncertainty is entropic.

senderista
0 replies
19h6m

bongs

indeed

fermisea
0 replies
20h37m

Entropy is a measure, it doesn't create anything. This is highly misleading.

kevindamm
0 replies
22h43m

But it is possible to account for the unforseen (or out-of-vocabulary) by, for example, a Good-Turing estimate. This satisfies your demand for a fully defined state space while also being consistent with GP's definition.

eoverride
1 replies
22h18m

This answer is as confident as it is wrong, and full of gibberish.

Entropy is not a "distribution”, it's a functional that maps a probability distribution to a scalar value, i.e. a single number.

It's the negative mean log-probability of a distribution.

It's an elementary statistical concept, independent of physical concepts like “pressure”, “potential”, and so on.

illuminant
0 replies
21h31m

It sounds like log-probability is the manifold surface area.

Distribution of potential over negative potential. Negative potential is the "surface area", and available potential distributes itself "geometrically". All this is iterative obviously, some periodicity set by universal speed limit.

It really doesn't sound like you disagree with me.

axblount
0 replies
22h51m

Baez seems to use the definition you call erroneous: "It’s easy to wax poetic about entropy, but what is it? I claim it’s the amount of information we don’t know about a situation, which in principle we could learn."

dekhn
10 replies
22h43m

I really liked the approach my stat mech teacher used. In nearly all situations, entropy just ends up being the log of the number of ways a system can be arranged (https://en.wikipedia.org/wiki/Boltzmann%27s_entropy_formula) although I found it easiest to think in terms of pairs of dice rolls.
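
For the dice version, a tiny sketch (my own illustration, not from the course): take the macrostate to be the sum of two dice and count the microstates behind it.

    from collections import Counter
    from math import log

    # Macrostate = the sum; microstates = ordered (die1, die2) pairs giving that sum.
    multiplicity = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
    for total, W in sorted(multiplicity.items()):
        print(total, W, log(W))   # Boltzmann-style S = ln(W), in units of k_B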

petsfed
5 replies
21h21m

And this is what I prefer too, although with the clarification that its the number of ways that a system can be arranged without changing its macroscopic properties.

It's, unfortunately, not very compatible with Shannon's usage in any but the shallowest sense, which is why it stays firmly in the land of physics.

enugu
2 replies
11h16m

Assuming each of the N microstates for a given macrostate is equally probable, with probability p = 1/N, the Shannon entropy is -Σ p·log(p) = -N·(1/N)·log(1/N) = log(N), which is the physics interpretation.

In the continuous version, you would get log(V) where V is the volume in phase space occupied by the microstates for a given macrostate.

Liouville's theorem, that volume is conserved in phase space, implies that a macroscopic process can move all the microstates of a macrostate A into a macrostate B only if the volume of B is bigger than the volume of A. This implies that the entropy of B should be bigger than the entropy of A, which is the Second Law.

cubefox
1 replies
5h59m

The second law of thermodynamics is time-asymmetric, but the fundamental physical laws are time-symmetric, so from them you can only predict that the entropy of B should be bigger than the entropy of A irrespective of whether B is in the future or the past of A. You need the additional assumption (Past Hypothesis) that the universe started in a low entropy state in order to get the second law of thermodynamics.

If our goal is to predict the future, it suffices to choose a distribution that is uniform in the Liouville measure given to us by classical mechanics (or its quantum analogue). If we want to reconstruct the past, in contrast, we need to conditionalize over trajectories that also started in a low-entropy past state — that's the “Past Hypothesis” that is required to get stat mech off the ground in a world governed by time-symmetric fundamental laws.

https://www.preposterousuniverse.com/blog/2013/07/09/cosmolo...

kgwgk
0 replies
5h6m

The second law of thermodynamics is about systems that are well described by a small set of macroscopic variables. The evolution of an initial macrostate prepared by an experimenter who can control only the macrovariables is reproducible. When a thermodynamical system is prepared in such a reproducible way the preparation is happening in the past, by definition.

The second law is about how part of the information that we had about a system - constrained to be in a macrostate - is “lost” when we “forget” the previous state and describe it using just the current macrostate. We know more precisely the past than the future - the previous state is in the past by definition.

kgwgk
1 replies
10h44m

not very compatible with Shannon's usage in any but the shallowest sense

The connection is not so shallow, there are entire books based on it.

“The concept of information, intimately connected with that of probability, gives indeed insight on questions of statistical mechanics such as the meaning of irreversibility. This concept was introduced in statistical physics by Brillouin (1956) and Jaynes (1957) soon after its discovery by Shannon in 1948 (Shannon and Weaver, 1949). An immense literature has since then been published, ranging from research articles to textbooks. The variety of topics that belong to this field of science makes it impossible to give here a bibliography, and special searches are necessary for deepening the understanding of one or another aspect. For tutorial introductions, somewhat more detailed than the present one, see R. Balian (1991-92; 2004).”

https://arxiv.org/pdf/cond-mat/0501322

petsfed
0 replies
34m

I don't dispute that the math is compatible. The problem is the interpretation thereof. When I say "shallowest", I mean the implications of each are very different.

Insofar as I'm aware, there is no information-theoretic equivalent to the 2nd or 3rd laws of thermodynamics, so the intuition a student works up from physics about how and why entropy matters just doesn't transfer. Likewise, even if an information science student is well versed in the concept of configuration entropy, that's 15 minutes of one lecture in statistical thermodynamics. There's still the rest of the course to consider.

Lichtso
1 replies
19h20m

The "can be arranged" is the tricky part. E.g. you might know from context that some states are impossible (where the probability distribution is zero), even though they combinatorially exist. That changes the entropy to you.

That is why information and entropy are different things. Entropy is what you know you do not know. That knowledge of the magnitude of the unknown is what is being quantified.

Also, here is the point where I think the article is wrong (or not precise enough), as it would include the unknown unknowns, which are not entropy IMO:

I claim it’s the amount of information we don’t know about a situation

slashdave
0 replies
18h33m

Exactly. If you want to reuse the term "entropy" in information theory, then fine. Just stop trying to make a physical analogy. It's not rigorous.

abetusk
0 replies
20h9m

Also known as "the number of bits to describe a system". For example, 2^N equally probable states, N bits to describe each state.

utkarsh858
2 replies
17h38m

I sometimes ponder where new entropy/randomness is coming from, like if we take the earliest state of the universe as an infinitely dense point particle which expanded. So there must be some randomness, or say variety, which led it to expand in a non-uniform way, which led to the dominance of matter over anti-matter, or the creation of galaxies, clusters etc. If we take an isolated system in which certain static particles are present, will there be the case that a small subset of the particles will gain motion and thus introduce entropy? Can entropy be induced automatically, at least on a quantum level? If anyone can help me explain that, it will be very helpful and thus can help explain the origin of the universe in a better way.

pseidemann
0 replies
5h44m

I saw this video, which explained it for me (it's in German, maybe the automatic subtitles will work for you): https://www.youtube.com/watch?v=hrJViSH6Klo

He argues that the randomness you are looking for comes from quantum fluctuations, and if this randomness did not exist, the universe would probably never have "happened".

empath75
0 replies
2h56m

Symmetry breaking is the general phenomenon that underlies most of that.

The classic example is this:

Imagine you have a perfectly symmetrical sombrero[1], and there's a ball balanced on top of the middle of the hat. There's no preferred direction it should fall in, but it's _unstable_. Any perturbation will make it roll down hill and come to rest in a stable configuration on the brim of the hat. The symmetry of the original configuration is now broken, but it's stable.

1: https://m.media-amazon.com/images/I/61M0LFKjI9L.__AC_SX300_S...

tasteslikenoise
2 replies
17h36m

I've always favored this down-to-earth characterization of the entropy of a discrete probability distribution. (I'm a big fan of John Baez's writing, but I was surprised glancing through the PDF to find that he doesn't seem to mention this viewpoint.)

Think of the distribution as a histogram over some bins. Then, the entropy is a measurement of, if I throw many many balls at random into those bins, the probability that the distribution of balls over bins ends up looking like that histogram. What you usually expect to see is a uniform distribution of balls over bins, so the entropy measures the probability of other rare events (in the language of probability theory, "large deviations" from that typical behavior).

More specifically, if P = (P1, ..., Pk) is some distribution, then the probability that throwing N balls (for N very large) gives a histogram looking like P is about 2^(-N * [log(k) - H(P)]), where H(P) is the entropy. When P is the uniform distribution, then H(P) = log(k), the exponent is zero, and the estimate is 1, which says that by far the most likely histogram is the uniform one. That is the largest possible entropy, so any other histogram has probability 2^(-c*N) of appearing for some c > 0, i.e., is very unlikely and exponentially moreso the more balls we throw, but the entropy measures just how much. "Less uniform" distributions are less likely, so the entropy also measures a certain notion of uniformity. In large deviations theory this specific claim is called "Sanov's theorem" and the role the entropy plays is that of a "rate function."

The counting interpretation of entropy that some people are talking about is related, at least at a high level, because the probability in Sanov's theorem is the number of outcomes that "look like P" divided by the total number, so the numerator there is indeed counting the number of configurations (in this case of balls and bins) having a particular property (in this case looking like P).

There are lots of equivalent definitions and they have different virtues, generalizations, etc, but I find this one especially helpful for dispelling the air of mystery around entropy.
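
If it helps, here is a rough numerical check of that estimate with a made-up histogram (the two exponents agree to leading order in N; they differ by O(log N) corrections):

    from math import lgamma, log, log2

    counts = [150, 90, 60]                 # histogram of N balls over k bins
    N, k = sum(counts), len(counts)

    # Exact log2-probability of this histogram under uniform ball-throwing:
    # multinomial coefficient times (1/k)^N
    log2_multinom = (lgamma(N + 1) - sum(lgamma(c + 1) for c in counts)) / log(2)
    exact = log2_multinom - N * log2(k)

    # Sanov-style estimate: -N * [log2(k) - H(P)], with P the empirical histogram
    P = [c / N for c in counts]
    H = -sum(p * log2(p) for p in P)
    estimate = -N * (log2(k) - H)

    print(exact, estimate)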

vinnyvichy
1 replies
16h49m

Hey, did you want to say relative entropy ~ rate function ~ KL divergence? Might be more familiar to ML enthusiasts here, and get them to be curious about Sanov or large deviations.

tasteslikenoise
0 replies
15h58m

That's right, here log(k) - H(p) is really the relative entropy (or KL divergence) between p and the uniform distribution, and all the same stuff is true for a different "reference distribution" of the probabilities of balls landing in each bin.

For discrete distributions the "absolute entropy" (just sum of -p log(p) as it shows up in Shannon entropy or statistical mechanics) is in this way really a special case of relative entropy. For continuous distributions, say over real numbers, the analogous quantity (integral of -p log(p)) isn't a relative entropy since there's no "uniform distribution over all real numbers". This still plays an important role in various situations and calculations...but, at least to my mind, it's a formally similar but conceptually separate object.

drojas
2 replies
22h34m

My definition: Entropy is a measure of the accumulation of non-reversible energy transfers.

Side note: All reversible energy transfers involve an increase in potential energy. All non-reversible energy transfers involve a decrease in potential energy.

space_oddity
0 replies
5h39m

However, while your definition effectively captures a significant aspect of entropy, it might be somewhat limited in scope

snarkconjecture
0 replies
22h22m

That definition doesn't work well because you can have changes in entropy even if no energy is transferred, e.g. by exchanging some other conserved quantity.

The side note is wrong in letter and spirit; turning potential energy into heat is one way for something to be irreversible, but neither of those statements is true.

For example, consider an iron ball being thrown sideways. It hits a pile of sand and stops. The iron ball is not affected structurally, but its kinetic energy is transferred (almost entirely) to heat energy. If the ball is thrown slightly upwards, potential energy increases but the process is still irreversible.

Also, the changes of potential energy in corresponding parts of two Carnot cycles are directionally the same, even if one is ideal (reversible) and one is not (irreversible).

tsoukase
1 replies
1h30m

After years of thought I dare to say the 2nd law of thermodynamics is a tautology. That entropy increases means every system tends towards higher probability, which means the most probable is the most probable.

tel
0 replies
57m

I think that’s right, though it’s non-obvious that more probable systems are disordered. At least as non-obvious as Pascal’s triangle is.

Which is to say, worth saying from a first principles POV, but not all that startling.

ooterness
1 replies
22h23m

For information theory, I've always thought of entropy as follows:

"If you had a really smart compression algorithm, how many bits would it take to accurately represent this file?"

i.e., Highly repetitive inputs compress well because they don't have much entropy per bit. Modern compression algorithms are good enough on most data to be used as a reasonable approximation for the true entropy.
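
For example, a rough sketch using zlib as the stand-in for the "really smart" compressor (it isn't one, but it illustrates the idea):

    import os, zlib

    repetitive = b"abc" * 100_000          # highly repetitive input
    random_ish = os.urandom(300_000)       # essentially incompressible input

    for name, data in [("repetitive", repetitive), ("random", random_ish)]:
        bits_per_byte = 8 * len(zlib.compress(data, 9)) / len(data)
        print(name, round(bits_per_byte, 3))   # crude upper bound on entropy per byte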

space_oddity
0 replies
6h31m

The essence of entropy as a measure of information content

foobarbecue
1 replies
12h56m

How do you get to the actual book / tweets? The link just takes me back to the foreword...

ctafur
1 replies
11h11m

The way I understand it is with an analogy to probability. To me, events are to microscopic states like random variable is to entropy.

ctafur
0 replies
10h58m

My first contact with entropy was in chemistry and thermodynamics and I didn't get it. Actually I didn't get anything from engineering thermodynamics books such as Çengel and so on.

You must go to statistical mechanics or information theory to understand entropy. Or try these PRICELESS NOTES from Prof. Suo: https://docs.google.com/document/d/1UMwpoDRZLlawWlL2Dz6YEomy...

GoblinSlayer
1 replies
12h2m

There's a fundamental nature to entropy, but as usual it's not very enlightening for the poor monkey brain, so to explain it you need to enumerate all its high-level behavior, but its high-level behavior is accidental and can't be summarized in a concise form.

space_oddity
0 replies
5h42m

This complexity underscores the richness of the concept

yellowcake0
0 replies
18h38m

Information entropy is literally the strict lower bound on how efficiently information can be communicated (expected number of transmitted bits) if the probability distribution which generates this information is known, that's it. Even in contexts such as calculating the information entropy of a bit string, or the English language, you're just taking this data and constructing some empirical probability distribution from it using the relative frequencies of zeros and ones or letters or n-grams or whatever, and then calculating the entropy of that distribution.

I can't say I'm overly fond of Baez's definition, but far be it from me to question someone of his stature.

vinnyvichy
0 replies
16h41m

The book might disappoint some..

I have largely avoided the second law of thermodynamics ... Thus, the aspects of entropy most beloved by physics popularizers will not be found here.

But personally, this bit is the most exciting to me.

I have tried to say as little as possible about quantum mechanics, to keep the physics prerequisites low. However, Planck’s constant shows up in the formulas for the entropy of the three classical systems mentioned above. The reason for this is fascinating: Planck’s constant provides a unit of volume in position-momentum space, which is necessary to define the entropy of these systems. Thus, we need a tiny bit of quantum mechanics to get a good approximate formula for the entropy of hydrogen, even if we are trying our best to treat this gas classically.
suoduandao3
0 replies
1h41m

I like the formulation of 'the amount of information we don't know about a system that we could in theory learn'. I'm surprised there's no mention of the Copenhagen interpretation's interaction with this definition; under a lot of QM theories 'unavailable information' is different from available information.

jsomedon
0 replies
8h1m

Am I the only one that can't download the pdf, or is the file server down? I can see the blog page, but when I try downloading the ebook it just doesn't work..

If the file server is down.. anyone could upload the ebook for download?

eointierney
0 replies
22h3m

Ah JCB, how I love your writing, you are always so very generous.

Your This Week's Finds were a hugely enjoyable part of my undergraduate education and beyond.

Thank you again.

dmn322
0 replies
21h18m

This seems like a great resource for referencing the various definitions. I've tried my hand at developing an intuitive understanding: https://spacechimplives.substack.com/p/observers-and-entropy. TLDR - it's an artifact of the model we're using. In the thermodynamic definition, the energy accounted for in the terms of our model is information. The energy that's not is entropic energy. Hence it's not "usable" energy, and the process isn't reversible.

ccosm
0 replies
3h23m

"I have largely avoided the second law of thermodynamics, which says that entropy always increases. While fascinating, this is so problematic that a good explanation would require another book!"

For those interested I am currently reading "Entropy Demystified" by Arieh Ben-Naim which tackles this side of things from much the same direction.

bdjsiqoocwk
0 replies
20h17m

Hmmm, that list of things that contribute to entropy, I've noticed, omits particles which under "normal circumstances" on earth exist in bound states; for example it doesn't mention W bosons or gluons. But in some parts of the universe they're not bound but in a different state of matter, e.g. quark-gluon plasma. I wonder how, or if, this was taken into account.

arjunlol
0 replies
18h6m

ΔS = ΔQ/T