I once encountered this in the real world as a data analyst a long time ago. I was working at an e-commerce company called The Hut Group, and all year our marketing team had been saying that our marketing cost of goods sold (the percentage of our revenue we needed to spend on marketing) had been declining across every product category. But at year end, the execs were shocked to realize that our marketing cost of goods sold had almost doubled, from 10% to nearly 20%.
The finance team had asked me to double-check the marketing team's numbers, to see if there'd been some funny math in the reporting. But the marketing team were totally right: marketing spend across the three main categories - games, beauty, and nutrition - had fallen in every one (~15% to ~10%, ~30% to ~25%, and ~50% to ~30% respectively). However, the mix of these product categories had shifted massively, with nutrition growing from roughly 10% of our total sales to nearly 50%.
On net, that meant that while the marketing team had become more cost-efficient at selling every individual product category, growth in nutrition had vastly outstripped growth in every other category, and since nutrition was the most expensive category to market, the aggregate marketing cost percentage had gone up even though every category had improved. I then had the fun job of explaining the Yule-Simpson paradox to a bunch of accountants.
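A toy version of the arithmetic shows the mechanism. The per-category rates below roughly match the story, but the revenue shares (other than nutrition's 10% -> 50% shift) are made up, so the blended numbers won't reproduce the actual 10% -> 20% swing - only the direction:

    # Toy illustration: every category's marketing-cost ratio falls, yet the
    # blended ratio rises because the sales mix shifts toward the most
    # expensive category. Revenue shares are assumptions, not THG's figures.

    def blended_rate(mix, rates):
        """Weighted average of per-category rates, weighted by revenue share."""
        return sum(mix[c] * rates[c] for c in mix)

    rates_before = {"games": 0.15, "beauty": 0.30, "nutrition": 0.50}
    rates_after  = {"games": 0.10, "beauty": 0.25, "nutrition": 0.30}  # all improved

    mix_before = {"games": 0.70, "beauty": 0.20, "nutrition": 0.10}  # nutrition ~10% of sales
    mix_after  = {"games": 0.30, "beauty": 0.20, "nutrition": 0.50}  # nutrition ~50% of sales

    print(f"before: {blended_rate(mix_before, rates_before):.1%}")  # 21.5%
    print(f"after:  {blended_rate(mix_after,  rates_after):.1%}")   # 23.0%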
Pretty much every dataset I work with as an SRE is full of these paradoxes. One classic published example comes from Google:
A network engineer took a trip to Indonesia or something (can't find the citation to confirm the exact tale), noticed the service was slow, and when asking around everyone said "that's how it's always been." Basically the local cellular networks are slow and the off-island fiber connections are saturated. Back at the office, they decide to attack the problem by optimizing payload sizes, do the work (cutting download sizes in half), and ship it. Latency metrics? Average and p95 latency actually increased after shipping the work to production.
How does an objectively good change make things worse? Well, the service had improved so much for those customers that they used it a lot more. Even with the lighter demand on bandwidth, network latency to the datacenter was still worse than for typical US customers, so as more of these people realized the service sucked way less, they used it more and drove the aggregate numbers up.
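A sketch with illustrative numbers (not the real traffic data) of how every region can get faster while the global average gets slower, purely because the traffic mix shifts:

    # Latency improves in every region, but usage surges in the high-latency
    # region, so the weighted global average gets worse. Numbers are made up.

    def weighted_avg_latency(traffic_share, latency_ms):
        return sum(traffic_share[r] * latency_ms[r] for r in traffic_share)

    latency_before = {"low_latency_regions": 200, "high_latency_regions": 3000}
    latency_after  = {"low_latency_regions": 150, "high_latency_regions": 2000}  # both improved

    traffic_before = {"low_latency_regions": 0.95, "high_latency_regions": 0.05}
    traffic_after  = {"low_latency_regions": 0.70, "high_latency_regions": 0.30}  # usage surged

    print(weighted_avg_latency(traffic_before, latency_before))  # 340 ms
    print(weighted_avg_latency(traffic_after,  latency_after))   # 705 ms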
I have tons of these examples where a data team looks at a particular slice of request telemetry and comes to the wrong conclusion because they didn't model enough of the system, or controlled for the wrong (or too many) variables. The worst ones are the cyclic finger-pointing situations that Simpson's paradox can produce: app developers blaming a regression on the server-side component while the server team blames the app team, often because the server and app release schedules accidentally aligned too well. In this case we have canary data to exonerate our side of the equation, but sometimes the problem lies somewhere even deeper, like an update to an entirely different app.
But your example isn't a case of Simpson's Paradox (which is purely statistical), but Jevons Paradox (which is about human behaviour and economics).
If I recall the youtube slow-internet optimisation case correctly, I think it is an example of Simpson's paradox. They made it faster for countries with fast internet, and faster for countries with slow internet, and yet the average performance across all users/countries got slower, because now the countries with slow internet used youtube much more than before.
But the improvement induced the demand, which to my mind makes this different from Simpson's Paradox.
Doesn't matter. That is not relevant to the paradox.
How does "Average and p95 latency actually increased after shipping the work to production. How does an objectively good change make things worse?" relate to Simpson's paradox again?
That's exactly it. After "shipping the work to production" (making it faster for everybody), the overall average and p95 got worse. Each sub-population experienced an improvement: countries with fast internet got faster youtube, and countries with slow internet got faster youtube. But the overall average and p95 got worse, because now more users from the second sub-population bring the overall average speed down (and the latency up). That's Simpson's paradox.
Ah, you may be right. It's not clear from the story whether "Average and p95 latency actually increased after shipping the work to production." means the average across Indonesia and everywhere else, or just the Indonesian average.
I would say the improvement allowed the demand to be met: everybody wanted to use youtube, but few could.
Just like many people may want to eat a wide range of expensive tasty food, but have to make do with junk because it's what they can afford.
It would be Simpson's Paradox if Google services in Indonesia were initially slow because Indonesians tend to use YouTube more often than lighter services.
There wasn't an error in the conclusions of the initial measurement. It was the solution that had problems.
Good point! I'm just a humble Linux sysadmin dubbed "SRE" who slept through Stats for Engineers and now pays the price every week dealing with SWEs eager to blame me for their mistakes.
You were right; that was a case of Simpson's paradox. Every category saw a latency improvement, but the overall statistic worsened. Jevons paradox is what caused the induced demand, but once the new usage data was gathered, the initial review became an example of Simpson's paradox.
Effect of the change -> Jevons paradox.
Measurement of the Jevons effect -> Simpson's paradox (in this case; that isn't a general rule).
The fact that the two are easily linked is one of the reasons the statistical paradox is so common in practice.
Latency improved for everyone, but overall average latency increased because usage increased faster in high latency areas. That's Simpson's Paradox. Simpson's Paradox doesn't care where the subpopulations you're measuring came from.
Isn't that the "One More Lane, I Promise!" meme?
It is, but usually the meme misrepresents induced demand. While I don't like cars and we should focus on other infrastructure, adding a lane does help.
It does not reduce congestion, but it does serve more people at the same congestion level. And those people have come from somewhere. Sometimes from public transport, which isn't great, but sometimes from some backwater road.
The bigger problem with induced demand is that it's often poor ROI to add that lane where the demand is highest.
That is, imagine you have a big city. You can add capacity for 1m extra people to travel to the city centre, where there's lots of congestion. Or you can find ways to induce demand around the outer limits of town, even though current demand is low there.
Odds are you'll pick the first, because it's "obvious" and doesn't require much thinking to see that it'd help. But we really ought to look at the cost-benefit of the second option too, because repeatedly inducing demand in the centre keeps driving up the incremental cost of further improvements, along with plenty of other undesirable second-order effects.
Adding lanes is like getting a bigger cache with the same throughput.
It's obvious at the supermarket: what goes faster, a single cashier processing four short lanes of 10 people with round robin, or two cashiers processing a single lane with 40 people?
Is the city center able to process 1m extra people? If not, it doesn't matter how many lanes you build.
Well, you often can make it able to "process" 1m extra people: you can build overpasses, and tunnels, and taller buildings. But the cost per extra person will tend to go up accordingly, to the point where you'd be better off spending that money attracting people out of the centre instead.
E.g. London's "Crossrail" / Elizabeth line cost $24 billion. Granted, it also allows some people to travel through London faster, but I can't help but wonder what that money could've done if applied to attracting businesses out of the centre instead: upgrading links between towns on the outskirts, upgrading town centres, and generally trying to make it more attractive for businesses to locate further out.
Given the extraordinary costs it takes to do large infrastructure projects in London, I'd be very surprised if you couldn't get a higher return on investment that way, or by investing similar sums elsewhere in the UK entirely.
Until more people choose to live further away because the commute is now tolerable with the extra lane (and it's cheaper), and then you're back to square one.
This reminds me of a similar story with YouTube [1] where reducing the page weight made the metrics worse, because more people with lower-end connections could finally access the page.
Metrics interpretation is as important as the metrics themselves!
[1]: https://blog.chriszacharias.com/page-weight-matters
That may be exactly the story I was thinking of, or perhaps the original of a story I encountered in a GCP blog post or something.
every time I hear about examples of Simpson's paradox in practice, I don't get what lesson to learn
marketing team overoptimized, so non-nutrition demand fell?
drop nutrition from the product line, so that you're efficient both in the products you do sell and overall?
these metrics are insufficient and it's better to look at gross change rather than ratios?
I have no idea
I think the last one is closest. I’d go with: “finance team should look at the gross change”, if that’s what matters for them.
Maybe the lesson is to analyze different business units (product categories?) independently first, then the whole.
IME, the "problem" (to the extent there is one) is almost always that the naïvely-chosen KPI metric wasn't specific enough.
Here's a recent example from a friend. You're a SaaS company, and your home page's load time is reported as slow. You set your KPI for the quarter to be "reduce p99 load time of the home page by 50%".
The load time is a function of customer size, so bigger customers = slower home page. It's actually a quadratic function. So the p99 of small customers is like the p50 of large customers. You have 20 small customers and 20 big customers.
That quarter, the sales team onboards 10 new tiny customers, and 10 big customers churn. It's the holiday season in your big customers' geo, so mostly small customers are using the platform. It's the busiest time of year for the small customers, so they're over-using the platform.
All these factors lead to p99 latency dropping by 60%, smashing the KPI goal. Bonuses all around, pats on the back. And no code changes needed, besides!
The solution is: choose a KPI that is tightly coupled to your problem, and not confounded with other variables.
In the above case, a better KPI would have been "p99 latency for large customers", because it is robust to the distribution of customer sizes across current users, churned users, and seasonal differences in usage.
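A hypothetical sketch of the pitfall above. None of these numbers come from the original story; the assumed latency model (small customers load in ~1s, big ones in ~4s) and usage figures are just there to show how a pooled p99 can "improve" from mix shift alone:

    # The per-segment latencies never change, only the customer mix and usage
    # do, yet the pooled p99 drops sharply with zero code changes.
    import numpy as np

    rng = np.random.default_rng(0)

    def pooled_p99(n_small, n_big, reqs_small, reqs_big):
        # Assumed latency model (illustrative only): small ~1s, big ~4s.
        small = rng.normal(1.0, 0.2, size=n_small * reqs_small)
        big   = rng.normal(4.0, 0.8, size=n_big * reqs_big)
        return np.percentile(np.concatenate([small, big]), 99)

    # Start of quarter: 20 small + 20 big customers, similar usage.
    start = pooled_p99(n_small=20, n_big=20, reqs_small=500, reqs_big=500)

    # End of quarter: 10 big customers churned, 10 tiny ones onboarded,
    # big customers on holiday (light usage), small customers in peak season.
    end = pooled_p99(n_small=30, n_big=10, reqs_small=1500, reqs_big=20)

    print(f"pooled home-page p99: {start:.2f}s -> {end:.2f}s")  # ~5.6s -> ~1.5s

Computing the same p99 over large customers only would stay roughly flat across both scenarios, which is why segmenting the KPI makes it robust to shifts in customer mix and seasonal usage.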
The article suggests an answer to your question; see the last sentence of the introduction:
"its lesson "isn't really to tell us which viewpoint to take but to insist that we keep both the parts and the whole in mind at once."
In the case above, they failed at "keeping the parts in mind", as clearly the shifting ratios between the different products were crucial.
It’s actually surprisingly common. You can even find it in “classical” toy datasets like Iris: https://github.com/DataForScience/Causality/blob/master/1.2%...
Covid vaccination rates and deaths were rather famously subject to it. E.g. some combination of stats like "most covid deaths were vaccinated individuals", "vaccination reduces death rate", and "the population segment with the lowest vaccination rates has the lowest covid death rates" were all true at the same time.
Those aren't examples of Simpson's even taken together, but there was a famous (by which I mean it got a lot of press, including being written up in the Times and Post when it came out) study that showed that although every subgroup in the Italian demographic data had a lower CFR than its Chinese counterpart, the Chinese group had a lower CFR when taken as a whole:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8791436/
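A toy set of numbers (not the actual figures from the linked study) shows how that can happen when one country's cases skew toward older patients:

    # Illustrative only: each age group has a lower CFR in country A than in
    # country B, yet country A's overall CFR is higher because its cases skew
    # toward the high-fatality age group.

    cases_a  = {"under_60": 2_000, "over_60": 8_000}
    deaths_a = {"under_60": 20,    "over_60": 1_200}   # CFRs: 1.0%, 15.0%

    cases_b  = {"under_60": 8_000, "over_60": 2_000}
    deaths_b = {"under_60": 120,   "over_60": 340}     # CFRs: 1.5%, 17.0%

    for name, cases, deaths in [("A", cases_a, deaths_a), ("B", cases_b, deaths_b)]:
        per_group = {g: f"{deaths[g] / cases[g]:.1%}" for g in cases}
        overall = sum(deaths.values()) / sum(cases.values())
        print(f"country {name}: per-group CFRs {per_group}, overall CFR {overall:.1%}")
        # country A: lower in each group, but overall ~12.2% vs ~4.6% for B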
it's shocking that product mix wasn't a slide in the reporting
but marketing selects for positivity, not objectivity
the facts and only the facts that support what they do
I thought it was pretty common to apply mixed / hierarchical linear models? I didn't study statistics, but in our field, where many problems involve modelling biological effects, we would do that.
E.g. https://www.pymc.io/projects/examples/en/latest/generalized_...