
Simpson's paradox

r_thambapillai
31 replies
1d17h

I once encountered this in the real world as a data analyst a long time ago. I was working at an e-commerce company called The Hut Group, and the whole year our marketing team had been saying our marketing cost of goods sold (the percentage of our revenue we needed to spend on marketing) had been declining across every product category. But at year end, the execs were shocked to realize that our marketing cost of goods sold had almost doubled, from 10% to nearly 20%.

The finance team had asked me to double-check the marketing team's numbers, to see if there'd been some funny math in the reporting. But the marketing team were totally right: marketing spend across the three main categories - games, beauty, and nutrition - had all fallen (~15% to ~10%, ~30% to ~25%, and ~50% to ~30% respectively). However, the mix of these product categories had shifted massively, with nutrition growing from roughly 10% of our total sales to nearly 50%.

In net, that meant that whilst the marketing team had gotten more cost-efficient at selling every individual product category, growth in nutrition had vastly outstripped growth in all the other categories, and since nutrition was the category with the highest marketing cost %, the aggregate marketing cost % had gone up even though every category had improved. I then had the fun job of explaining the Yule-Simpson paradox to a bunch of accountants.
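
To see the arithmetic: the blended marketing % is just a revenue-weighted average of the per-category percentages, so a big enough mix shift can push it up even while every category improves. Here's a minimal sketch in Python, using roughly the per-category rates from the story but made-up revenue mixes (the story only gives nutrition's share, so the totals won't reproduce the exact 10% to 20% move):

    # Per-category marketing cost as a fraction of that category's revenue.
    rates_start = {"games": 0.15, "beauty": 0.30, "nutrition": 0.50}
    rates_end   = {"games": 0.10, "beauty": 0.25, "nutrition": 0.30}  # every category improves

    # Revenue shares: invented placeholders, except that nutrition grows a lot.
    mix_start = {"games": 0.70, "beauty": 0.20, "nutrition": 0.10}
    mix_end   = {"games": 0.25, "beauty": 0.25, "nutrition": 0.50}

    def blended_rate(rates, mix):
        """Total marketing spend divided by total revenue: a revenue-weighted average."""
        return sum(rates[c] * mix[c] for c in rates)

    print(f"blended rate, start of year: {blended_rate(rates_start, mix_start):.1%}")  # ~21.5%
    print(f"blended rate, end of year:   {blended_rate(rates_end, mix_end):.1%}")      # ~23.8%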

jldugger
20 replies
1d13h

Pretty much every dataset I work with as an SRE is full of these paradoxes. One classic published example comes from Google:

A network engineer took a trip to Indonesia or something (can't find the citation to confirm the exact tale), noticed the service was slow, and when asking around everyone said "that's how it's always been." Basically the local cellular networks are slow and the off-island fiber connections are saturated. Back at the office they decide to attack the problem by optimizing payload sizes. They do the work, reducing download sizes by half, and ship it. Latency metrics? Average and p95 latency actually increased after shipping the work to production.

How does an objectively good change make things worse? Well, the service had improved so much for those customers that they used it a lot more. Even with the lighter demand on bandwidth, the network latency to the datacenter was still worse than for typical US customers, so as more of these people realized the service sucked way less, they used it more and drove the aggregate numbers up.
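
The same weighted-average arithmetic applies here. A toy sketch with entirely made-up numbers (two regions, both get faster, but traffic from the high-latency region grows much more):

    # Mean latency per region improves, yet the overall mean gets worse,
    # because the request mix shifts toward the high-latency region.
    def overall_mean_latency(latency_ms, requests):
        total = sum(requests.values())
        return sum(latency_ms[r] * requests[r] for r in requests) / total

    latency_before = {"fast_region": 100, "slow_region": 900}   # ms, invented
    latency_after  = {"fast_region": 80,  "slow_region": 700}   # both regions improve

    traffic_before = {"fast_region": 1_000_000, "slow_region": 50_000}
    traffic_after  = {"fast_region": 1_050_000, "slow_region": 500_000}  # induced usage

    print(overall_mean_latency(latency_before, traffic_before))  # ~138 ms
    print(overall_mean_latency(latency_after, traffic_after))    # ~280 ms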

I have tons of these examples where a data team looks at a particular slice of request telemetry and comes to a wrong conclusion because they didn't model enough of the system, or controlled for the wrong (or too many) variables. The worst ones are the cyclic finger-pointing situations that Simpson's paradox can produce: app developers blaming a regression on the server-side component while the server team blames the app team, often because the server and app release schedules accidentally aligned too well. In this case we have canary data to exonerate our side of the equation, but sometimes the problem lies even deeper, like updates from an entirely different app.

Tomte
11 replies
1d12h

But your example isn't a case of Simpson's Paradox (which is purely statistical), but Jevons Paradox (which is about human behaviour and economics).

ploxiln
7 replies
1d11h

If I recall the youtube slow-internet optimisation case correctly, I think it is an example of Simpson's paradox. They made it faster for countries with fast internet, and faster for countries with slow internet, and then the average performance across all users/countries was slower, because now the countries with slow internet used youtube much more than before.

lordgrenville
6 replies
1d10h

But the improvement induced the demand, which to my mind makes this different from Simpson's Paradox.

lern_too_spel
3 replies
1d9h

Doesn't matter. That is not relevant to the paradox.

kgwgk
2 replies
1d7h

How does "Average and p95 latency actually increased after shipping the work to production. How does an objectively good change make things worse?" relate to Simpson's paradox again?

ploxiln
1 replies
1d6h

That's exactly it. After "shipping the work to production" (making it faster for everybody), the overall average and p95 got worse. Each sub-population experienced an improvement: countries with fast internet got faster youtube, and countries with slow internet got faster youtube. But the overall average was slower, because far more users from the second sub-population now bring the overall average speed down (or latency up). That's Simpson's paradox.

kgwgk
0 replies
1d5h

Ah, you may be right. It's not clear in the story whether "Average and p95 latency actually increased after shipping the work to production" means the average over Indonesian and non-Indonesian users combined, or just the Indonesian average.

bryanrasmussen
1 replies
1d9h

I would say the improvement allowed the demand to be met, everybody wanted to use youtube, but few could.

Just like many people may want to eat a wide range of expensive tasty food, but have to make do with junk because it's what they can afford.

mFixman
0 replies
1d6h

It would be Simpsons' Paradox if Google services in Indonesia were initially slow because Indonesians tend to use YouTube more often than lighter services.

There wasn't an error in the conclusions of the initial measurement. It was the solution that had problems.

jldugger
1 replies
1d11h

Good point! I'm just a humble Linux sysadmin dubbed "SRE" who slept through Stats for Engineers and now pays the price every week dealing with SWE eager to blame me for their mistakes.

roenxi
0 replies
19h34m

You were right; that was a case of Simpson's paradox. Every category saw its latency improve, but the overall statistic worsened. Jevons paradox is what caused the induced demand, but when the new usage data was gathered, the review of it was an example of Simpson's paradox.

Effect of the change -> Jevons paradox.

Measurement of that effect -> Simpson's paradox (in this case; it isn't a general rule).

The fact that the two are easily linked is one of the reasons the statistical paradox is so common in practice.

lern_too_spel
0 replies
1d10h

Latency improved for everyone, but overall average latency increased because usage increased faster in high-latency areas. That's Simpson's Paradox. Simpson's Paradox doesn't care where the subpopulations you're measuring came from.

tetris11
5 replies
1d9h

isn't that the "One More Lane, I Promise!" meme

wastewastewaste
4 replies
1d6h

It is, but the meme usually misrepresents induced demand. While I don't like cars and we should focus on other infrastructure, adding a lane does help.

It does not reduce congestion, but it now serves more people at the same congestion level. And those people have come from somewhere. Sometimes from public transport, which isn't really good, but sometimes from some backwater road.

vidarh
2 replies
1d5h

The bigger problem with induced demand is that it's often poor ROI to add that lane where the demand is highest.

That is, imagine you have a big city. You can add capacity for 1m extra people to travel to the city centre, where there's lots of congestion. Or you can find ways to induce demand around the outer edges of town, even though current demand is low there.

Odds are you'll pick the first, because it's "obvious" and doesn't require much thinking to see it'd help. But we really ought to look at the cost-benefit of the second option too, because repeatedly inducing demand in the centre keeps driving up the incremental cost of further improvements, along with plenty of other undesirable second-order effects.

otherme123
1 replies
1d3h

Adding lanes is like getting a bigger cache with the same throughput.

It's obvious at the supermarket: what goes faster, a single cashier processing four short lanes of 10 people with round robin, or two cashiers processing a single lane with 40 people?

Is the city center able to process 1m extra people? If not, it doesn't matter how many lanes you build.

vidarh
0 replies
1d2h

Well, you often can make it able to "process" 1m extra people: you can build overpasses, and tunnels, and taller buildings. But the cost per extra person will tend to go up accordingly, to the point where you could instead spend an extraordinary amount attracting people out of the centre.

E.g. London's "Crossrail" / Elizabeth line cost $24 billion. Granted, it also allows some people to go through London faster, but I can't help wondering what that money could've done if applied to attracting businesses out of the centre instead. E.g. upgrading links between towns on the outskirts, upgrading town centres, and generally making it more attractive for businesses to be located further out.

Given the extraordinary costs it takes to do large infrastructure projects in London, I'd be very surprised if you couldn't get a higher return on investment that way, or by investing similar sums elsewhere in the UK entirely.

lmz
0 replies
1d6h

Until more people choose to live further away because the commute is now tolerable with the extra lane (and it's cheaper), and then you're back to square one.

clemiclemen
1 replies
1d1h

This reminds me of a similar story with YouTube [1] where improving the page weight decreased the metrics because more people with lower end connections could access the page.

Metrics interpretation is as important as the metrics themselves!

[1]: https://blog.chriszacharias.com/page-weight-matters

jldugger
0 replies
1d

That may be exactly the story I was thinking of, or perhaps the original of a story I encountered on a GCP cloud post or something.

konstantinua00
4 replies
1d12h

every time I hear about examples of Simpson's paradox in practice, I don't get what lesson to learn

marketing team overoptimized, so non-nutrition demand fell?

drop nutrition from the product line, so that you're efficient both in the products you do sell and overall?

these metrics are insufficient and it's better to look at gross change rather than ratios?

I have no idea

thih9
0 replies
1d12h

I think the last one is closest. I’d go with: “finance team should look at the gross change”, if that’s what matters for them.

infogulch
0 replies
1d10h

Maybe the lesson is to analyze different business units (product categories?) independently first, then the whole.

gen220
0 replies
1d1h

IME, the "problem" (to the extent there is one) is almost always that the naïvely-chosen KPI metric wasn't specific enough.

Here's a recent example from a friend. You're a SaaS company, and your home page's load time is reported as slow. You set your KPI for the quarter to be "reduce p99 load time of the home page by 50%".

The load time is a function of customer size, so bigger customers = slower home page. It's actually a quadratic function. So the p99 of small customers is like the p50 of large customers. You have 20 small customers and 20 big customers.

That quarter, the sales team onboards 10 new tiny customers, and 10 big customers churn. It's the holiday season in your big customers' geo, so mostly small customers are using the platform. It's the busiest time of year for the small customers, so they're over-using the platform.

All these factors lead to p99 latency dropping by 60%, smashing the KPI goal. Bonuses all around, pats on the back. And no code changes needed, besides!

The solution is: choose a KPI that is tightly coupled to your problem, and not confounded with other variables.

In the above case, a better KPI would have been "p99 latency for large customers", because it is robust to the distribution of customer sizes across current users, churned users, and seasonal differences in usage.
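
To make that concrete, here's a rough simulation with invented numbers (the load-time scales, customer counts, and mix shift are all placeholders, not my friend's actual data). The overall p99 "improves" purely because of the mix shift, while a per-segment p99 barely moves:

    import random
    random.seed(0)

    def p99(samples):
        s = sorted(samples)
        return s[int(0.99 * (len(s) - 1))]

    def page_loads(n_small, n_big):
        # Load time grows with customer size: small ~1s, big ~4s (invented scales).
        small = [random.gauss(1.0, 0.2) for _ in range(n_small)]
        big   = [random.gauss(4.0, 0.8) for _ in range(n_big)]
        return small, big

    small_q1, big_q1 = page_loads(n_small=5_000, n_big=5_000)   # start of quarter
    small_q2, big_q2 = page_loads(n_small=9_950, n_big=50)      # churn + holidays shift the mix

    print(f"overall p99, start: {p99(small_q1 + big_q1):.2f}s")
    print(f"overall p99, end:   {p99(small_q2 + big_q2):.2f}s")  # drops sharply, zero code changes
    print(f"large-customer p99, start: {p99(big_q1):.2f}s")      # the mix-robust KPI
    print(f"large-customer p99, end:   {p99(big_q2):.2f}s")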

brabel
0 replies
1d11h

The article suggests an answer to your question, see the last sentence of the introduction:

"its lesson "isn't really to tell us which viewpoint to take but to insist that we keep both the parts and the whole in mind at once."

In the case above, they failed at "keeping the parts in mind", as clearly the shifting ratios between the different products were crucial.

appplication
1 replies
1d12h

Covid vaccination rates and deaths were rather famously subject to it. E.g. some combination of statements like "most covid deaths were vaccinated individuals", "vaccination reduces death rate", and "the population segment with the lowest vaccination rates has the lowest covid death rates" were all true at the same time.

onychomys
0 replies
1d5h

Those aren't examples of Simpson's even taken together, but there was a famous (by which I mean it got a lot of press, including being written up in the Times and Post when it came out) study showing that although every subgroup in the Italian demographic data had a lower CFR than its Chinese counterpart, the Chinese group had a lower CFR when taken as a whole:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8791436/

throwaway98797
0 replies
1d17h

it's shocking that product mix wasn't a slide in the reporting

but marketing selects for positivity not objectivity

the facts and only the facts that support what they do

kromem
21 replies
1d15h

I absolutely love the Ellenberg quote:

Mathematician Jordan Ellenberg argues that Simpson's paradox is misnamed as "there's no contradiction involved, just two different ways to think about the same data" and suggests that its lesson "isn't really to tell us which viewpoint to take but to insist that we keep both the parts and the whole in mind at once."

Keeping multiple possibilities in mind at once was what allowed the Epicureans to determine survival of the fittest, trait inheritance from each parent, that light was made of discrete units that weighed very little and were moving very fast, and that in order for free will to exist the quanta making up matter had to have multiple possible results under the same governing physical laws and conditions - all nearly two millennia before the scientific method independently found the same results.

It's a great analytical method, especially in data analysis as suggested here.

saghm
8 replies
1d5h

Mathematician Jordan Ellenberg argues that Simpson's paradox is misnamed as "there's no contradiction involved, just two different ways to think about the same data"

Isn't this a bit of a misunderstanding on their part of the meaning of the word "paradox"? The reason they're called paradoxes is that they go against initial intuition and _seem_ contradictory, not that they necessarily are. If anything, I'd guess that most of the named paradoxes turn out not to actually be contradictory, because when something seems incorrect and actually is incorrect, it's a lot less likely to be interesting enough to get a name.

klodolph
7 replies
1d2h

There are various categories of paradoxes and different ways people have categorized them.

Quine calls this one a “veridical paradox”, where it seems false but is true.

Examples of different types of paradox are: any proof that 1=0, Russell's Paradox, and Zeno's paradoxes. These are either false in some sense or used to illustrate fallacious reasoning.

missingrib
3 replies
1d2h

I don't think Zeno's paradoxes have truly been proven false.

shawabawa3
2 replies
1d2h

I don't think Zeno's paradoxes have truly been proven false.

Are you suggesting that there's a chance motion doesn't exist?

klodolph
1 replies
1d1h

It also has not been proven that real, correct proofs for 1=0 do not exist. Paradoxes are not all about proofs.

edanm
0 replies
3h47m

I'm fairly sure it has actually, under some axiom schemas.

jldugger
2 replies
1d2h

“veridical paradox”, where it seems false but is true.

any proof that 1=0

So, 1=0 seems false but is true?

klodolph
1 replies
1d1h

1=0 proofs are examples of a different type of paradox. It is an example of a paradox that does not seem true.

jldugger
0 replies
1d1h

Ah, reading comprehension is hard. I thought you were listing different examples of the same paradox for some reason. Carry on.

brabel
6 replies
1d10h

Sorry to nitpick, but "light was made of discrete units that weighed very little and were moving very fast" is not really correct.

First of all, light has exactly zero weight (only a massless particle can travel at exactly the speed of light, and at no other speed for that matter).

Secondly, you're leaving out the wave/particle duality of light, which is sort of reminiscent of the Simpson's paradox description of "just two different ways to think about the same data", without which you simply can't fully understand the behaviour of light (or of the statistical system you're looking at).

lupusreal
3 replies
1d8h

Weight isn't mass; weight is the force acting on something due to gravity. Gravity affects light, albeit only by a little, so in this sense light has a small but nonzero weight.

brabel
2 replies
1d7h

I don't believe that's a correct interpretation. The reason light bends in the presence of gravity is that spacetime itself is curved, and light follows a "straight line" in that curved spacetime.

Given weight is defined as `W=mg`, and `m` is `0` for light, light can't have any weight. I think the question is itself incorrect: you can't weigh light because light is not something you can "stop" and put on a balance.

The fact that gravity appears to "attract" light is an illusion. Light only has what is called "relativistic mass" which has very little to do with how we normally think of mass and weight.

This article explains it pretty well: https://science.howstuffworks.com/light-weigh.htm

topaz0
0 replies
1d5h

the reason light bends in the presence of gravity ...

This is also why gravity bends the trajectories of massive particles, which also follow geodesics of the curved spacetime (in the absence of other forces).

im3w1l
0 replies
1d6h

Light bends spacetime too.

noam_k
0 replies
1d2h

Well, if we're going to nitpick, light has zero rest mass[0], but does have mass while in motion. This is how solar sails can work, since they use the momentum from the photons.

[0] https://en.m.wikipedia.org/wiki/Invariant_mass

kromem
0 replies
1d9h

This was written in 50 BCE, nearly two thousand years before Einstein's Nobel-winning work proving the discrete qualities of photons.

I'm well aware it's at best a partial description of light.

But it's leagues ahead of Plato's tiny triangles of fire in Timaeus or any other contemporary descriptions.

Also, technically zero mass is very little weight (the least, in fact). And the speed of light is very fast (the fastest). So Lucretius was correct in his statements, if just conservative in the degree to which he stated them (which was in line with the Epicurean commitment to the avoidance of false negatives).

Wave particle duality doesn't really get discussed in Western antiquity outside of a single tangent describing the beliefs of the Peratae who claim the universe has a threefold nature, with the first being continuous and infinitely divisible, the second being a near infinite number of potentialities, and the third being a formal instance. There's a bit of an Everettian quality to their thinking, but outside of its quite broad scope of thought I'm unaware of anyone saying "yeah, reality is both continuous and discrete at the same time" until physicists grappling with contradictory experimental results in the 20th century. The closest in antiquity outside of this group was arguably Plato's theory of forms where the forms were continuous and their physical manifestations discrete, though this is materially different from the idea they are both simultaneously occurring in what's around us (even if Plato's paradigm most likely influenced the much later Peratae).

lukas099
1 replies
1d5h

I'm having trouble making the connection between Simpson's paradox and the Epicureans. Can you help me out?

kromem
0 replies
9h7m

Ellenberg says the way to avoid falling into Simpson's paradox is to keep multiple views of the data in mind when doing analyses.

Let's say you were in ancient Greece, and you separately observe a drummer on a hill bang a drum before you hear it.

Then on another day you see lightning before you hear thunder.

If you consider each event on its own, a perfectly logical explanation is that there's something unique to drums that slows down the sound from them so it takes longer to reach you, and that lightning and thunder occur at different points in time.

But if you consider the set of both events together, a hypothesis that solves both at the same time is that things you hear take longer to reach you over long distances than things you see.

This was actually one of the examples directly from Lucretius, who in discussing the multiple hypotheses for why lightning and thunder occur at different times tied his suggestion that they occur at the same time but have different travel speeds to his observations of drummers on hills.

It's less specifically Simpson's paradox and more the general value of Ellenberg's analytical advice on avoiding the Simpson's paradox as having been at the root of the success (in hindsight) of one of the wiser philosophy schools in antiquity.

biomcgary
1 replies
20h5m

Interesting list of Epicurean theories. Any good starting point that addresses them together and how multiple possibility thinking is related?

kromem
0 replies
9h17m

I can't recommend enough straight up reading Lucretius's Nature of Things.

But one of the examples in there of how their methodology ends up successful is when he's discussing the possible reasons lightning and thunder occur at different times.

One possibility thrown out is that they are actually occurring at different times. But another is that they occur at the same time but one takes longer to reach the viewer than the other.

On its own, these two ideas don't indicate the correct answer.

But then Lucretius ties the latter to another observation - that this seems similar to how a drummer in the distance can be seen to beat the drums before you would hear the drums.

Essentially in an age without the methodology of testable predictions, they circumvented that shortcoming by considering multiple hypotheses for multiple naturally occurring observations and looking for overlaps between them.

This seems to have pointed them in the correct direction on a number of major topics, especially relative to their contemporaries who were generally arguing for a particular hypothesis with various appeals to rhetoric or principle (like Aristotle claiming the leader of a bee hive couldn't be female because it had a stinger and "the gods don't give women weapons").

The times the Epicureans completely missed the mark were generally when they disregarded their principle of avoiding false negatives and discounted things with insufficient observational evidence (for example, they had pretty bad cosmology, and they rejected the Stoic pre-gravity due to their incorrect base assumption of infinite amounts of matter). When they kept an open mind and considered how concepts overlapped, they were often correct in secondary assumptions when tying an idea into multiple other systems and observations, even when they were wrong about the 'why' of the initial assumption.

edanm
0 replies
3h45m

that in order for free will to exist the quanta making up matter had to have multiple possible results under the same governing physical laws and conditions

That seems like a strange and out-of-place statement, unless I'm misunderstanding it.

I assume this is talking about Quantum mechanics, but I don't think this represents Quantum mechanics or free will correctly, and I doubt the Epicureans knew anything about QM at all.

roenxi
14 replies
1d17h

https://en.wikipedia.org/wiki/Berkson%27s_paradox is also one to be aware of. There are lots of ways for error to creep in when populations are created in a biased way.

These two effects explain a lot of the stupid decisions that come out of "data driven" processes. It is common for data to suggest the opposite of the truth.

pdonis
9 replies
1d17h

> It is common for data to suggest the opposite of the truth.

Actually, I think the best takeaway from phenomena like these is that just doing statistics on a set of data can't tell you "the truth". If you don't understand the actual causal factors in play, your knowledge is very limited, no matter how much data you have or how many different ways you slice the statistics.

For example, in the UC Berkeley case described in the Simpson's Paradox article, the data actually doesn't tell you anything useful about "bias" in the sense of "something people are doing that they should do differently to make the admissions process fairer". It doesn't even tell you where to look for possible "bias" without knowing more about the admissions process: is it controlled primarily by the departments or by the university as a whole?
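
For anyone who hasn't seen the Berkeley numbers, here's a toy version of the reversal with invented figures (not the real data). The per-department rates favor women while the aggregate favors men, and nothing in the table by itself tells you at which level the "bias" question should be asked:

    # department -> group -> (applied, admitted); invented numbers
    applications = {
        "easy_dept": {"men": (800, 500), "women": (100, 70)},
        "hard_dept": {"men": (200, 20),  "women": (600, 90)},
    }

    for dept, groups in applications.items():
        for group, (applied, admitted) in groups.items():
            print(f"{dept} {group}: {admitted / applied:.0%}")
    # easy_dept: men 62%, women 70%; hard_dept: men 10%, women 15%

    for group in ("men", "women"):
        applied  = sum(applications[d][group][0] for d in applications)
        admitted = sum(applications[d][group][1] for d in applications)
        print(f"overall {group}: {admitted / applied:.0%}")
    # overall: men 52%, women 23% -- the aggregate reverses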

staunton
6 replies
1d10h

just doing statistics on a set of data can't tell you "the truth". If you don't understand the actual causal factors in play, your knowledge is very limited

I would argue that ultimately, all your knowledge and understanding comes from "doing statistics on data". Maybe the statistics is done by sloppy slurpy things in the brain instead of in R, and maybe it's actually mathematically unsound most of the time, but it's still some sort of statistics.

yccs27
5 replies
1d8h

I think the key difference is between statistics on passively collected data vs results from active experiments. The former will only ever show correlations, while the latter can prove causal results from the actions of the experimenter.

pdonis
4 replies
1d2h

Also, results from active experiments aren't limited to statistics. You can set up experiments to have discrete results, where no statistics is required to test a hypothesis.

For example, the GHZ experiment [1] can rule out local hidden variable models and confirm QM predictions with no statistics at all: the two different models make contradictory predictions with no continuous variation between them.

[1] https://en.wikipedia.org/wiki/GHZ_experiment

staunton
3 replies
1d1h

If you read a good experimental paper on GHZ, you will find quite a bit of statistics.

pdonis
2 replies
22h53m

Sure, but that doesn't contradict what I said. From the Wikipedia article I referenced:

"For specific combinations of orientations, perfect (rather than statistical) correlations between the three polarizations are predicted by both local hidden variable theory (aka "local realism") and by quantum mechanical theory, and the predictions may be contradictory."

"Perfect" correlations means, as the parenthetical comment shows, "doesn't require statistics to check".

staunton
1 replies
20h42m

You said

the GHZ experiment [1] can rule out local hidden variable models and confirm QM predictions with no statistics at all

However, one needs to use statistics to even show GHZ works. That does sound contradictory to me. The correlations you get in experiments are never perfect and in this case they can be pretty far from perfect.

pdonis
0 replies
17h40m

> one needs to use statistics to even show GHZ works

Not for the particular cases described in the quote I gave. For a complete verification of all the GHZ theorem's predictions, yes, you need to do statistics, because some of those predictions are probabilistic.

> The correlations you get in experiments are never perfect

In some cases, like the ones described in the quote I gave, it isn't a matter of correlations. You have contradictory results predicted by two different models, each prediction being 100% certain according to the model. You don't need any statistics to test that: just do one single run and see which way it comes out.

andirk
1 replies
1d2h

Is the UC Berkeley case a good example of the importance of normalizing data before analyzing? Where things need to be put on a level playing field and handicaps applied to remove auxiliary noise.

pdonis
0 replies
1d2h

Normalizing data doesn't fix the issue in the UC Berkeley case, because you still have to pick what to normalize over: do you normalize over the entire university, or separately over each department?

The answer to questions like that can't be found in the data. You have to go look at how the university admission process actually works, and what roles the university vs. the individual departments play in it.

tunesmith
1 replies
1d14h

Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

It may be true in some cases, but I think the "Any" is disproven by the existence of proper scoring rules.

o11c
0 replies
1d14h

It's a very big assumption that such rules can, let alone do, exist in the first place.

JacobAldridge
0 replies
1d6h

As the aphorism goes, there are Lies, Damned Lies, and Statistics.

estiaan
4 replies
1d13h

I was reading the example of UC Berkeley appearing to have gender bias in admissions and read the following:

“it showed that women tended to apply to more competitive departments with lower rates of admission, even among qualified applicants (such as in the English department), whereas men tended to apply to less competitive departments with higher rates of admission (such as in the engineering department)”

That's the opposite of what I would expect: I'd expect that English and the arts in general would be a lot easier to get into than STEM. That's how it is in Australia.

Edit: When I say get into I mean get into university, not getting into the industry

tacitusarc
0 replies
1d12h

Competitiveness is one measure of difficulty, but there are others. Engineering departments tend to be qualitatively more difficult to get into than the humanities, which are quantitatively more difficult.

aldonius
0 replies
1d8h

I think it could be consistent.

In Australia we usually think about undergrad competitiveness in terms of the minimum ATAR rank of last year's admissions, right?

But you could also think about competitiveness in terms of admission fraction.

And I think those two metrics can be consistent if a very popular degree has a disproportionately low fraction of high-ATAR applications.

DonsDiscountGas
0 replies
1d7h

I was surprised to read that too. I think the answer is that here we're looking at admissions rate = number admitted / number applied, which is not the same as overall difficulty in a conceptual sense.

The only people applying to grad school in math are people who got a BS in math and did so with good grades (or perhaps some other STEM field plus significant theoretical math coursework). On the arts side I suspect they draw from a larger pool of backgrounds (plus people tend to switch from STEM to something else a lot more than the other way around). It's easier to convince oneself that a short story is great (when others may disagree) than to convince oneself a math proof is correct when it objectively is not. So there's less self-selection on the applicant side, and hence a lower admissions rate.

BeetleB
0 replies
1d3h

Since no one gave the obvious[1] answer:

The data is for application to graduate programs. There is a ton more funding for engineering, and many/most students going for a PhD in engineering don't pay for it. There's very little funding in the humanities, and most students are not willing to pay high costs for a PhD in the humanities, so the department tightly restricts admission.

As a result, it's easier to get into an engineering PhD program - as long as you are competent enough.

I had a friend who was a fellow engineering student. He became disillusioned and wanted to go into journalism. He applied to transfer to the Communications program at the university and told me how competitive it was - they admit less than 10 people per year. He did not get in.

[1] Obvious if you've spent a lot of time in grad school.

CrazyStat
4 replies
1d14h

When I taught intro stats many years ago I used to use house prices as a nice example of Simpson's Paradox (with actual data, for the students to investigate as part of a computational lab). The data I had was on US house sales from 2008, so it's 15 years out of date now--perhaps things have changed since.

At the time, the average price for single-family house sales was higher for houses without central AC than for houses with central AC. Yet when you split the data down by state, in every state the relationship was reversed: houses with central AC were more expensive than houses without.

The higher nationwide average price of houses without central AC was driven primarily by the large number of expensive houses in California without central AC.
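
Here's a miniature version of that exercise with invented prices rather than the real 2008 data, since the pattern is easy to reproduce: within each state, houses with central AC average more, but the nationwide averages reverse because the expensive California homes mostly lack AC.

    import pandas as pd

    # Invented sale prices (thousands of dollars), chosen only to show the pattern.
    df = pd.DataFrame({
        "state":      ["CA", "CA", "CA", "CA", "TX", "TX", "TX", "TX"],
        "central_ac": [False, False, False, True, True, True, True, False],
        "price_k":    [900,   850,   800,   950,  300,  320,  310,  250],
    })

    print(df.groupby(["state", "central_ac"])["price_k"].mean())
    # CA: no AC 850, AC 950; TX: no AC 250, AC 310 -- AC is pricier within each state

    print(df.groupby("central_ac")["price_k"].mean())
    # nationwide: no AC 700, AC 470 -- the aggregate reverses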

bruce511
2 replies
1d11h

I'm reading this outcome as the reverse of the examples above. Or perhaps as a matter of identifying the correct stat to use based on your goals.

In other words, in this case I don't really care what the national average is. I care about my house, my street, my area.

In other cases, like in marketing, the stat that matters first is overall net profit. From there we can burrow down to understand the factors. In which case we come across business share before marketing spend.

In the networking example, the goal is usage (throughput). Not speed or latency.

Drawing the wrong stat first leads to incorrect conclusions.

Gibbon1
0 replies
1d10h

like in marketing, the stat that matters first is overall net profit.

I have a take on that. The stat that matters is the profit per unit of non-scalable business resource. As in, how much management, marketing, sales, accounting, and engineering time does the product take per unit? It's important because those are often hard to scale. You can have a low-margin product that requires zip of the above, and it's good business. And the reverse: high margins, but it requires too much of the above, and it's bad.

CrazyStat
0 replies
1d5h

Right, one of the interesting things about Simpson's paradox is that there's not a uniform right answer: sometimes you care about the overall average, sometimes you care about the averages of subpopulations. You have to judge that based on the situation.

One of the other comments linked [1] which includes Judea Pearl's analysis of Simpson's paradox from a causal inference point of view [2], which lays this out nicely (though maybe not easy to understand--it took me many hours of study to get comfortable with Pearl's causal inference work, even with a strong stats background).

[1] https://plato.stanford.edu/entries/paradox-simpson/

[2] https://plato.stanford.edu/entries/paradox-simpson/#ConfPear...

gofreddygo
0 replies
3h9m

Ahh.. I've seen this pattern before, skim-read the wikipedia article and yet it didn't click.

Your comment made it all fit together. May not be the most accurate (I don't know) but it helped me. Thank you.

If you have any written material I can access, from your courses, I'd be interested to read.

Thanks for this.

marklubi
3 replies
1d14h

What is the deal with posting (seemingly) random links to Wikipedia articles lately without context?

Some of them are interesting, but most(?) of them come without details.

Please provide me with some context.

lukas099
2 replies
1d5h

Things hackers would find interesting. I think they have always been posted as long as I've been here.

marklubi
1 replies
1d3h

There is no context provided with any of these links. I'm not making any claims about whether or not they have interesting content.

There's been a huge uptick in submissions of random Wikipedia articles where there is no context provided with them.

There are maybe two or three in the last day that have anything beyond the title included. https://news.ycombinator.com/from?site=wikipedia.org

BeetleB
0 replies
1d3h

There is no context provided with any of these links.

Strange comment. There doesn't need to be.

bee_rider
3 replies
1d17h

For all of the examples on Wikipedia, it seems like there was some confounding extra variable that was missed. I wonder if anybody knows of a case where it just sort of happened randomly, with no big underlying cause?

Or maybe I’m thinking of it wrong and this is impossible.

eadler
0 replies
1d17h

It can happen any time there is a mix shift in the underlying quantity of the subgroups. It's just that random changes in quantities are not likely to be studied or reported. It's easy to generate manually though.

Kalium
0 replies
1d5h

The classic example of admissions does not have a missing confounding variable. It's a case of aggregation at the wrong level.

DonsDiscountGas
0 replies
1d6h

In order for it to count as Simpson's paradox I think there would need to be a confounding variable. It's certainly possible for it to appear spuriously, and for something to look like a confounder when it isn't, but there would need to be some type of subgroup.

freedomben
1 replies
1d2h

Hot damn, you're not kidding. I was struggling a bit with it from reading the article text, but that animation clarified it for me in seconds!

kfarr
2 replies
1d14h

Upon first glance I assumed it was the Simpson's episode about Mr. Burns having "a vast range of diseases so great in fact that they cancel each other out," explained here: https://simpsons.fandom.com/wiki/Three_Stooges_Syndrome

Surprised at the similarity but of course that was probably on purpose by the genius Simpsons writers of the late 90's.

pcwelder
0 replies
1d10h

Encountered it recently. I had two different datasets to evaluate model performance on, from different domains.

One dataset was closer to training data and the other was closer to our business use case. The hypothesis was that performance on the latter dataset would be poorer due to overfitting.

Indeed, the accuracy in every category had dropped. However, overall accuracy was much higher!

This was because the second dataset had higher frequency of easy to predict categories.

If we had just looked at the overall number, we would have concluded that there was no overfitting to the training domain, which was not the case.
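
A made-up numeric version of the same effect, since the arithmetic is easy to miss: per-category accuracy drops on the second dataset, yet the overall number rises because that dataset is dominated by the easy category.

    def overall_accuracy(per_category_acc, counts):
        correct = sum(per_category_acc[c] * counts[c] for c in counts)
        return correct / sum(counts.values())

    # Invented accuracies and category counts, not the real evaluation.
    acc_train_domain = {"easy": 0.95, "hard": 0.70}
    acc_biz_domain   = {"easy": 0.92, "hard": 0.60}   # worse in every category

    counts_train_domain = {"easy": 200, "hard": 800}
    counts_biz_domain   = {"easy": 900, "hard": 100}  # mostly easy examples

    print(f"{overall_accuracy(acc_train_domain, counts_train_domain):.3f}")  # 0.750
    print(f"{overall_accuracy(acc_biz_domain, counts_biz_domain):.3f}")      # 0.888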

paradocks
0 replies
1d14h

Lord's paradox is closely related to Simpson's, and I find it's a little easier to understand visually: https://repository-images.githubusercontent.com/597130499/46...

Imagine the horizontal axis is dose of a drug, and the vertical is the response, like hours of sleep. Looking at Lisa's response, it's clear that increasing the dose reduces sleep. Same for Bart. But if you do a linear regression of all the data, shown by the red line, dose increases sleep, which is wrong.
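
A small numeric version of that picture, with invented doses and sleep values: the within-person regression slope is negative for both Lisa and Bart, while the pooled regression slope comes out positive.

    import numpy as np

    # Invented data: each person's sleep drops by half an hour per unit of dose,
    # but Bart gets both higher doses and more sleep overall.
    dose_lisa = np.array([1.0, 2.0, 3.0, 4.0]); sleep_lisa = 9.0 - 0.5 * dose_lisa
    dose_bart = np.array([6.0, 7.0, 8.0, 9.0]); sleep_bart = 14.0 - 0.5 * dose_bart

    def slope(x, y):
        return np.polyfit(x, y, 1)[0]   # slope of a simple linear fit

    print(slope(dose_lisa, sleep_lisa))   # ~ -0.5: dose reduces sleep for Lisa
    print(slope(dose_bart, sleep_bart))   # ~ -0.5: dose reduces sleep for Bart
    print(slope(np.concatenate([dose_lisa, dose_bart]),
                np.concatenate([sleep_lisa, sleep_bart])))   # ~ +0.33: pooled slope flips sign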

ninetyninenine
0 replies
1d13h

That visualization was so effective I didn't need to read the wiki at all. I get it just from that.

esafak
0 replies
1d16h

I did not know Simpson's paradox was an object lesson in causal inference until the other day. The right paradigm dispels the paradox. Here's a better article: https://plato.stanford.edu/entries/paradox-simpson/

caditinpiscinam
0 replies
1d14h

My gripe with most takes on the affirmative action debate is that they completely ignore this issue (in spite of the Berkeley case-study). It's trivial to take a set of admissions data and partition it by race to reveal a "racial bias", which would shrink or disappear if other factors correlated with race (like income) were accounted for.

LinAGKar
0 replies
1d8h

That second graph reminds me of Shepard tones [1], known e.g. from the Super Mario 64 staircase, where each component is steadily rising in pitch, yet the tone as a whole stays exactly the same in the long term.

[1] https://en.wikipedia.org/wiki/Shepard_tone

EGreg
0 replies
1d9h

Does this also explain the gender pay gap? Or is it only for trends?