The Dunning-Kruger effect is autocorrelation

tempestn
63 replies
21h22m

I don't buy this take, and this rebuttal does a better job than I could of explaining why: https://andersource.dev/2022/04/19/dk-autocorrelation.html

Basically, this autocorrelation take shows that if performance and evaluation of performance were random and independent, you would get a graph like the D-K one, and therefore it states that the effect is just autocorrelation. But in reality, it would be very surprising if performance and evaluation of performance were independent. We expect people to be able to accurately rate their own ability. And D-K did indeed show a correlation between the two, just not as strong of one as we would expect. Rather, they showed a consistent bias. That's the interesting result. They then posit reasons for this. One could certainly debate those reasons. But to say the whole effect is just a statistical artifact because random, independent variables would act in a similar way ignores the fact that these variables aren't expected to be independent.
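
A minimal sketch of the scenario the autocorrelation article describes, assuming (hypothetically) that actual and self-assessed percentiles are independent uniform draws. The helper `quartile_means` and the sample size are illustrative choices, not anything from the D-K paper. Grouped by actual quartile, the perceived mean stays near 50 in every group, which is the flat line that the real data do not show.

```python
import random
from statistics import mean

def quartile_means(pairs):
    """Group (actual, perceived) pairs by actual-score quartile and
    return a list of (mean actual, mean perceived), lowest quartile first."""
    ranked = sorted(pairs, key=lambda p: p[0])
    size = len(ranked) // 4
    groups = [ranked[i * size:(i + 1) * size] for i in range(4)]
    return [(mean(a for a, _ in g), mean(s for _, s in g)) for g in groups]

rng = random.Random(0)
# Hypothetical data: both values are independent uniform percentiles.
independent = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(10_000)]

for actual, perceived in quartile_means(independent):
    print(f"actual {actual:5.1f}  perceived {perceived:5.1f}")
```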

crazygringo
15 replies
19h29m

Yup. Assuming the sample sizes are statistically significant, the original paper clearly shows:

- On average, people estimate their ability around the 65th percentile (actual results) rather than the 50th (simulated random results) -- a significant difference

- That people's self-estimation increases with their actual ability, but only by a surprisingly small degree (actual results show a slight upwards trend, simulated random results are flat) -- another significant difference

The author's entire discussion of "autocorrelation" is a red herring that has nothing to do with anything. Their randomly-generated results do not match what the original paper shows.

None of this really sheds much light on to what degree the results can be or have been robustly replicated, of course. But there's nothing inherently problematic whatsoever about the way it's visualized. (It would be nice to see bars for variance, though.)

somenameforme
7 replies
6h28m

"On average, people estimate their ability around the 65th percentile (actual results) rather than the 50th (simulated random results) -- a significant difference"

This is a different issue than D-K. The D-K hypothesis is that self assessment and actual performance are less correlated for weaker than higher performing individuals. That people think they're better than average is a different (and much less controversial) bias.

---

[DK-Effect] : I totally know I scored at least a 30% on that test, and that's certainly way better than average (it's not). [Actually scored 10%]

[No DK-Effect] : I totally know I scored at least a 30% on that test, and that's certainly way better than average (it's not). [Actually scored 30%]

---

kstenerud
2 replies
4h9m

The D-K hypothesis is that self assessment and actual performance are less correlated for weaker than higher performing individuals.

Isn't that what the graph shows? The bottom quartile group is guessing almost 50 percentile points higher than their actual performance, whereas the top quartile is at most 15 points off.

They're all guessing somewhere between the 60th and 75th percentiles (i.e. "I'm a bit better than average") - with some upwards trend since the high performers seem to at least know they have some skill, although not very accurately. It's just that for the poor performers, a guess of the 60th percentile is wayyy off the mark.

somenameforme
1 replies
1h45m

EDIT: Something important for the rest of this post. In case it's not clear, the graph is showing your percentile ranking within the group - not your actual score.

Nope, because there's an interesting statistical trick in play. Imagine you take 100 highly skilled physicists and give them some lengthy series of otherwise relatively basic physics questions. Everybody is going to rate their predicted performance as high. But some people will miss some questions simply due to silly mistakes or whatever. And those people would end up on the bottom 10% of this group, even if the difference between #1 and #100 was e.g. 0.5 points. Graph it as D-K did, and you'd show a huge Dunning Kruger effect, even when there is obviously nothing of the sort.

In fact the fewer differences in ability within a group, and the greater the relative ease of a task, the bigger the Dunning-Kruger effect you'd show. Because everybody will rate themselves relatively high, but you will always have a bottom 10%, even if they are practically identical to the top 10%.
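
A rough sketch of that ranking trick, under made-up assumptions: 100 people whose raw scores all fall between 97 and 100, each guessing a percentile somewhere around the 75th. None of these numbers come from the paper. Converting raw scores to within-group percentiles still produces a bottom quartile that appears to overestimate heavily, even though the raw score spread is tiny.

```python
import random
from statistics import mean

rng = random.Random(1)
n = 100
# Hypothetical cohort: everyone scores 97-100, differing only by small slips.
scores = [100 - rng.uniform(0, 3) for _ in range(n)]
# Assumption: everyone expects to do well and guesses near the 75th percentile.
guesses = [rng.uniform(65, 85) for _ in range(n)]

# Convert raw scores to within-group percentile ranks (0 = worst).
order = sorted(range(n), key=lambda i: scores[i])
percentile = [0.0] * n
for rank, i in enumerate(order):
    percentile[i] = 100 * rank / n

bottom = order[: n // 4]
top = order[-(n // 4):]
gap_bottom = mean(guesses[i] - percentile[i] for i in bottom)
gap_top = mean(guesses[i] - percentile[i] for i in top)
score_spread = max(scores) - min(scores)

print(f"raw score spread: {score_spread:.2f} points")
print(f"bottom quartile guess minus rank: {gap_bottom:+.1f}")
print(f"top quartile guess minus rank:    {gap_top:+.1f}")
```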

You can see this most clearly in the original paper. They carried out 4 experiments. The one that was most objective and least subject to confounding variables was #2, where they asked people a series of LSAT based logic questions, and assessed their predicted vs actual results. And there was very little difference. Quoting the paper, "Participants did not, however, overestimate how many questions they answered correctly, M = 13.3 (perceived) vs. 12.9 (actual), t < 1. As in Study 1, perceptions of ability were positively related to actual ability, although in this case, not to a significant degree." Yet look at the graph for it, and again it shows some seemingly large D-K effect.

And there's even more issues with D-K, and especially experiment #1 (which is the one with the prettiest graph by far), but that's outside the scope of this post. I'm happy to get into it, if you are though. I find this all just kind of shocking and exceptionally interesting! I've referenced the D-K effect countless times in the past, never again after today!

[1] - https://sci-hub.se/10.1037/0022-3514.77.6.1121

dahart
0 replies
46m

Yes yes yes! I’m in the very same boat, and came to an epiphany that the ranking trick here, combined with some subjective questions (ability to appreciate humor - seriously!?), hides almost everything about actual skill. Not only does it amplify mistakes, it also forces the participants to have to know something about their cohort. Having to guess your ranking fully explains the less than perfect correlation. It also undermines all claims about competence and incompetence. They’re not testing skill, they’re only testing ability to randomly guess the skill of others.

What about the slight bias upwards? Well, what exactly was the question they asked? It’s not given in the paper. They were polling only Cornell undergrads looking for extra credit. What if the question somehow accidentally or subtly implied they were asking about the ranking against the general population, and then they turned around and tested the answers against a small Cornell cohort? I just went and looked at the paper again and noticed that the descriptions of the ranking question changed between the various “studies” with the first one comparing to the “average Cornell student” (not their experiment cohort!). The others suggest they’re asking a question about ranking relative to the class in which they’re receiving extra credit. Curiously study 4 refers to the ranking method of study 2 specifically, and not 3. The class used in study 4 was a different subject than 2 & 3. How they asked this question could have an enormous influence on the result, and they didn’t say what they actually asked.

Cornell undergrads are a group of kids that got accepted to an elite school and were raised to believe they’re better than average. Whether or not all people believe they’re better than average, this group was primed for it, and also have at least one piece of actual evidence that they really are better than average. If these were majority freshmen undergrads, they might be especially uncalibrated to the skills of their classmates.

In short, the sample population is definitely biased, and the potential for the study to amplify that bias is enormous. The paper uses suggestions and jumps to hyperbolic conclusions throughout. I’m really surprised that evidence and methodology this weak claims to show something about all of humanity and got so much attention.

dragonwriter
2 replies
1h20m

This is a different issue than D-K.

No, it's literally the D-K finding.

The D-K hypothesis is that self assessment and actual performance are less correlated for weaker than higher performing individuals

That may have been a hypothesis Dunning and Kruger had at some point, it's not the effect they actually identified from their research. But I don't think it's even that, it's an “effect” people have associated with D-K because they heard discussion of the D-K research that got distorted at multiple steps from the original work, and then that misunderstanding, because it made a nice taunt, replicated widely and became popular.

somenameforme
0 replies
3m

This is straight from their paper [1]:

"Perhaps more controversial is the third point, the one that is the focus of this article. We argue that when people are incompetent in the strategies they adopt to achieve success and satisfaction, they suffer a dual burden: Not only do they reach erroneous conclusions and make unfortunate choices, but their incompetence robs them of the ability to realize it."

[1] - https://sci-hub.se/10.1037/0022-3514.77.6.1121

dahart
0 replies
35m

To be fair, the paper itself uses hyperbolic language that completely distorts its own data. It heavily pushes and leads the reader into one possible dramatic explanation for their results, while downplaying and ignoring a bunch of other less dramatic explanations. Using words like “incompetent” is almost completely unfounded based on what they actually did. Section headings like “competence begets calibration”, “it takes one to know one”, and “the burden of expertise” are uncurious platitudes and jumping to conclusions. I’m kind-of stunned at the popular longevity of this paper given how unscientific it is and how often replication results with better methodology have shown conflicting results.

dahart
0 replies
1h26m

The D-K hypothesis is that self assessment and actual performance are less correlated for weaker than higher performing individuals.

I’m not sure that’s an accurate summary. The correlation of the perceived ability is effectively the slope of the line, and the slope is more or less constant. The paper suggests that the bias of the bottom quartile is higher than the bias of the upper quartile, not that the correlation is any different.

But it’s strange that the DK paper makes an example of the lower performers, since the bias of the scores appears to be constant; it appears the high performers have pretty much the same bias as the low performers — it’s a straightish line that goes through 65% in the middle rather than the expected straight line that goes through 50% in the middle. If the ‘high performers’ had a different bias, then the line wouldn’t be so straight.

cortesoft
5 replies
12h51m

That people's self-estimation increases with their actual ability, but only by a surprisingly small degree (actual results show a slight upwards trend, simulated random results are flat) -- another significant difference

If everyone thinks they are slightly above average, isn't this inevitable? If everyone thinks they are slightly above average, people who are slightly above average are going to be the most accurate at predicting where they land?

cycomanic
1 replies
8h4m

Have you not just summarized the Dunning-Kruger effect in other words?

That essentially follows from everyone assuming they are slightly above average. That's also the crux of the refutation and why the whole autocorrelation argument is a red herring: even if we all self-assessed completely randomly, that would actually confirm the Dunning-Kruger effect is real (because if we self-assess randomly, worse performers are more likely to overestimate).

We could argue that this is not surprising, but the "surprising" bit is that the curves show that better performers are actually more skilled at assessing their performance, which incidentally was also confirmed by the followup studies.

quetzthecoatl
0 replies
7h32m

Is it though? Everyone overestimating their ability a bit isn't DK effect. It's when people with less knowledge and ability vastly over estimate their ability (because they don't know how little they know - while others do), and the opposite for those who are truly more able and knowledgeable (again because they understand how vast the topic is and though they know more and are capable more than the average person, they also understand how little they truly know compared to what they don't know)

zuminator
0 replies
11h35m

Even if "people tend to slightly overrate their own ability," was the only takeaway, it would still refute the author's conclusion that DK has nothing to do with human psychology.

dahart
0 replies
30m

If everyone thinks they are slightly above average, isn't this inevitable? If everyone thinks they are slightly above average, people who are slightly above average are going to be the most accurate at predicting where they land?

Yes, it’s inevitable. And this study only asked Cornell undergrads what they think of themselves - people who were taught to believe they are above average, and also people who got into a selective school and probably all had higher than average scores on standardized tests. Is it surprising in any way that this group estimated their ability at above average?

IanCal
0 replies
8h40m

Yes but then you'd see a flat line for people's estimates, which wasn't the result.

ketozhang
0 replies
15h26m

The autocorrelation is important to show that its transformation to the D-K plot will always give you the D-K effect for independent variables.

However, the focus on autocorrelation is not very illuminating. We can explain the behaviors found quite easily:

- If everyone's self-assessment scores are (uniformly) random guesses, then the average self-assessment score for any quantile is 50%. Then of course those of lower quantile (less skilled) are overestimating.

- If self-assessment score vs actual score are dependent proportionally, then the average of each quantile is always at least its quantile value. This is the D-K effect, which is weaker as the correlation grows.

- The opposite is true for disproportional relation.

So, the D-K plot is extremely sensitive to correlations and can easily over-exaggerate the weakest of correlations.
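
One way to see how the plot responds to correlation is the sketch below. It assumes a toy model (not from the paper) in which the perceived percentile is a weighted blend of the actual percentile and independent uniform noise; `bottom_quartile_gap` and `weight` are hypothetical names. As the weight on the actual value rises, the bottom quartile's apparent overestimate shrinks toward zero.

```python
import random
from statistics import mean

def bottom_quartile_gap(weight, n=20_000, seed=2):
    """Mean (perceived - actual) percentile in the bottom actual quartile,
    when perceived = weight * actual + (1 - weight) * independent noise."""
    rng = random.Random(seed)
    actual = [rng.uniform(0, 100) for _ in range(n)]
    perceived = [weight * a + (1 - weight) * rng.uniform(0, 100) for a in actual]
    pairs = sorted(zip(actual, perceived))[: n // 4]
    return mean(p - a for a, p in pairs)

gaps = {w: bottom_quartile_gap(w) for w in (0.0, 0.5, 1.0)}
for w, gap in gaps.items():
    print(f"weight {w:.1f}: bottom quartile overestimates by {gap:5.1f} points")
```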

t_mann
9 replies
18h15m

I was surprised by the figure from the original article, imho that's the strongest rebuttal: perceived ability grows strictly monotonically with actual ability, no sign of the famous non-monotonic U-curve. Yeah, the slope is less than one, and it grows a bit faster from the second to the third quartile than from the first to the second, but none of that changes the fact that people tend to slot themselves correctly. The chart is interesting in that it confirms that everyone perceives themselves to be slightly above average in terms of ability, which of course can't be true in practice. But what it also shows is that when they think they'll be below or above that (false) baseline, they're actually correct about it. So pretty much the exact opposite of what the Dunning-Kruger effect claims.

tempestn
4 replies
17h45m

The chart is interesting in that it confirms that everyone perceives themselves to be slightly above average in terms of ability, which of course can't be true in practice.

No, everyone biases their self-assessments toward a point slightly above the mean. That's not the same as saying everyone thinks they're slightly above average, nor that people's self-assessments have no predictive power whatsoever. The lowest performers still think they're below average, just not as much as they should. The highest performers still think they're considerably above average. But they all have a bias toward (slightly above) the middle.

So yes, people are generally correct in the direction that they deviate from that median self-assessment, but that just shows that people's self-assessments aren't completely without basis. Which D-K certainly didn't claim.

zeroonetwothree
1 replies
13h40m

But we don’t know their true ability, only the results on one test. It could be they accurately predicted their ability but because of random chance they did better/worse than their guess. Then you would get the exact data that is observed.

lokar
0 replies
11h40m

I thought they were estimating their performance on the test relative to others. There was no “real world” element.

t_mann
1 replies
17h39m

D-K claim a non-monotonic relationship, which simply isn't supported by that data, as you yourself point out: people rank themselves correctly (ordinally). I didn't mean to say that all self-assessments are the same, if that was the misunderstanding. My point is that the self-assessments indeed are meaningful, even more so than D-K claim.

RevEng
0 replies
14h38m

Check the original paper by D-K. Fix only focused on the first plot which has a monotonically increasing trend. The later plots show varying degrees of nonmonotonicity, though sadly they don't include error bars to indicate how statistically significant the differences between groups are.

jampekka
3 replies
18h0m

The slope will be less than one if there's e.g. any random guessing in the test even if the self-assessment is perfect (apart from whether they know if their guess is right or wrong of course) [1].

I think this is the effect that the post is dancing around, but doesn't seem to really understand (and how "autocorrelation" and independence are discussed is very nonstandard to be charitable).

[1] https://en.m.wikipedia.org/wiki/Regression_dilution
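
A small illustration of regression dilution, under assumptions chosen purely for the example: true ability and test-day luck are independent standard normal draws, and self-assessment equals true ability exactly. With equal variances the expected slope is var(ability) / (var(ability) + var(luck)) = 0.5, so the fitted line is much flatter than 1 despite perfect self-knowledge. `ols_slope` is a hypothetical helper.

```python
import random
from statistics import mean

rng = random.Random(3)
n = 50_000
ability = [rng.gauss(0, 1) for _ in range(n)]        # true skill (unobserved)
test_score = [a + rng.gauss(0, 1) for a in ability]  # skill plus luck on the day
self_view = ability                                  # assumed perfect self-knowledge

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

slope = ols_slope(test_score, self_view)
print(f"slope of self-assessment on test score: {slope:.2f}")
```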

t_mann
2 replies
17h45m

I agree, the statistical analysis in the original post makes me very uneasy. I think it could be a case where the conclusion is correct, even though the argument isn't necessarily.

And yes, the fact that the slope is less than one is fairly uninteresting.

The real problem here is that the Dunning-Kruger effect, as it's classically stated, claims that if you asked four people to rank themselves in terms of ability, the result would be 1-3-2-4, ie the people who know a little would put themselves above the people who know a lot but aren't quite experts. The problem is the data shows that they'd actually rank themselves correctly 1-2-3-4. But such a boring finding probably wouldn't have made the authors quite as famous, which might be why they tried a bit of data mangling, and they found this really cool story that everyone would secretly love to be true.

Which is a shame, because I think the fact that the mean of perceived ability is too high (and the variance too low) is really interesting too, and perfectly supported by the raw data.

tempestn
0 replies
15h14m

But they wouldn't. They'd rank themselves something like 1,2,2,3. We're not dealing with a population collaborating to all rank themselves in order, but rather each person individually estimating where their abilities lie in the population.

The point is that if you ask someone in the, say, 5th percentile of ability what their ability is compared to the population, they might say 25th percentile. Ask someone at the 25th, and they might say 40th. At the 40th they could say 55th. And at the 90th, maybe they'll say 80th. So yes, if you order their guesses, they will be in roughly the correct order. But, crucially, that doesn't mean that they are ranking themselves correctly!
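
Using the illustrative percentiles from this comment, a quick check shows both things at once: the guesses are in the right order, yet every individual guess is off.

```python
# Percentiles taken from the comment above: (actual, self-estimated).
people = [(5, 25), (25, 40), (40, 55), (90, 80)]

estimates = [est for _, est in people]
order_preserved = estimates == sorted(estimates)
errors = [est - actual for actual, est in people]

print("order preserved:", order_preserved)   # True
print("errors:", errors)                     # [20, 15, 15, -10]
```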

jampekka
0 replies
17h20m

Yes. The methodology in the original D&K is quite shoddy, and vulnerable to e.g. good old regression to the mean, and the interpretations are too strong. This is sadly very common in psychology (and many other fields I'd guess) and even researchers don't care so much if the story is juicy enough.

The pop version of the DK effect seems to be something like a 4-3-2-1 ranking, which is obviously not supported by the data.

svnt
7 replies
21h3m

The author of this assumes the conclusion in order to decide how to analyze his data.

He cannot reasonably say both:

we have a decision to make: what are we going to assume? How are we going to quantify our surprise from the results?

The first option is, as in the case of the state census, to assume dependence between X and Y. I.e. to assume that, generally, people are capable of self-assessing their performance.

The second option conforms with the Research Methods 101 rule-of-thumb “always assume independence.” Until proven otherwise, we should assume people have no ability to self-assess their performance.

It seems to me glaringly obvious that the first option is much, much more reasonable than the second.

— and -

most notably the claim that the more skilled people are, the better they are at self-assessing their performance. This result is supported by their plot, but in any case, my issue is not with objections to this claim

and then expect to carry any credibility.

The author of this piece both suggests that a key variable is fixed and later admits it varies within the same dataset.

I guess at least they admit it, but this lacks basic self-consistency.

Jensson
4 replies
20h13m

The author of this piece both suggests that a key variable is fixed and later admits it varies within the same dataset.

I don't see how that variable changes, here is an example how the error variable can be exactly the same for everyone and reproduce the results:

Let's say the overconfidence is always that you feel 50% of those better than you are actually worse than you. So everyone is equally overconfident, just that the top won't move their own placings as much as the bottom since there are much fewer people that they can mistake being worse than them. Then apply noise to this and you get the graph Dunning-Kruger got.

You could say "But they are better at estimating their rank!", but that is just a mathematical artefact, it isn't a psychological result. Even if everyone always guessed that they are number 1, the better you are the better your guess will be, but in that case it is easy to see that everyone overestimates their skill in the same way instead of the better people having a fundamentally different way of evaluating themselves.
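
A sketch of this model, with assumptions that are not in the comment: percentiles on a 0-100 scale, Gaussian noise with a standard deviation of 30, and clamping to the valid range. `perceived` and `noise_sd` are hypothetical names. The bias rule is identical for everyone, yet the gap between guess and actual rank shrinks from the bottom quartile to the top. Whether the top quartile ends up under- or overestimating depends on the assumed noise level.

```python
import random
from statistics import mean

def perceived(p, rng, noise_sd=30.0):
    """Same bias for everyone: mistake half of the people ahead of you for
    being behind you, add noise, and clamp to a valid percentile."""
    biased = p + 0.5 * (100 - p)
    return min(100.0, max(0.0, biased + rng.gauss(0, noise_sd)))

rng = random.Random(4)
actual = sorted(rng.uniform(0, 100) for _ in range(20_000))
guess = [perceived(p, rng) for p in actual]

size = len(actual) // 4
# Mean (guess - actual) per actual quartile, lowest quartile first.
quartile_gap = [
    mean(g - a for a, g in zip(actual[i * size:(i + 1) * size],
                               guess[i * size:(i + 1) * size]))
    for i in range(4)
]
print([round(g, 1) for g in quartile_gap])
```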

svnt
1 replies
16h44m

Both analyses seem to agree on one finding: people’s skill at estimating their own ability increases with that skill. It can’t be a purely mathematical artifact because you would see a tapering at either end, or a narrowing distribution of errors at the bottom end, not just a narrowing toward the top end.

This should be unsurprising for anyone who has become sufficiently skilled at something. Beginners can’t even discern the differences the experts are discussing, and frequently make errors in classes they don’t even understand.

chiefalchemist
0 replies
14h59m

Beginners, by definition, are guessing 100%. Some will guess high, others low, and the rest in between. But they are all guessing. Perhaps there's a cultural bias to over-estimate their skill? Perhaps there's a nudge in the process of the study that led them to overestimate?

The lede isn't that people over-estimate their skill level. The lede is, why would that be as they have nothing else to go on. What is the trigger or triggers? And to say, the more experienced estimate better? Well, duh.

raincole
1 replies
17h55m

Let's say the overconfidence is always that you feel 50% of those better than you are actually worse than you. So everyone is equally overconfident, just that the top won't move their own placings as much as the bottom since there are much fewer people that they can mistake being worse than them. Then apply noise to this and you get the graph Dunning-Kruger got.

But the data of original D-K paper shows that the top 25% people underestimate their placings. So this whole paragraph, while logically true, has little to do with the original D-K effect.

You could say "But they are better at estimating their rank!", but that is just a mathematical artefact, it isn't a psychological result. Even if everyone always guessed that they are number 1...

If everyone always guessed that they are number 1, it's a huge psychological result: it means people are extremely irrational when it comes to self-evaluation.

Jensson
0 replies
16h57m

But the data of original D-K paper shows that the top 25% people underestimate their placings. So this whole paragraph, while logically true, has little to do with the original D-K effect.

That is what you would expect under my model, due to the randomness being limited upwards for the high placings but still go downwards. That is the effect the article we are talking about refers to when they say "Autocorrelation".

contravariant
1 replies
15h31m

I'm utterly confused. The latter statement is just the author explaining which parts they didn't discuss in their article; it has no bearing whatsoever on the section before it.

svnt
0 replies
1h55m

It discloses the cognitive dissonance in his position. He seems to be saying both “skill at assessing ability is random and mathematically bounded only” while admitting “skill at assessing ability changes with ability.”

xpe
4 replies
13h29m

The rebuttal by Daniel (andersource.dev) is useful, generally. However, when he writes ...

The history of statistics is well out of scope for this post, but very succinctly, my answer is that statistics is an attempt to objectively quantify surprise.

... I cannot agree. Statistics is not this; it is much broader. One may or may not be surprised by particular statistics, sure, but there are _specific_ concepts that map more directly to surprise, such as entropy from information theory.

vasco
3 replies
8h32m

If entropy is defined as statistical disorder then I think the definition of "quantifying surprise" is great.

xpe
2 replies
4h52m

You aren't suggesting that statistics as a field defined a notion of "order", prior to thermodynamic entropy or Shannon entropy, are you? To me, that would be circular.

Based on my knowledge, it seems likely the first published quantification of disorder arose in the study of thermodynamic entropy. Later, Shannon defined entropy in information-theoretic terms, independent of physics. It can be interpreted as a notion of 'surprise' or what he called information.

My claims:

First, the field of statistics is _not_ historically rooted around concepts such as: "order/ordering" or "information/surprise".

Second, the field of statistics, as a directed graph of abstractions, is not rooted in ordering nor surprise.

Third, in teaching statistics, practically or conceptually, the concept of surprise isn't foundational. The idea of _variation_, on the other hand, is central.

I'll add a few more comments. To talk meaningfully about 'surprise', there has to be a stated or assumed baseline or 'expectation' about what is _not_ surprising. For Shannon, if the probability of an event is certain, there is no surprise. Probability and statistics work together, but they are conceptually separable. This is particularly clear when you compare descriptive statistics with, say, probabilities over combinatorics problems.
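
For concreteness, here is a minimal sketch of Shannon's notion of surprise as self-information, -log2(p), and of entropy as its expected value. The function names are illustrative. A certain event (p = 1) carries zero bits of surprise, matching the point above.

```python
import math

def surprisal_bits(p):
    """Shannon self-information of an event with probability p, in bits."""
    return math.log2(1 / p)

def entropy_bits(dist):
    """Expected surprisal of a discrete distribution; outcomes with
    probability zero contribute nothing."""
    return sum(p * surprisal_bits(p) for p in dist if p > 0)

print(surprisal_bits(1.0))        # 0.0
print(surprisal_bits(0.25))       # 2.0
print(entropy_bits([0.5, 0.5]))   # 1.0
```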

vasco
1 replies
4h35m

The field of statistics is not organized around concepts relating to "order" or "ordering".

Sure but reduced to the simplest form, statistics are used to predict things, the most basic thing in the Universe being "is this particle gonna stay put or move a little in a given direction", which is related to entropy, so to me intuitively these two things seem very related. The fact that in statistics we don't use the words "order" and "disorder" doesn't mean it doesn't reduce to that.

Btw I'm an electrical engineer that isn't amazing at statistics or thermodynamics so beware I might just be talking nonsense.

xpe
0 replies
2h42m

... reduced to the simplest form, statistics are used to predict things

Inferential statistics is not the simplest kind of statistics. Descriptive statistics are both simpler and foundational for inference.

P.S. I should say that I am a bit of a stickler regarding discussions along the lines of e.g. "these things are related". Yes, many things are related, but it is really nice when we can clearly tease things apart and specify what depends on what.

Jensson
4 replies
21h12m

The effect that the worst overestimate their skill was already known; that wasn't the main result of Dunning-Kruger. The effect that the best underestimate their skill can be chalked up to auto-correlation.

tempestn
3 replies
20h59m

The best don't tend to overestimate their skill; they underestimate it. The D-K results show a consistent bias in estimates toward (somewhere near) the mean. Hence an overestimate at the bottom and an underestimate at the top.

anonymouskimmer
1 replies
20h35m

Dunning-Kruger posits this as a psychological effect, yes? On the top half psychological effects such as imposter syndrome could come in to play.

Have sociological factors such as being kind or big fish little pond been considered as likely causes of the misestimates?

chiefalchemist
0 replies
14h47m

I have the same question...why do some get it so wrong? Was there a nudge in the process of the study that caused some to answer what they did?

Heck, I'm wondering if "Honestly, I can't say" was an allowed response. Or were they forced to pick a number? If so, then I'd want to know what happens when you ask 100 ppl to pick a number between 0 and 100. I bet it's not evenly distributed. Maybe the beginners give a "discounted" version of the distribution?

Even if the autocorrelation explanation is off, there do now seem to be flaws in DK, at least from the perspective of pure and proper science.

Jensson
0 replies
20h58m

The best don't tend to overestimate their skill; they underestimate

I wrote the wrong word, I fixed it. The best can't overestimate their rank, so of course that wasn't what I meant.

atleastoptimal
3 replies
20h22m

The issue is people have differing personal definitions of Dunning Kruger. The generally demonstrated effect in the sample of people Dunning and Kruger analyzed was "people tend to estimate the percentile of their own skill as closer to the average than it really is, with a slight bias towards an above-average mean. This leads to overestimation of relative ability by those in lower percentiles, and the opposite for those in higher percentiles"

However when people cite Dunning Kruger in popular culture they mean "below average people think they're above average, and above average people assume they're below average", which was not shown in the original study, and wouldn't show up in an analysis attempting to justify it via a misunderstanding of autocorrelation.

The general point in the rebuttal is correct. A completely noisy graph of people's estimations of their own ability would show a Dunning-Kruger resembling residual graph (x-y vs x). However, one wouldn't expect people in the 1st percentile to have an equal distribution of perceived skill as people in the 50th or 99th percentile. If that were true, it would be worth reporting.

ShamelessC
2 replies
19h28m

"below average people think they're above average, and above average people assume they're below average"

There’s no way to know if you’re wrong, but when I see it used it seems to be pointing out - “some (not all) under qualified people tend to defer to their own beliefs rather than the views/statements from experts, even when that is demonstrably silly.”

^ Referring to the pop-sci interpretation, not in disagreement with the general point.

staunton
1 replies
18h59m

Which also has nothing at all to do with this study by Dunning and Kruger. So you agree with the general point of parent.

ShamelessC
0 replies
18h52m

Yes. Just clarifying a small disagreement about the pop-sci interpretation of the phrase.

somenameforme
1 replies
6h37m

I found two very interesting things in the original D-K paper [1] that challenge your otherwise reasonable point. The first is that the graph everybody associates with D-K, the one showing the beautifully perfect linear result, is one of 4. The other 3 graphs are far messier, and indeed the paper discusses the fact that the correlations tend to be weaker and in some cases nonexistent.

The second thing is that that beautiful perfectly linear graph everybody references, was measuring 'humor'!!! Humor is going to be something that's all but guaranteed to create near complete noise between self evaluation and 'expert' (professional comedians in this case) evaluation. And if everybody is essentially randomly guessing on their performance, then it will always show an extremely strong D-K effect with the top performers underestimating themselves, and the bottom performers overestimating themselves.

The experiment that most simply and directly measured 'intelligence', without complicating matters in a potentially confounding fashion, is #2. It was based on logic problems from the LSAT. And the resultant graph is just all over the place. Quoting the paper's evaluation of this study:

---

"Participants did not, however, overestimate how many questions they answered correctly, M = 13.3 (perceived) vs. 12.9 (actual), t < 1. As in Study 1, perceptions of ability were positively related to actual ability, although in this case, not to a significant degree."

---

This is really looking like another Zimbardo.

[1] - https://sci-hub.se/10.1037/0022-3514.77.6.1121

mike_hearn
0 replies
2h22m

Yes, D-K is another one of those "classic" psychology studies that everyone knows about but is actually rubbish and shouldn't be cited for anything. You're not the first to notice this, I pointed it out on HN last year:

https://news.ycombinator.com/item?id=31119836

At some point I should write up a proper blog post on the D-K paper in the hope that it eventually surfaces in search results, because it's past time for this paper to be put to bed. The problems you cite aren't even the full set. The whole thing was (of course) a study on a handful of psych undergrads, their selection method for expert comedians has circular logic in it and it all goes downhill from there.

raincole
1 replies
18h8m

And D-K did indeed show a correlation between the two, just not as strong of one as we would expect. Rather, they showed a consistent bias. That's the interesting result.

"D-K effect in its original form" vs "D-K effect in pop culture" is the biggest D-K effect live example. Of course I mean D-K effect in pop culture here.

Interestingly, the "interesting" part of the original result is that the correlation between actual performance and perceived performance is less than people intuitively think.

But as the "D-K effect in pop culture" spreads, people's collective intuition changes. Today if you explained the original D-K effect to a random person on the internet, they might find it interesting because the correlation is greater than they thought: they thought the correlation would be negative!

hoosieree
0 replies
18h5m

D-K effect effect is almost as entertaining as the Butterfly effect effect[1].

[1]: Which is the far-away effect attributed to having watched the movie The Butterfly Effect.

cool_dude85
1 replies
17h27m

Yeah this must be some high end satire where the guy Dunning-Krugers up an explanation of Dunning-Kruger. Since even an economist is supposed to understand ANOVA I have to conclude that this article is a joke.

nickelpro
0 replies
16h53m

The incorrect usage of "autocorrelation" made me double take and wonder if this was satire the first time it was posted.

bradley13
1 replies
10h2m

I have to agree. You cannot separate the statistical analysis from the meaning of the study. In the article, the author's random data is exactly an extreme replication of Dunning-Kruger. Why? Because, in his random data, people with low test scores almost always overestimate their ability, while people with high test scores almost always underestimate.

That is precisely the premise of the Dunning-Kruger effect. The fact that the original Dunning-Kruger paper shows a less extreme effect? That just shows that people are slightly better than random at estimating their own abilities - but still nowhere near accurate.

jgilias
0 replies
8h50m

So that’s what the Dunning-Kruger effect basically boils down to, right? That people in general are just bad at assessing their skills.

IAmGraydon
1 replies
18h57m

So what we have here is some scientists trying to prove that the Dunning-Kruger effect doesn’t exist and instead they give us a perfect example of the Dunning-Kruger effect.

wyldfire
0 replies
18h49m

The irony is that the situation is actually reversed. In their seminal paper, Dunning and Kruger are the ones broadcasting their (statistical) incompetence by conflating autocorrelation for a psychological effect. In this light, the paper’s title may still be appropriate. It’s just that it was the authors (not the test subjects) who were ‘unskilled and unaware of it’.

zeroonetwothree
0 replies
13h27m

This rebuttal seems weak because it’s using unbounded datasets (population). A big issue with the DK research is using bounded data (test scores). For example if I get 100% right it’s mathematically impossible to have overestimated.

mnky9800n
0 replies
6h58m

I really appreciate that he points out that the original article's use of the term autocorrelation is nonstandard. It is nonstandard, but that alone is a rather flippant way to dismiss the rest of the article.

kkoyung
0 replies
12h47m

I agree. Using the terminologies from the author, the DK paper was trying to show that dy/dx < 1 = dx/dx, rather than the correlation of y-x vs x.
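A small sketch of the dy/dx < 1 reading, with x as actual percentile and y as self-assessed percentile. The 60 and 75 endpoints are the eyeballed figures quoted elsewhere in this thread; the two middle values are my own linear interpolation, so treat the numbers as hypothetical.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Quartile midpoints of actual percentile (x).
actual = [12.5, 37.5, 62.5, 87.5]

# y = x: perfect self-assessment, slope 1.
perfect = list(actual)

# Hypothetical compressed self-assessment. Only the 60 and 75 endpoints come
# from the thread (eyeballed); the middle values are interpolated for
# illustration and are not data from the paper.
compressed = [60.0, 65.0, 70.0, 75.0]

print("slope, perfect:   ", ols_slope(actual, perfect))
print("slope, compressed:", ols_slope(actual, compressed))
```

Under those assumed values the slope comes out at 0.2 rather than 1, which is the kind of comparison this reading is about: how steeply y rises with x, not how y - x correlates with x.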

gwd
0 replies
2h37m

And D-K did indeed show a correlation between the two, just not as strong of one as we would expect. Rather, they showed a consistent bias. That's the interesting result.

Right, so:

1. If the data were truly random, with no correlation, we'd expect the line to be straight across the middle, with the first quartile at 50% and the last quartile also at 50%

2. If the data were 100% accurate and precise [1], we'd expect the line to be diagonal, with the first quartile at 12.5% and the last quartile at 87.5%.

3. If the data were accurate but not precise (i.e., basically right but with some randomness built in), we'd expect the line to be in between #1 and #2 -- basically, changing from #2 into #1 as the randomness increases, but with the intersection at 50%.

That's because someone in the 2nd percentile can't underestimate themselves as much as they can overestimate themselves, and someone in the 98th percentile can't overestimate themselves as much as they can underestimate themselves. But in any case, the "0 bias" case looks symmetric.

4. But what we actually see is none of the above: we see the 1st quartile being at (eyeballing the chart) 60%, and the last quartile at 75%.

That shows that there is indeed some ability for self-evaluation, but it's off. The fourth quartile could indeed just be random, the effect of clipping at the top meaning that the upper quartile cannot overestimate themselves as much as they underestimate themselves. But there's no getting around the fact that the bottom quartile are overestimating themselves.

[1] https://en.wikipedia.org/wiki/Accuracy_and_precision
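Cases 1 to 3 above can be sketched as a simple mixture, assuming perceived = weight * actual + (1 - weight) * uniform noise. The mixture form, the `weight` parameter, and the sample size are my assumptions; there is deliberately no bias term.

```python
import random

def quartile_perceived(actual, perceived):
    """Mean perceived percentile in each quartile of actual percentile."""
    pairs = sorted(zip(actual, perceived))
    n = len(pairs)
    means = []
    for q in range(4):
        chunk = pairs[q * n // 4:(q + 1) * n // 4]
        means.append(sum(p for _, p in chunk) / len(chunk))
    return means

def simulate(weight, n=20000, seed=0):
    """weight=1 is case 2 (perfect), weight=0 is case 1 (pure noise),
    anything in between is case 3. No bias term is included."""
    rng = random.Random(seed)
    actual = [rng.uniform(0, 100) for _ in range(n)]
    perceived = [weight * a + (1 - weight) * rng.uniform(0, 100) for a in actual]
    return quartile_perceived(actual, perceived)

for weight in (1.0, 0.5, 0.0):
    q = simulate(weight)
    mid = (q[0] + q[3]) / 2
    print(f"weight {weight:.1f}: Q1 {q[0]:5.1f}  Q4 {q[3]:5.1f}  midpoint {mid:5.1f}")
```

Whatever the weight, the Q1 and Q4 means stay centred on roughly 50. The eyeballed 60% and 75% have a midpoint of 67.5%, which no unbiased mixture of this kind produces.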

expazl
0 replies
17h36m

But in reality, it would be very surprising if performance and evaluation of performance were independent. We expect people to be able to accurately rate their own ability.

This seems to be attacking an irrelevant point in the analysis. The argument goes as such: Researcher carries out all the studies needed to prove the Dunning-Kruger effect, then trips and drops all the results into a vat of acid. But he's ashamed and quickly generates random numbers for the results, and somehow the data still proves the Dunning-Kruger effect. Not just that, repeating the same exercise again and again with completely random data leads to the same result, the effect is always present. So is the Dunning-Kruger effect so powerful that it exists in the very fabric of the universe devoid of any human interaction, or is something amiss?

In this situation we are forced to look at the test we have that concluded from the data that the Dunning-Kruger effect exists and conclude that it's a bad test, we need something different.

You seem to be arguing "oh no, you can't look at random data, because we wouldn't expect the experiment to yield random data!". But that doesn't work as an argument for why the test should still be considered good. If it's supposed to have any worth, then the test has to be able to come to one of two conclusions: The Dunning-Kruger effect exists or the Dunning-Kruger effect doesn't exist. And if the test is set up such that for positive experimental results, or just random noise, it comes out in the positive, and only in extremely unlikely and a narrow band of the possible outcome space come out negative, then the test is bad.

To rephrase everything a bit and make the issue much clearer, let's set up a coin-toss competition between ChatGPT and a group of 100 people. Each participant goes 1:1 against ChatGPT: both parties toss a coin and whoever has the most heads wins; on a draw they toss again. If a pair goes into an infinite loop that doesn't end before our allotted trial time, they get removed from the study. A human assistant tosses on behalf of ChatGPT on account of it not having arms yet.

Now we ask each person how they would rate their ability vs. ChatGPT in a coin-toss, everyone answers 50/50, for obvious reasons.

So we run the experiment, the line for "ability plotted against ability" is a straight diagonal line. The line for estimated ability vs actual ability is a straight flat line at 50%.

Eureka! To the presses! We have just proven the Dunning-Coin-Kruger effect! People who are worse at throwing coins tend to overestimate their ability, and people who are better at throwing coins underestimate their ability! What a marvelous bit of psychological insight, it really tells us something about how the human mind works, and has broader insights about our society! But naturally we always expected this outcome, people who are bad at tossing coins are dumb and of course they are overconfident, not like people who are good at tossing coins who have a remarkable intellect about themselves and are therefore humble in their self estimation... and so on and on about preconceived biases that have nothing to do with the actual test we performed.

bitshiftfaced
15 replies
20h44m

The authors did "X - Y vs X," but that's not even the biggest problem. The authors subtracted two measures that had been transformed and bounded from 0 to 1 (think percentiles). What happens at the extremes of those bounds? How much can your top performers overestimate their performance? They're almost at 1 already, so not much. If they were to overestimate and underestimate at the same rate and by the same magnitude in terms of raw values, the ceiling effect on the transformed values means that the graph will make it look like they underestimate more often. The opposite problem happens for the worst performers.

See "Random Number Simulations Reveal How Random Noise Affects the Measurements and Graphical Portrayals of Self-Assessed Competency." Numeracy 9, Iss. 1 (2016), particularly figures 7, 8, and 9.
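One way to see the kind of thing being described: give everyone a symmetric, unbiased error on a raw scale, then convert both measures to percentile ranks. This is only a sketch under assumed normal distributions and an arbitrary seed, not a reproduction of the cited paper's simulations.

```python
import random

def percentile_ranks(values):
    """Map each value to its percentile rank (0-100) within the sample."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = 100.0 * (position + 0.5) / len(values)
    return ranks

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(1)
n = 8000
skill = [rng.gauss(0, 1) for _ in range(n)]        # raw ability (assumed normal)
raw_guess = [s + rng.gauss(0, 1) for s in skill]   # symmetric, unbiased raw error

actual_pct = percentile_ranks(skill)
guess_pct = percentile_ranks(raw_guess)

def group_gaps(lo, hi):
    """Mean raw error and mean percentile error for actual_pct in [lo, hi)."""
    idx = [i for i in range(n) if lo <= actual_pct[i] < hi]
    raw = mean([raw_guess[i] - skill[i] for i in idx])
    pct = mean([guess_pct[i] - actual_pct[i] for i in idx])
    return raw, pct

bottom = group_gaps(0, 25)
top = group_gaps(75, 100)
print(f"bottom quartile: raw error {bottom[0]:+.3f}, percentile error {bottom[1]:+.1f}")
print(f"top quartile:    raw error {top[0]:+.3f}, percentile error {top[1]:+.1f}")
```

On the raw scale neither group is biased. On the bounded percentile scale the bottom quartile appears to overestimate and the top quartile appears to underestimate.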

SamBam
4 replies
19h1m

Exactly, that was my thought. How would it be possible to get anything other than the D-K effect, even if it wasn't just averaging to the mean?

The lowest quartile can't say they're below the lowest quartile, so any error at all will be counted as "overconfidence." The top quartile can't say they're above the top quartile, so any error at all will be counted as "underconfidence."

anonymouskimmer
2 replies
16h3m

Exactly, that was my thought. How would it be possible to get anything other than the D-K effect, even if it wasn't just averaging to the mean?

Quite easily with the method they demonstrate in the study in figure 11. In that study test participants are not rating themselves in terms of population percentages, but in terms of the percentage correct they got on the test. In such a case the test could be designed to have a huge ceiling that even the most knowledgeable participants would have trouble reaching. And could have such a low floor that even the least knowledgeable participants would still get some answers correct (unless they weren't even trying, which would allow throwing out their data points).

With 20 questions you could have four gimmes and four impossible questions, bounding the worst participants to about 20% and the best to about 80%.
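The arithmetic behind those bounds, as a tiny sketch. `score_bounds` is a hypothetical helper, and it assumes every participant gets all the gimmes right and all the impossible questions wrong.

```python
def score_bounds(total, gimmes, impossible):
    """Lowest and highest achievable fraction correct, assuming everyone
    answers every gimme correctly and every impossible question wrongly."""
    floor = gimmes / total
    ceiling = (total - impossible) / total
    return floor, ceiling

floor, ceiling = score_bounds(total=20, gimmes=4, impossible=4)
print(f"floor {floor:.0%}, ceiling {ceiling:.0%}")
```

With room below the floor and above the ceiling, a participant at either extreme can err in both directions, which is the point of designing the test that way.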

SamBam
1 replies
14h24m

Right. To clarify, I meant: with the original study design, how could they not have gotten the result they did? (And that's rhetorical.)

anonymouskimmer
0 replies
14h13m

It would have been noteworthy in the original design if more than one group of participants were, on average, within their quartiles on the guessing. I also find it noteworthy that the average guess of the lowest quartile is lower than the average guess of the second lowest quartile, and on up the quartiles. On one hand this shows some awareness of relative ability along a massively smooshed logarithmic scale. On the other hand I wonder if this laddering follows as the averages are split into quintiles and deciles.

jmpeax
0 replies
13h20m

I wonder if estimating on the logit scale would solve this problem.

ImaCake
3 replies
20h4m

Thanks for stating just how much of a statistical minefield this is. The reference does a great job showing just how wrong the DK studies are. Unfortunately, most people have already made up their minds and are happy to link conflicting blog posts as evidence.

concordDance
1 replies
8h28m

wrong the DK studies are

The DK studies are not wrong, they are misinterpreted by people who don't know what they're talking about (e.g. what the DK effect actually is), like this blogger.

"People have worse self assessment ability as their real ability declines" would be a valid interpretation of the DK data and notably would NOT be a valid conclusion from the random data in the blog post.

ImaCake
0 replies
4h11m

You should read the reference we are discussing which makes no such mistakes.

Probiotic6081
0 replies
19h16m

Probably in another year or two they'll find another statistic that will render the old one moot like again and again.

dclowd9901
1 replies
19h38m

I think if people at all levels of skill were reasonably good at measuring their own ability, we would see two curves that roughly overlap. Instead we see the graph given.

The fact that random noise can generate a mean curve on the Y axis doesn’t mean DK doesn’t exist. It just means DK’s mean self analysis resembles a middling random mean, which if you think about it, makes sense. Most people will probably self evaluate as average, regardless of their actual skill. This means DK is right as rain.

expazl
0 replies
16h47m

I think if people at all levels of skill were reasonably good at measuring their own ability, we would see two curves that roughly overlap. Instead we see the graph given.

Actually, due to the construction of the test, the ability to evaluate your own absolute ability in a subject isn't sufficient for the two lines to be able to overlap.

It's a percentile axis, so you need to be able to reasonably accurately estimate the ability of everyone taking the test, and where you fall in the quartile range of those participants.

anonymouskimmer
1 replies
20h4m

This can be dealt with to an extent by truncating the extreme ends. Even the middle quartiles in the graphs in the linked article show the same trends.

bitshiftfaced
0 replies
1h24m

Not that simple. This article demonstrates why enforcing bounds results in the changes in slope that you see in the expected grades (figure 2 and 4): https://www.frontiersin.org/articles/10.3389/fpsyg.2022.8401...

wjnc
0 replies
6h15m

Lognormality of data is a killer for the methods of social scientists. If I were to hypothesize the underlying mechanism then it would be that raw skill is lognormally distributed for those taking tests at all (at least participating in these tests usually entails an implicit lower bound on IQ, but also from the long tail of high performance in say sports), tests try to measure performance but with a reduction to normality (or 4 categories) and then people estimate their own skills based on their task and grading experiences which are also reductions to a normal or constant distribution. (“I was always a B- in math in high school and expect that to have distribution X and this test to follow that distribution”).

It’s three places where reductions in dimensionality take place both implicitly and explicitly. I don’t envy researchers trying to unpeel this onion. I do like the unraveling of all these problems that pop up in pretty accessible designed experiments. It makes for better understanding.

dimask
0 replies
12h43m

The boundedness of the data is also the main argument here https://www.frontiersin.org/articles/10.3389/fpsyg.2022.8401...

anonymouskimmer
9 replies
20h18m

If the Dunning-Kruger effect were present, it would show up in Figure 11 as a downward trend in the data (similar to the trend in Figure 7). Such a trend would indicate that unskilled people overestimate their ability, and that this overestimate decreases with skill. Looking at Figure 11, there is no hint of a trend.

There certainly is a hint of a trend. Why do people, when visualizing data with a distinct trend, say that because the "error bars" from a particular statistical test overlap zero that no trend exists!?

Freshmen trend to over-confidence. Grad students trend to under-confidence. Undergrads in general trend to over-confidence (though this trend decreases as year in school increases), and post-graduates, whether grad students or professors, trend to under-confidence.

These "trends" are not statistically significant, but they certainly are a trend!

Also, the random data distribution in figure 9 doesn't show the same trends as Dunning-Kruger's curve in figure 2. Perhaps there is at least one psycho-social mechanism here worth investigating?

mrkeen
6 replies
20h4m

These "trends" are not statistically significant, but they certainly are a trend!

This is an oxymoron.

anonymouskimmer
3 replies
19h43m

Show how.

I place mechanistic theory prior to statistics in science. Mechanistic theory can be tested, statistics are a kind of test.

If a statistically-insignificant result shows consistent, though non-significant deviations, such as the kind seen in Figure 11, then it tells me it's worth investigating whether mechanism(s) are explaining a very small portion of the variation that will not, in itself, show up as statistically significant, as it's being swamped by variation in other parameters.

Dylan16807
2 replies
19h32m

Consistency is a synonym for statistical significance. If there's consistency beyond random alignment, then there should be a statistical test you can apply over your data to extract the signal.

You can extract surprisingly small signals relative to variation in other parameters. But if it's actually swamped, then it might not be real, so go get more data.

anonymouskimmer
1 replies
18h37m

Consistency is a synonym for statistical significance.

So basically you're telling me that if I can visually see a consistency that does not show up in their statistical test, then they aren't running an appropriate statistical test on what I'm seeing.

But if it's actually swamped, then it might not be real, so go get more data.

Even better to design other experiments.

Dylan16807
0 replies
18h21m

So basically you're telling me that if I can visually see a consistency that does not show up in their statistical test, then they aren't running an appropriate statistical test on what I'm seeing.

Either they're not doing the right statistics, or it's a "consistency" that is much more likely to show up randomly than you naively expect, and the study needs to be repeated or enhanced.

Sometimes you can see a pattern that's just a figment of chance. See also: numerology, jelly bean xkcd

Dylan16807
1 replies
19h46m

Oxymorons only sound contradictory on a surface level.

Something "certainly" being a "trend" is the definition of statistical significance, so this is a straight up contradiction.

anonymouskimmer
0 replies
19h41m

See here: https://news.ycombinator.com/item?id=38416858

"Trend" has multiple meanings. Statistics doesn't get to claim all of the meaning.

Dylan16807
1 replies
19h40m

If they're actually error bars, you can shrink them with more data. That will turn the hint of a trend into an observation of a trend. If it wasn't random noise giving a fake hint.

anonymouskimmer
0 replies
19h36m

If they're actually error bars, you can shrink them with more data.

Assuming the new data has the same systemic or instrumental bias as the old data. Even using a different test date could skew results enough to widen the error bars.

snarkconjecture
5 replies
20h19m

Nonstandard terminology warning: the author is using "autocorrelation" in a way I've never seen before. There is a much more common usage of "autocorrelation" to refer to the correlation of a timeseries with itself (shifted by some amount).

If you use autocorrelation to refer to the thing in OP, you'll probably confuse people who know statistics, and vice versa.

epigramx
1 replies
14h5m

you might say the article author might have some ..dunning-kruger on what autocorrelation is.

nothrowaways
0 replies
9h47m

L2 of dk

xpe
0 replies
13h47m

Nonstandard terminology warning: the author is using "autocorrelation" in a way I've never seen before.

That's a nice way of putting it. A more accurate description would be: the author is butchering the key essence of autocorrelation, since they don't clearly mention that it is a temporal relationship!

What is autocorrelation?

Autocorrelation occurs when you correlate a variable with itself.

Groan.

A standard definition is:

Autocorrelation refers to the degree of correlation of the same variables between two successive time intervals. It measures how the lagged version of the value of a variable is related to the original version of it in a time series. Autocorrelation, as a statistical concept, is also known as serial correlation.

ketozhang
0 replies
16h12m

The more common experience with autocorrelations are with time series, but what the author said is correct even in that context. A time series autocorrelation relates the same time series function at different times. At the simplest you plot the arrays X vs X where X[i] = f(t[i]). You then may complicate it further by some transformation g(X) vs X (e.g., moving average).
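For reference, the time-series sense of the word can be sketched as a lag-k sample autocorrelation. The sine wave, its period of 20 samples, and the normalisation by the full-series variance are illustrative choices of mine.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of a series with itself shifted by `lag` steps."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var

# A sine wave with a period of 20 samples (hypothetical test signal).
wave = [math.sin(2 * math.pi * t / 20) for t in range(200)]

for lag in (0, 10, 20):
    print(f"lag {lag:2d}: {autocorrelation(wave, lag):+.2f}")
```

Lag 0 is the trivial "X vs X" case and gives 1. Half a period out the series is strongly anti-correlated with itself, and a full period out it is strongly correlated again. The lag is what makes the time-series quantity informative.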

gnicholas
0 replies
11h52m

What term is appropriate to describe what the author is referring to?

r0uv3n
4 replies
21h0m

The discussion between Nicolas Boneel and the author in the comments of the article is interesting and Nicolas expresses the doubts I had when reading this. The whole point of the DK effect is that people are bad at estimating their skill, so if you assume that they randomly guess their skill level then of course you will replicate the results.

The correct model for a world without DK should be something like (estimated test scores)=(actual test scores)+noise, and then the only form of spurious DK you'd expect is caused by the fact that there's a minimum and maximum test score. But this effect would be proportional to the variance of the noise, and I assume the variance on the additional dataset is too low to fully understand the effect seen there.

Also, in this model on average everyone should still guess correctly in which half of the distribution they are, but even the bottom quartile seemed to estimate their abilities as above the 50th percentile
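A rough sketch of that no-DK model on a 0 to 100 scale: estimate = actual + symmetric Gaussian noise, clamped to the scale. The noise levels, uniform actual scores, and seed are assumptions, chosen only to show how the spurious gap moves with the noise.

```python
import random

def clamp(x, lo=0.0, hi=100.0):
    return max(lo, min(hi, x))

def quartile_summary(noise_sd, n=20000, seed=2):
    """Return ((mean estimate, mean error) for the bottom quartile,
               (mean estimate, mean error) for the top quartile)."""
    rng = random.Random(seed)
    bottom, top = [], []
    for _ in range(n):
        actual = rng.uniform(0, 100)
        estimate = clamp(actual + rng.gauss(0, noise_sd))  # unbiased before clamping
        if actual < 25:
            bottom.append((estimate, estimate - actual))
        elif actual >= 75:
            top.append((estimate, estimate - actual))

    def avg(rows, k):
        return sum(r[k] for r in rows) / len(rows)

    return (avg(bottom, 0), avg(bottom, 1)), (avg(top, 0), avg(top, 1))

for sd in (5, 20, 40):
    (b_est, b_err), (t_est, t_err) = quartile_summary(sd)
    print(f"sd {sd:2d}: bottom est {b_est:5.1f} (err {b_err:+5.1f})  "
          f"top est {t_est:5.1f} (err {t_err:+5.1f})")
```

In this sketch the spurious over- and underestimation grows as the noise grows, yet the bottom quartile's average estimate stays well below 50 even at the largest noise level tried.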

Jensson
1 replies
20h47m

Also, in this model on average everyone should still guess correctly in which half of the distribution they are, but even the bottom quartile seemed to estimate their abilities as above the 50th percentile

Depends on the noise applied. If the noise is -10% to +100% for everyone then you get roughly the graph Dunning-Kruger got. So there is no reason to believe that the best are better at estimating their abilities, just that you can't estimate your own rank as better than the best.

tempestn
0 replies
20h4m

That's a great observation. For what it's worth though, it does seem logical to me that the best would also be best at estimating their skill. Not necessarily because they're better at it per se (though there's likely some of that too, for the reasons originally posited by D-K), but also because they have an easier problem to solve. When you know something well, it's fairly obvious that that's the case. (Think of the experience of acing a math test. It's entirely possible you'd know you answered everything correctly.) When you struggle somewhat though, it's much more difficult to estimate how much you're struggling compared to how others would fare.

svnt
0 replies
20h50m

Just because the data appear random doesn’t mean you’ve gotten at the cause though.

From those charts it could equally be low skill throughout, or something nuanced like lack of skill at estimating at the bottom, improving skill in estimating through the middle, and high skill and learned modesty at the top.

jampekka
0 replies
17h51m

The correct model is probably (estimated test score + estimation noise) = (actual test score + test noise). The test contains a random element, e.g. guessing, that the person can't estimate.

https://en.m.wikipedia.org/wiki/Regression_dilution

https://en.m.wikipedia.org/wiki/Errors-in-variables_models
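Here is a sketch of regression dilution under one reading of that model: both the test score and the self-estimate are noisy measurements of the same underlying skill. The three standard deviations are hypothetical, and the "expected" slope is the standard attenuation factor var(skill) / (var(skill) + var(test noise)).

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(3)
n = 50000
SKILL_SD, TEST_SD, GUESS_SD = 15.0, 10.0, 10.0  # all hypothetical

skill = [rng.gauss(50, SKILL_SD) for _ in range(n)]
test_score = [s + rng.gauss(0, TEST_SD) for s in skill]      # what the exam measures
self_estimate = [s + rng.gauss(0, GUESS_SD) for s in skill]  # what the person guesses

slope = ols_slope(test_score, self_estimate)
expected = SKILL_SD ** 2 / (SKILL_SD ** 2 + TEST_SD ** 2)
print(f"observed slope {slope:.3f}, attenuation factor {expected:.3f}")
```

Every simulated person's self-estimate is unbiased for their true skill, yet regressing it on the noisy test score gives a slope below 1, purely because the test itself contains noise.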

lencastre
4 replies
21h20m

Wasn’t this DK effect already debunked?

xbar
0 replies
20h57m

Yes. This article highlights the 2016, 2017 and 2020 debunkings of DK. But it hangs on as an oft repeated scientific fallacy.

The fact that anyone has to ask if it has been debunked shows how desirable some people find the DK myth. Even in the comments here, people are not willing to be skeptical of DK. That's interesting psychology.

mrkeen
0 replies
20h11m

Yes but some claim to have debunked the debunking also. [1]

This paper (2023) claims "the magnitude of the effect was minimal; bringing its meaningfulness into question." [2]

[1] https://andersource.dev/2022/04/19/dk-autocorrelation.html

[2] https://www.sciencedirect.com/science/article/abs/pii/S01602...

jahewson
0 replies
21h3m

I don’t know much about it but I’m sure you’re right.

hasch
0 replies
20h58m

Article mentions 2016 somewhere. They explain a bit on top of that, with more depth ... at least my rough take on this

greenthrow
3 replies
21h4m

Lmao this article is an example of Dunning-Kruger at work. The author thinks they have found and are revealing something but they are just failing to fully understand the subject. Amazing.

flappyeagle
2 replies
20h52m

Try reading the article again and understanding the argument.

greenthrow
1 replies
19h55m

Oh I did. Completely.

mattxxx
0 replies
17h48m

Wait... but what if this is DK? What if my comment is DK??

dmbche
3 replies
21h21m

Isn't it ironic that they fooled themselves?

ulizzle
2 replies
20h21m

It was actually hilarious but I don’t think many people here got the irony

DangitBobby
1 replies
15h14m

Literally the closing paragraph of TFA is about that exact irony.

robwwilliams
0 replies
1h54m

And here it is from OP (which made me laugh—right or wrong). And leave your hubris at home unless you rate yourself a damn fine statistician ;-)

“However, there is a delightful irony to the circumstances of their blunder. Here are two Ivy League professors arguing that unskilled people have a ‘dual burden’: not only are unskilled people ‘incompetent’ … they are unaware of their own incompetence.

“The irony is that the situation is actually reversed. In their seminal paper, Dunning and Kruger are the ones broadcasting their (statistical) incompetence by conflating autocorrelation for a psychological effect. In this light, the paper’s title may still be appropriate. It’s just that it was the authors (not the test subjects) who were ‘unskilled and unaware of it’.”

resource0x
2 replies
17h35m

Can someone explain the difference between Dunning-Kruger effect and "illusory superiority" effect (https://en.wikipedia.org/wiki/Illusory_superiority)?

zeroonetwothree
1 replies
13h29m

DK says that skilled people tend to underestimate their skill while unskilled people tend to overestimate their skill. This is likely a statistical artifact.

IS says that people tend to overestimate their own skill compared to how other people estimate their skill. This seems likely true on average but not necessarily in all cases.

resource0x
0 replies
21m

Then IS, in a sense, implies DK. At least with respect to unskilled people. Suppose for the sake of an argument that 90% of drivers believe they are above average. Let's assume 1/4 of them are less skilled drivers. Clearly, in this group, the overestimate of their skills is much bigger than that of the next 1/4 of more skilled ones. As a corollary, we get DK.

randomizedalgs
2 replies
19h36m

Consider the imaginary world that the author describes, in which people's estimate of their score is independent of their actual score. Wouldn't it be fair to say that, in this imaginary world, the DK effect is real?

The point of the effect is that people who score low tend to overestimate their score and people who score high tend to underestimate. Of course there are lots of rational reasons why this could occur (including the toy example the author gave, where nobody has any good sense of what their score will be), but the phenomenon appears to me to be correct.

skrebbel
0 replies
1h37m

Whoa, of course, this is the point.

The author's example with random points is bad because you might reasonably expect people to behave differently than uniform random points.

It'd be reasonable to expect that people who are good at a thing estimate that they are good at it, and that people who are bad at a thing, estimate that they're bad at it. I mean, my kids love math and always estimate themselves to do well on math tests (and they usually do). They have classmates who loudly detest math, estimate they'll do badly, and often do (at least somewhat). Similarly I'm a bad cook and I have no doubt that if I join a cooking contest, I'll get few jury points. The expected data is correlated.

So if a study finds that, well actually, the data is not at all that correlated! Lots of people who estimate that they'll do fine actually don't, and equally many people who estimate that they'll do badly, actually do fine (ie it looks like uniform random data), then that's surprising, and that's the D-K effect.

Right? I'm no statistician at all so I might be missing something.

mrkeen
0 replies
6h10m

If it's a statistical illusion, the correlation is still true, it just has no business being studied by psychologists.

If I roll a die, and then roll a second die, I might study the behaviour of the second die and wonder why it wants to add up to 7 with the first die. Since they're dice, I can dismiss that as a stupid idea, but if they were people, I could certainly be led astray by psychological theories about them.

zw123456
1 replies
13h43m

I know I'm not smart enough on statistics or psychology to evaluate the article but it always struck me that D&K seemed to say something similar to what my grandpa said when I was a wee lad, "The more you know, the more you realize how much you don't know", I know he wasn't the first person to say that, but he was the first person to say it to me. I don't know if D&K is autocorrelation or not, but I know that an awful lot of people seem to think they know more than maybe they actually do, probably me included. Hmmm, maybe the author of that article as well? I wonder if that occurred to him, seems like a glaring oversight not to at least recognize that possible irony.

Arch485
0 replies
13h9m

In the article, a real study was used as a counterexample to the DK effect.

Part of the results was a correlation that people who were "less capable" were also worse at predicting their own skill, and people who were "more capable" were better at predicting their own skill.

While similar to the DK effect, this is different, as the DK effect states that "less capable" individuals specifically _overestimate_ their skill, as opposed to simply being wrong (both over- and under-estimating).

With relation to some people "seeming to think they know more than they actually know", this is likely confirmation bias in the sense that there are an equal number of people who don't know much, and know that they don't know much.

xanderlewis
1 replies
21h4m

Naïve take: I’ve always felt like Dunning-Kruger is just the result of the fact that when guessing the value of anything people tend towards some common mean, and so if the true value is low your guess tends to be high, and vice versa. This assumes nothing about what is being guessed, but does assume (perhaps wrongly) that there is a commonly believed mean value and that people tend to imagine they are close to it.

wavemode
0 replies
20h5m

That's essentially the plain-language interpretation of what the author of this article is pointing out - when you plot (actual score) against (difference between self-assessed score and actual score), you will always find a trend that underperformers overestimate and overperformers underestimate - for the exact reason you state.

thewanderer1983
1 replies
20h37m

The Dunning-Kruger effect isn't as the article first quotes. It's an effect that everyone experiences. We as humans tend to oversimplify things we don't understand well or at all. Therefore we overestimate our expertise on these subjects. We also tend to underestimate how much of an expert we are on subjects we do know well. Everyone does this. It's not just dumb people.

Jensson
0 replies
20h30m

We also tend to underestimate how much of an expert we are on subjects we do know well

Any evidence for this, except Dunning-Kruger? To me it looks like everyone overestimates themselves. There are a lot of professionals who think they are undervalued and that people worse than them get all the rewards and fame.

salty_biscuits
1 replies
20h26m

It's just correlation, why do they keep calling it autocorrelation?

stubish
0 replies
13h58m

Autocorrelation, or self-correlation. A correlation between different things may indicate an actual relation (smoking is correlated with early mortality). A self-correlation is a tautology.

pie_flavor
1 replies
21h18m

This take is a perfect example of Dunning-Kruger itself, ironically. https://andersource.dev/2022/04/19/dk-autocorrelation.html

dahart
0 replies
18h25m

How so? DK shows a positive correlation between confidence and competence.

mewpmewp2
1 replies
21h11m

My take on Dunning Kruger:

1. People really like the idea of smart people being humble and arrogance meaning stupidity, so they like to believe that DK is true, and they like to repeat this.

2. Some smart/skilled people are humble, some are arrogant.

3. Some smart/skilled people underestimate their skills, some overestimate.

4. Some stupid people are humble, some are arrogant.

5. Some stupid people underestimate their skills, some overestimate.

Overall, even if there is a correlation, you can't tell by just arrogance of a person whether we are dealing with DK or whether it's an effect at all. People's personalities, skills and everything are a bit more complex than that.

Overall bringing DK up seems like some sort of social justice/fairness effort rather than something that is actually true given any situation where someone is arrogant.

spacebacon
0 replies
20h42m

Maybe this shows how effective dumb people are at keeping smart people hammered down with thought stopping arguments.

mattbit
1 replies
8h1m

This is not ‘autocorrelation’, it is regression to the mean. I find the article unclear and imprecise. For those interested in a better overview of the Dunning–Kruger effect, I recommend this short article by McIntosh & Della Sala instead:

https://www.bps.org.uk/psychologist/persistent-irony-dunning...

mattbit
0 replies
7h55m

This is how McIntosh & Della Sala put it:

in the academic literature, it has been suggested that the signature pattern of the DKE (Figure 1A) might be nothing more than a statistical artefact. In a typical study, people’s tendencies to under- or overestimation are analysed as a function of their ability for the task. This involves a ‘double dipping’ into the data because the task performance score is used once to rank people for ability, and then again to determine whether the self-estimate is an under- or over-estimate. This dubious double-dipping makes the analysis prone to a slippery statistical phenomenon called ‘regression to the mean’.
ezekiel68
1 replies
19h26m

However, there is a delightful irony to the circumstances of their blunder.

Indeed. And I find the tendency of people in this comment section to defend the flawed theory is further confirmation of another scientific finding: that we decide based on emotion and then justify our decision using rationality.

stubish
0 replies
14h10m

Even when the article cites the 3 papers it is based on, no refutations of the published science by people who grok it.

eterevsky
1 replies
8h57m

I think this article would've made more sense if it had a title "The Dunning-Kruger effect is regression toward the mean", because that's what the author is actually showing.

tgv
0 replies
8h7m

I think your description is the most apt.

OP's own analysis shows that using random data (two variables uniformly distributed over the same range!) for both skill and self-assessment results in a different graph. The original comparison therefore implies another effect on the second dimension, which could be interpreted as: people don't estimate their skills correctly, but drift towards the mean.

But then the question becomes: what did they really ask their subjects? To pick the percentile or a true test score?

beltsazar
1 replies
20h28m

I don't know if I agree that it's autocorrelation, but one way to explain the Dunning-Kruger effect is by acknowledging this simple fact:

Most people think that they are an average person, but they can't all be average; there must be some people substantially below the median. Therefore, those people must overestimate their abilities.

This also applies to other aspects, such as attractiveness. Less attractive people would overestimate their attractiveness.

anonymouskimmer
0 replies
19h58m

For all of the tests and rebuttals of the Dunning-Kruger effect the people tested are not drawing from the totality of other people, but trying to compare themselves solely to those who also took the same test.

Anyone in a position to take such a test is almost guaranteed to be above average compared to the general population (which includes babies for intellectual tests, or the extremely old for attractiveness tests).

I think this complicates personal evaluation.

RevEng
1 replies
14h41m

What Blair Fix's article gets wrong is that there are two stark differences between what Fix generated with random data and what Dunning and Kruger observed in theirs.

Fix has each person guess randomly between 0 and 99 where they will lie in the percentiles. They simulate every person having no idea and giving equal probability to being the best or the worst. If we then sort them by how well they really did into quartiles and then evaluate the average of how well they thought they would do, we get what we would expect: each quartile has an equal chance of predicting that they will do well or do poorly, with an average expected percentile of 50, which is what you would expect by a random guess.

Note two key things about this:

- All quartiles guessed the same: there was no correlation between what they guessed and how well they actually did.

- All quartiles guessed the expected average percentile, 50%. This means they were unbiased in how well they thought they would do.

If people were unbiased but also unaware, this is the null hypothesis we would expect: on average people predict themselves to be average and there's no correlation between how well they predicted they would do and how well they actually did.
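For anyone who wants to reproduce that null model, here is a minimal sketch. It assumes, as described above, that actual score and self-assessed percentile are independent uniform draws between 0 and 99; the function name `quartile_means`, the sample size and the seed are my own illustrative choices, not anything taken from Fix's article.

```python
import random
from statistics import mean


def quartile_means(n=20_000, seed=0):
    """Mean self-assessment per actual-score quartile under the null model."""
    rng = random.Random(seed)
    # Each person gets an actual score and an unrelated random self-assessment.
    people = [(rng.uniform(0, 99), rng.uniform(0, 99)) for _ in range(n)]
    people.sort(key=lambda p: p[0])  # rank everyone by actual score
    size = n // 4
    return [
        mean(guess for _, guess in people[q * size:(q + 1) * size])
        for q in range(4)
    ]


if __name__ == "__main__":
    for q, m in enumerate(quartile_means(), start=1):
        print(f"quartile {q}: mean self-assessment = {m:.1f}")
```

Every quartile should come out close to the midpoint of the range, i.e. a flat line with no upward trend and no above-average bias.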

Now compare that to what Dunning and Kruger observed:

- The quartiles did NOT guess the same. There was a bit of an upwards trend, which suggests that people at least somewhat were able to determine their actual percentiles, even if only weakly on average.

- The predictions were biased. All groups estimated they would do better than the expected average. That is to say, on average, they thought they were above average. This is an important bias.

- The differentials between quartiles are not equal. The first and second quartile typically predicted the same, over-estimated value, implying that neither group had any idea they were better or worse than each other. However, the upper quartile consistently estimates a higher average. That is to say, people who perform well, on average, believe they are performing even better than those who don't perform well. And perhaps most surprisingly, there was often a statistically significant dip at the third quantile. Comparing their beliefs, people who did well believed they had done worse than the people who actually did worse.

Fix also fails to go beyond the first figure of the paper. After seeing this inconsistent behaviour between the quartiles, Dunning and Kruger then test what happens if the respondents are given an opportunity to grade each other - therefore getting an idea of what the percentiles actually look like - and to have their skills improved - thereby possibly making them better able to judge their own and each other's abilities. Again, if Fix's premise that this is all just a result of manipulating the autocorrelation of an otherwise unbiased random sequence were correct, then these interventions should have no discernible effect. Yet, Dunning and Kruger find markedly significant changes after these interventions, and those changes are different within the different quantiles.

It is precisely this difference between quantiles which is the Dunning-Kruger effect. Fix effectively makes their point for them by building a null model and showing what would happen if there were no Dunning-Kruger effect - if people were fully unaware and unbiased. Instead, it is the way in which Dunning and Kruger's observations deviate from this model that is the very effect that bears their name.

Instead, all that Fix manages to do is point out how confusing the plot is that Dunning and Kruger produced. The plot can easily be misinterpreted to suggest that it's the difference between y and y-x that is important. Instead, in their writing, Dunning and Kruger actually focus on the differences in how y-x changes when the situation changes, demonstrating that it's actually dependent on knowledge and how different people respond to that knowledge. What they actually show is that delta(y-x) vs x has a nonzero relationship and this is particularly interesting.

Perhaps if Dunning and Kruger had not included the example of perfect knowledge as a comparison, but instead included the example of unbiased and unknowledgeable that Fix produced as the thing to compare against, the Dunning-Kruger effect would be much better understood.

Further, both could benefit greatly from plotting and tabulating not just an average, but the overall distribution within each group. Fix should know that variance is just as important as bias. Even if all groups are biased in their prediction, differences in variance between each group indicates their confidence in their belief. Knowledge should help to reduce both bias and variance. A guess with high variance tells us little, while a guess with low variance tells us quite a bit. Even if all quartiles predicted the same average, we wouldn't fault those with little ability for guessing a high number if they did so with low confidence. On the contrary, we would expect people with high ability to be more confident (and correct) in the assessment of their ability.

hgomersall
0 replies
6h11m

The entire post is pointing out how bad the stats are in the original paper. If you want additional critique, go and read the references.

vismwasm
0 replies
20h31m

The author measures the Dunning Kruger effect on his random data exactly because he assumes it when generating his random data.

By modelling skill and perceived skill as uniform draws between 0 and 100, the unskilled (e.g. skill=0) will over-estimate their skills (estimated skill = 50, the mean on the uniform random variable) and the skilled (e.g. skill=100) will underestimate it (as 50 as well, again the mean of the same random variable). The only ones who will be correct (on average) are the average skilled ones (skill=50).

toasted-subs
0 replies
20h3m

Idk I genuinely feel like after having to deal with 10+ doctors who all had different opinions. The last doctor finally made the same conclusion as me and he was the last person I had to see.

There's always exceptions. And sometimes reading publications pertaining to a very specific thing should give you more say on a subject.

I just feel bad that American taxpayer money and the best years of my life were spent on telling medical professionals they don't know what they are talking about.

rom1v
0 replies
6h26m

If Y = X + estimation_error, then subtracting X (in Y-X) removes the correlation rather than adding it.
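A quick way to check this numerically, as a sketch only: the uniform score range, the Gaussian error with standard deviation 10, and the helper names `pearson` and `compare` are all assumptions of mine. The second case (self-assessment unrelated to the true score) is included for contrast with the random-data setup from the article.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def compare(n=20_000, seed=1):
    """Return corr(Y - X, X) for a tracking Y and for an unrelated Y."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 100) for _ in range(n)]
    # Case 1: self-assessment is the true score plus independent error
    # (left unclipped, so it can fall slightly outside 0-100).
    y_tracking = [xi + rng.gauss(0, 10) for xi in x]
    # Case 2: self-assessment has nothing to do with the true score.
    y_random = [rng.uniform(0, 100) for _ in range(n)]
    r_tracking = pearson([y - xi for y, xi in zip(y_tracking, x)], x)
    r_random = pearson([y - xi for y, xi in zip(y_random, x)], x)
    return r_tracking, r_random


if __name__ == "__main__":
    r_tracking, r_random = compare()
    print(f"Y = X + error:      corr(Y - X, X) = {r_tracking:+.3f}")
    print(f"Y independent of X: corr(Y - X, X) = {r_random:+.3f}")
```

In the first case Y - X is just the error term, so its correlation with X should sit near zero; in the second it should be strongly negative, which is the pattern the article reproduces.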

riazrizvi
0 replies
18h4m

A general problem with Dunning Kruger is the assumption that if you score low on a test then you are bad at the subject it is evaluating. I’ve taken enough bad quizzes that purportedly evaluate skills that I am an expert in, to know that that is a leap.

psychoslave
0 replies
7h9m

I went through the whole article, and I am not only very skeptical about the claimed debunk but wonder what kind of psychological trope you might label as correlative to such an article.

I mean "bad science built only on rhetoric" is a double edged sword, you know.

To start with, the graph presented at the end does not look like the one from the original article, where the self assessment does grow significantly, though it starts higher than average and grows less quickly than external assessment.

Also, the article focuses on a "random" data set, but we know that there are different classes of apparently noisy plots. A noisy distribution of self-assessment would actually be an informative figure too.

So the biggest issue here is that it kind of pretends that, whatever the ordinate value is coupled to, if it includes the abscissa in its definition you'll get the same kind of plot as a result, which is obviously false. You could easily come up with arbitrary values coupled to "x" that would look radically different.

pmavrodiev
0 replies
7h11m

No one seems to have read OP's post in its entirety. A crucial point was made by referencing this paper: https://digitalcommons.usf.edu/cgi/viewcontent.cgi?article=1....

Figure 2 in this paper shows the result of an experiment where skill and perception of one's skill are measured independently, to eliminate any statistical artifact of autocorrelation. And lo and behold - on average skill is uncorrelated with the accuracy of one's own assessment. No DK effect at all. What does show up actually is that more qualified people are more consistent in estimating their skill (i.e. their assessments are less variable), but the mean accuracy is still 0.

So indeed, on average actual and perceived skills are uncorrelated. That's exactly what the numerical proof with random numbers shows and why in many cases we apply Occam's razor.

notShabu
0 replies
19h23m

every domain of expertise has two "elo" systems, the niche one and the broader one.

e.g. you can learn basic juggling in 30 minutes, so that you are in the top 10% of your friends/colleagues etc...

however within the juggling community itself this is known as the "3 ball cascade", a really simple trick relative to the ones that require years to master. an outsider may not be able to tell the difference between the 1 year expert and the 10 year master.

a lot of dunning-kruger can be explained by people in one or the other not understanding the other system

nitwit005
0 replies
17h36m

If self evaluations are random, and you group a bunch of them together, then you'll see values around the 50th percentile. That's why their self evaluation line is nearly flat.

In the actual data though, the line clearly trends upward. The people who did well appear to be scoring themselves non-randomly.

markhahn
0 replies
18h37m

the numeric experiment does not produce a line identical to what DK report. if DK's line were horizontal at 50%, it would indeed be nothing but autocorrelation.

lopatin
0 replies
19h20m

Oh I read about the DK effect a while ago. I'm pretty much an expert in Psychology now, AMA.

lifeisstillgood
0 replies
15h49m

The Dunning Kruger effect is simply the same reason expensive projects are undertaken and never hit budget - not because we cannot estimate costs but because if we did we would never do anything.

jongjong
0 replies
17h30m

This makes sense. IMO, the reason why Dunning-Kruger effect is so popular among the upper classes (along with Impostor Syndrome) is that it helps to provide justification for social inequalities as it corrects inner monologues.

"How come I have so much given that I'm not as skilled as these other people? I must suffer from impostor syndrome."

"Look at all these people complaining instead of taking responsibility for their own failures, they probably suffer from Dunning-Kruger effect. Their work must not be good enough."

But of course this requires a certain detachment from reality (hence why many upper class people have blind spots). If they actually took a look at the evidence, they may find that some of these 'Dunning-Kruger people' are actually far more skilled than they imagine. I think it explains why people like Jürgen Schmidhuber who made significant contributions to AI tend to be ignored. Then because people are ignoring them, they are compelled to promote themselves harder to try to get their fair share of attention but they are then put in the 'Dunning-Kruger basket' until someone with a very good reputation like Elon Musk comes along and gives them credit. I think the same could be said about the mathematician Srinivasa Ramanujan; many mathematicians ignored his work or assumed he was a fraud because he seemed too sure of himself for someone who was completely unknown at the time. If such gross injustice can happen in a perfectly-quantifiable field like math, you can be sure it can happen in any field.

joefourier
0 replies
21h4m

So from my understanding, the Dunning-Kruger Effect paper doesn’t show the distribution of the perceived test scores nor the standard deviation, only an average, which rises with actual test score level.

If they showed the spread bar in each bin, you could form very different conclusions. Do low skilled people consistently estimate their score at around 60, or do they give effectively random results centred around 60?

Assuming the latter, it could mean that low skilled individuals are completely unable to evaluate their performance while higher skilled people are slightly better at it but still not very good, giving a slightly positive correlation which… is very distinct from what the DK effect implied.

im3w1l
0 replies
20h4m

It's fascinating how great Elo and similar ranking systems are at curbing DK. You just get a number, and that's how good (bad) you are. It's incredibly precise too, there's just no arguing with it.

Also since the topic is D-K I'm a bit scared that I'm the fool here, but isn't he misusing the term autocorrelation? What he describes sounds like just normal correlation?

hyperthesis
0 replies
14h19m

It's Dunning-Krugers all the way down - including this self-referential smugness.

hyperthesis
0 replies
15h9m

If unskilled and skilled self-assessed themselves the same on average, then unskilled overestimate, and skilled underestimate.

That would be a significant result alone - that no one had any idea. (but as https://news.ycombinator.com/item?id=38416100 notes, there is a correlation).

hn_throwaway_99
0 replies
20h51m
golol
0 replies
16h2m

I disagree. Dunning Kruger is not a statement about predicted score correlating with actual score in some way. It states that predicted score does not correlate well with actual score. This can be rephrased as the prediction error having a negative correlation with the actual score. The article then claims that this negative correlation is autocorrelation. That is true but the correlation still exists. The thing is that ideally we EXPECT there to be no correlation of the prediction error with the actual score, but we find autocorrelation. Going back to variables where this autocorrelation is not there, we EXPECTED to find a 1:1 positive correlation between predicted score and actual score but find no correlation, or a weak correlation.

So finding autocorrelation when you expected to find no correlation is pretty much the Dunning-Kruger effect here.

In fact their example with the random data totally makes sense: Suppose people uniformly randomly estimate their performance. Then the people who are low skilled will consistently over-estimate and the people who are high-skilled will consistently underestimate. Of course there is no causation here, as the people choose randomly, but there is an undeniable correlation. I guess the question is if you view the Dunning-Kruger effect as a claim to low skill CAUSING positive prediction error, or just correlating with it.

glitchc
0 replies
21h24m

Geez, this is eye-opening. Thank you for sharing this.

fnord77
0 replies
17h23m

wikipedia's article intro on this doesn't state it is invalid :/

epigramx
0 replies
14h7m

"Autocorrelation is the statistical equivalent of stating that 5 = 5." no sure if the author has some ..dunning-kruger there.

eagerpace
0 replies
18h38m

Is this the opposite of imposter syndrome?

dudeinjapan
0 replies
1h3m

So you're saying that the Dunning-Kruger effect applies to Dunning & Kruger.

dimask
0 replies
12h44m

I would call this type of argument a case of regression to the mean rather than "autocorrelation". That, of course, in principle requires independence between performance and assessment of performance. In many cases, it would make little sense to assume that the performance and assessment of performance are independent. But even then, one can simulate random data with some correlation, and still get a DK effect merely as a statistical artifact. An overview of similar critiques, and a similar argument, can be found in https://www.frontiersin.org/articles/10.3389/fpsyg.2022.8401... .
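A rough sketch of what such a simulation could look like. The model here is my own assumption, not the one from the linked paper: a latent skill, with the test score and the self-estimate each equal to that skill plus independent Gaussian noise, both converted to percentile ranks. The names `percentile_ranks` and `quartile_table` are made up for the example.

```python
import random
from statistics import mean


def percentile_ranks(values):
    """Map each value to its percentile rank (0-100) within the sample."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, i in enumerate(order):
        ranks[i] = 100 * position / (len(values) - 1)
    return ranks


def quartile_table(n=20_000, noise=1.0, seed=2):
    """Per actual-score quartile: (mean actual, mean perceived) percentile."""
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n)]
    # Both measurements track the same skill, each with its own noise,
    # so they are positively correlated but not identical.
    test_score = [s + rng.gauss(0, noise) for s in skill]
    self_estimate = [s + rng.gauss(0, noise) for s in skill]
    rows = sorted(zip(percentile_ranks(test_score),
                      percentile_ranks(self_estimate)))
    size = n // 4
    table = []
    for q in range(4):
        chunk = rows[q * size:(q + 1) * size]
        table.append((mean(a for a, _ in chunk), mean(p for _, p in chunk)))
    return table


if __name__ == "__main__":
    for q, (actual, perceived) in enumerate(quartile_table(), start=1):
        print(f"quartile {q}: actual {actual:5.1f}, perceived {perceived:5.1f}")
```

Even though nobody in this model is biased, the bottom quartile's mean perceived percentile should land well above its actual percentile and the top quartile's well below, while still trending upward, which is the familiar DK-shaped plot.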

dilawar
0 replies
9h58m
dclowd9901
0 replies
19h57m

I think what this article is missing is “the chart DK should have used.”

Instead we get a spurious explanation that doesn’t make a lot of sense based on completely fabricated data. It’s entirely natural for something that looks like DK to emerge from randomized data, especially when the Y axis is represented by some number of the mean (actually 50ish in this case).

dahart
0 replies
18h28m

Most people, even here on HN, do not know what the DK effect actually claimed to show. It does not show that confident people are more likely to be incompetent. Their primary result shows a positive correlation between confidence and supposed skill. (What skill, you ask?*)

This article suggests DK is even simpler than autocorrelation, that it’s just regression toward the mean. https://www.talyarkoni.org/blog/2010/07/07/what-the-dunning-...

I don’t know which statistical artifact it is, but I am quite convinced that the so-called DK effect is not demonstrating something interesting about human psychology, I don’t buy that this is a real cognitive bias. I’ve read the paper several times, and the methodology seems to be lacking rigor. They tested a small handful of Cornell undergrads volunteering for extra credit, not a large sample, not the general population, and tested nobody who actually fits the description of ‘incompetent’ in a meaningful way. They primarily measured how people rank each other, not what their absolute skill was - and ranking each other requires speculating on the skills of others. There are obvious bias problems with asking a group of pampered Ivy League kids how well they think they rank.

* One of the four “skills” they measured was ability to get a joke - “appreciation of humor” - Huh? This is subjective! The jokes used aren’t given in the paper, either. Another was ‘grammar’ tests.

concordDance
0 replies
20h10m

The author fails to make his point quite badly. Of course if everyone's self assessment was random the bottom quartile would overrate themselves! And that would be half of the Dunning-Kruger effect and we could truthfully say "the bottom quartile of people overrate themselves"!

The other part where those at the top have a better idea of where they rank noticeably does not come out in his toy example.

Honestly, he comes across as not having the slightest understanding of how people interpret those graphs...

civilized
0 replies
13h44m

We discussed this in a previous thread. The author is basically hypothesizing that perhaps people are so universally terrible at predicting their ability, their self-rating is like an unconditional random variable - just a random draw that is not influenced by their actual ability level at all.

If this is true, then when your actual ability is high, your self-rating is likely to be lower than your ability simply by random chance. For example, if ability ranges from 0-100, your actual ability is 99, and your self-rating is a uniform random number from 0-100, your self-rating is 99% likely to be lower than your actual ability. Conversely, if your actual ability is low, your self-rating is likely to exceed your actual ability level.
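The 99% figure is easy to sanity-check with a throwaway simulation. This is only a sketch under the same assumption as the example above (self-rating drawn uniformly from 0 to 100, independent of ability); `share_underrating` is a name I made up.

```python
import random


def share_underrating(ability, trials=100_000, seed=3):
    """Fraction of uniform random self-ratings on [0, 100] below `ability`."""
    rng = random.Random(seed)
    below = sum(rng.uniform(0, 100) < ability for _ in range(trials))
    return below / trials


if __name__ == "__main__":
    for ability in (10, 50, 99):
        share = share_underrating(ability)
        print(f"ability {ability}: self-rating is lower in {share:.1%} of draws")
```

The share of underestimates simply tracks the ability level, which is the whole "effect" in this toy model.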

When it's explained clearly and simply, the criticism raises a lot of questions. Are people actually that bad at rating their own ability? I doubt it.

chmod600
0 replies
15h7m

A related effect that I've wondered about is: perhaps lower-skilled people compare themselves to the general public, while perhaps skilled people compare themselves to a smaller group of skilled peers.

In other words, if you asked me if I'm good at riding a bicycle, I'd compare myself to others in the general population and say "yes". But if you ask a weekend bicyclist, they'd be better than me but perhaps compare themselves to weekend bicyclists, and rate themselves lower. And the effect might repeat for competitive bicyclists.

If true, this could explain why we intuitively believe the DK effect.

chiefalchemist
0 replies
20h39m

DK for me is simply: "You don't know what you don't know." When that happens, it's easy - surprise, surprise! - to misjudge your skill level. In a way, it almost feels cruel to ask someone with too few points of reference to say how much they know. The fact is whether high, low, or in the middle...they are guessing.

On the other hand, with enough experience the depth and breadth of your context improves, as it should. At that point, mis-self-assessment is the result of arrogance, bravado, etc. That's a different problem than simply not knowing.

If nothing else, DK has a case of apple v oranges.

bsza
0 replies
5h26m

Article claims Dunning-Kruger is present in a population where everyone estimates their own skills based on dice rolls. Someone who estimates their own skills based on a dice roll is objectively crap at estimating their own skills. Dunning-Kruger claims people are objectively crap at estimating their own skills.

Where is the contradiction?

badrabbit
0 replies
17h1m

In my experience, people abuse flattery too much, so it is hard to tell if their positive opinions of me are genuine and with merit. Generally speaking, I try to see the big picture and realize that no matter how well I do, in a more global sense at best I am top 50th percentile, slightly above average. It is chance, relationships and supply/demand economics that ultimately decide our ability to apply our talents effectively.

When it comes to others, I wish more people experienced the D&R effect. It gets frustrating sometimes dealing with smart and talented people who think they are revolutionary rockstars. You know the kind, they see other people's work and they are shocked how bad everything is, but never fear, they, our heroes are here to refactor everything until they leave and another hero looks at their work and rescues metropolis from it again. Patience and humility are a rare virtue for all of us.

austin-cheney
0 replies
18h19m

The best way to differentiate DK from autocorrelation is motive. Low performance people will focus on motives that reinforce the perception of their competence, for example preferring code style over code delivery because while both may be arguably important one requires less effort and risk to attain.

There is research to qualify this out of Stanford. People will shift motives to attain compliments and the types of compliments received will dictate the challenges they are willing to accept. When a compliment is specific to an action and measurable people will strive for continuously more challenging tasks to continually receive specific compliments. When compliments are generic and directed to the person they will tend to prefer progressively less challenging tasks so that they continue to shine relative to the attempted effort. The differences in behavior produce a natural Dunning-Kruger effect wherein people seeking less qualified activities are more likely to overestimate their potential and degree of success.

This is also statistically verified in research that correlates predictions to confidence. The more confident a person is in their predictions, such as political talk radio hosts, the less accurate their predictions tend to be.

abnry
0 replies
20h15m

If there is a linear relationship between test score (X, ability) and test score self-assessment (Y, self-perception), then the random variables are modeled as:

$$ Y \sim aX+b+N $$

Where N is some statistically independent noise, mean zero.

This means the covariance between them is

$$ Cov(Y-X,X) = E[ ((a-1)X+b+N -(a-1)E[X]-b) (X - E[X]) ] $$

Which is

$$ Cov(Y-X,X) = E[(a-1)(X-E[X])(X-E[X])] + E[N(X-E[X])]= (a-1) Var[X] $$

To get a "DK effect" we need (a-1) < 0, or a < 1. If a=0, in the case of the blog post, then this is absolutely true. If a=1 (which, along with b=0, is the ideal scenario), then this is barely not true. If a > 1, then we'd have a whole new effect about arrogant experts.

So the only thing that matters from this "auto-correlation perspective" is the rate at which an individual's self-assessment increases with their ability. As long as they underestimate the increase, a "DK effect" will occur.

However, in the above analysis, we ignored the variable b. If a = 0.8 and b=0, we'd never have the so-called "DK effect" even though it matches the "auto-correlation perspective" because everyone would underestimate their ability.

This tells me that the value of b matters. It is sort of like the prior ability everyone assumes they have. What the DK paper shows is that b > .5, which I think is in line with the spirit of the popular interpretation of the "DK effect". People should not be assuming they have, at a minimum, a capacity higher than the average.

At the same time, the value b isn't insanely higher than .5, which also makes me want to cut those unskilled and unaware some slack. It "seems reasonable" to assume your baseline is average. That can't be the case, but it feels intuitive.
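If anyone wants to check the covariance identity numerically, here is a small sketch. It assumes ability X is uniform on [0, 1] and the noise N is Gaussian with standard deviation 0.1; those distributions and the helper names `covariance` and `check` are my choices for illustration, since the derivation itself only needs N to be independent of X with mean zero.

```python
import random


def covariance(us, vs):
    """Population covariance of two equal-length samples."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n


def check(a, b, n=50_000, seed=4):
    """Return (empirical Cov(Y - X, X), predicted (a - 1) * Var[X])."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]              # ability on [0, 1]
    y = [a * xi + b + rng.gauss(0, 0.1) for xi in x]  # Y = aX + b + N
    diff = [yi - xi for yi, xi in zip(y, x)]
    return covariance(diff, x), (a - 1) * covariance(x, x)


if __name__ == "__main__":
    for a, b in [(0.0, 0.5), (0.8, 0.0), (1.0, 0.0), (1.2, 0.0)]:
        empirical, predicted = check(a, b)
        print(f"a={a}, b={b}: Cov(Y-X, X) = {empirical:+.4f}, "
              f"(a-1)Var[X] = {predicted:+.4f}")
```

The two columns should agree up to sampling noise, negative for a < 1, roughly zero at a = 1, and positive for a > 1. Note that b drops out of the covariance entirely, which is why it has to be discussed separately.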

a-dub
0 replies
19h47m

i think of acf as a measure of repeating temporal structure and how "strong" and "long" it is, if it exists.

that is, it gives you a notion of if and what order of an ar model should fit any repeating structure in the data.
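For readers who have not met the ACF in this time-series sense, here is a minimal sketch of the sample autocorrelation as a function of lag. The AR(1) generator, the coefficient 0.7 and the helper names `acf` and `ar1` are assumptions chosen for illustration, not anything taken from the article. For an AR(1) series the ACF should decay roughly like 0.7^k, while for white noise it should sit near zero at every lag above 0.

```python
import random
import statistics

def acf(series, max_lag):
    """Sample autocorrelation of `series` at lags 0..max_lag."""
    n = len(series)
    mean = statistics.fmean(series)
    dev = [s - mean for s in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

def ar1(phi, n, seed=0):
    """Generate x[t] = phi * x[t-1] + Gaussian noise (phi = 0 gives white noise)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

series = ar1(phi=0.7, n=5000)
for lag, r in enumerate(acf(series, 5)):
    print(f"lag {lag}: {r:+.3f}")
```

Lag 0 is always 1 because a series is perfectly correlated with itself; the informative part is how quickly the values fall off as the lag grows.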

TrackerFF
0 replies
18h25m

The DK effect has gotten WAY more cred than it should. Today, it is just another feel-good piece that people use to justify their feeling that they're (ironically) surrounded by loud idiots.

Spiwux
0 replies
6h4m

At the risk of sounding like a complete idiot, isn't the hypothesis of the original paper still true? Let's assume self assessment score is perfectly random between 0% and 100%, so on average every group will always estimate themselves to be 50% correct

Then by definition that means people who are unskilled and often incorrect will overestimate themselves, while people who are often correct will underestimate themselves. Take a complete idiot for example. You always get 0% test score. Yet your self-assessment is random between 0% and 100%. Hence you overestimate yourself much more often than people who always get 100% test score.

In fact, if the two are uncorrelated, then that still means that

1) Idiots don't recognize they're idiots

2) Skilled people don't recognize they're skilled
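The thought experiment above is easy to simulate. This is a sketch only: the uniform 0 to 100 distribution for actual scores and the function name `simulate_random_self_assessment` are assumptions added here, and the self-assessment is drawn independently of the score, exactly as in the hypothetical.

```python
import random
import statistics

def simulate_random_self_assessment(n=20000, seed=0):
    """Return (mean actual, mean self-assessed) per quartile of actual score.

    Assumption: actual and self-assessed scores are independent uniform(0, 100).
    """
    rng = random.Random(seed)
    people = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n)]
    people.sort()  # order by actual score
    q = n // 4
    rows = []
    for i in range(4):
        group = people[i * q:(i + 1) * q]
        actual = statistics.fmean([score for score, _ in group])
        perceived = statistics.fmean([guess for _, guess in group])
        rows.append((actual, perceived))
    return rows

for i, (actual, perceived) in enumerate(simulate_random_self_assessment(), start=1):
    print(f"quartile {i}: actual {actual:5.1f}, self-assessed {perceived:5.1f}")
```

Every quartile's self-assessment averages out near 50, so the bottom quartile overestimates and the top quartile underestimates on average, even though nobody's guess carries any information about their score.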

PeterStuer
0 replies
9h33m

You can take out the x from both sides, and the y would still not be a horizontal line.

In their eagerness to 'deconstruct' the narrative, do the authors merely provide another example of Dunning-Kruger by overestimating their own cleverness?

Jensson
0 replies
21h34m

Psychologists using their pet theories to explain results, and people then taking that explanation as the truth when they should really just look at the data, is probably as large a problem as the replication crisis.

James_K
0 replies
18h14m

I think the issue here is a confusion about what "bias" means. If they are self-assessing at random, then the high performers will all underestimate themselves, but this is not a bias towards underestimation as they are choosing randomly.

That said, the chart from D-K seems to show a different bias and line up roughly with what you would expect. Someone with no knowledge assumes they are average skill and hence inflates their position, someone who is very good doesn't want to rate themselves the best because they assume others know as much as they do. The assumption underlying both groups is that you are normal and others are similar to you.

I hypothesise that most people think they're average, which is something you could easily test by asking them to rate how well they think the average person would do on a test and comparing it to that individual's test score. I'm almost certain that high performers will overestimate the average, and low performers underestimate it.

CalChris
0 replies
15h39m

The article's definition of autocorrelation:

  Autocorrelation occurs when you correlate a variable with itself.

Wikipedia's definition of autocorrelation:

  Autocorrelation, sometimes known as serial correlation in the discrete time case, is the correlation of a signal with a delayed copy of itself as a function of delay.

Of course, 0 delay is the trivial case of time delay but really, the article's definition is at best inaccurate. D-K has nothing to do with time delay and calling it autocorrelation seems like a weird pun that doesn't quite land.

BrenBarn
0 replies
9h55m

Yeah I don't buy this either.

I do think the original Dunning-Kruger plot is a bit of an odd presentation. The way I look at it is just to say that people's self-estimates of their ability fall into a relatively narrow range (e.g., 55-75th percentile on the graph), whereas their actual abilities of course cover the whole range from 0-100th percentile. You don't really need the plot of "x versus x" (average score in each quartile). You just need to say "people's self-assessments seem to start unrealistically high and only go up a little, even as their ability goes up a lot".

6510
0 replies
13h46m

I was curious if the self assessment is done before or after the test.

Bing chat gave me this wild answer:

The effect is usually measured by comparing self-assessment with objective performance. For example, participants may take a quiz and estimate their performance afterward, which is then compared to their actual results 1. Therefore, people estimate their ability before the test by Dunning-Kruger.

In the case estimation is done before: If you've had training, like a soup of ingredients, that matches the priorities and biases of the test it would be strange if no measurable effect remained.

If it's done after: You can create trick questions specifically designed to test if someone learned a specific thing. A good test would test for that. If someone didn't learn the specific thing they could give/guess the wrong answer with some confidence.

The design of the test has great influence on how poorly you'll think you've done. I would argue that the superior test is the one designed to fool you. Hans Rosling famously created a multiple choice test with 4 answers per question with average results below 25%.

On a more fascinating note, unskilled means all areas of expertise outside your own.

People who are universally unskilled in all areas are of course more likely to think they are unskilled. In reality these people know little bits about many things.

This is in contrast to people who spend all day, every day, for their entire lives pondering topics inside their area of expertise. If you are doing one thing you aren't doing all of the other things.

Wikipedia had hilarious instances of experts contributing to countless articles accidentally ending up on the wrong page. Suddenly they have no patience, think they know everything and act like children. It's funny because you can't just ban valuable contributors.

I would love to see this DK test done with professors furthest removed from the area of expertise.

19f191ty
0 replies
12h37m

That is not an autocorrelation. The OP is equating linear dependence with autocorrelation, which is not how we use that term. Autocorrelation is when a random process is correlated with a time-lagged version of itself.