Friends don't let friends make bad graphs

crazygringo
16 replies
16h28m

On the one hand, this all seems pretty great.

On the other hand, I think a lot of these "bad graphs" are very intentionally chosen precisely in order to hide the small number of data points, or an underlying distribution that looks suspicious, etc.

So it's not so much "friends don't let friends", but more "when you see a graph that chooses to obfuscate rather than clarify, suspect it might be intentional".

earthscienceman
6 replies
16h11m

Not that you're wrong but researchers are also deeply imperfect. They're rushed, they're given no time to actually improve their work, and the emphasis is entirely on 'good-enough' publications. The number of times I've been involved in a paper where the mentality wasn't "get it out the door, now" is.... zero times.

Plots often fail to clarify for the precise reason that clarification takes time and effort, and those things are lacking in academia in spades. Are people intentionally hiding ugly details? Definitely, on occasion. But I don't think it's the primary source of such bad figures.

defrost
5 replies
15h59m

All that you said, with a heavy dash of this: good data visualisation is more of a skill and an art form than many people realise.

I've had four decades of crunching numbers in a variety of Engineering, Geophysics, and science applications with a hefty amount of public consulting on a variety of applications and of the large population of those good at gathering and recording data perhaps only 20% had that extra talent for good visualisation to convey meaning without distortion.

ctxc
4 replies
12h37m

Are there resources or good examples you would recommend?

yummypaint
0 replies
2h54m

There is a book called "How to Lie with Statistics" by Huff that should probably be required reading for everyone. It's not very technical and a pretty quick read.

mif
0 replies
9h15m
frederikb
0 replies
9h14m

I really enjoyed Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knaflic. The author really breaks down the individual elements of good/bad visualizations using case studies with lots of actionable advice. Highly recommended.

armagon
0 replies
12h9m

I liked "Signal: Understanding What Matters in a World of Noise" by Stephen Few.

taspeotis
5 replies
16h2m

Yeah both AMD and NVIDIA churn out some pretty shit graphs year on year. But! Intentionally!

spartanatreyu
4 replies
15h55m

Don't forget about Apple

rswskg
2 replies
7h31m

Got some examples?

ryukoposting
0 replies
3h52m
mbreese
0 replies
4h44m

Any of their performance graphs… none of them include axis labels, so you never really know what they are comparing. All you can see is that the newer chip is “faster” or “more efficient” than something else. You just never know by how much.

shepherdjerred
0 replies
9h8m

I would consider myself an Apple apologist, but I can't defend their graphs. They're truly unforgivable.

llamaInSouth
1 replies
12h17m

lol, no... most people suck at laying out graphs... most can't even label axes

AmINotARobot
0 replies
10h4m

I for the life of me cannot produce good graphs in Matlab... I use it so rarely that I forget all the syntax and when to use hold etc. So now I just export the data and plot it with Python, to the annoyance of my PMs...

tgv
0 replies
8h8m

It definitely happens intentionally and unintentionally in psych literature, certainly the first one. Psych articles report p values over ANOVAs, and they blithely assume everything is normally distributed. For one group, there's absolutely no need to poke the sleeping dog. The other group simply has no idea. They shouldn't be doing research, but their training is lacking, and cheap PhDs aren't that easy to come by, so here we are.

ppqqrr
14 replies
12h42m

A while ago I had a heated discussion on HN with someone who claimed that any graph where 0 is not the minimum value of all the axes is misleading.

We were talking about a graph that shows global temperature rise due to climate change. They claimed the graph was misleading because the Y axis (temperature) didn't start from 0 (fahrenheit? celsius? fucking kelvin?).

This person also quipped, "maybe if you can't see it with 0 at the bottom, it's not such a significant change?". That put a dent in my faith in humanity for a while. I'm just glad to see us operating at a higher level. I guess 2016-2020 was a different time.

foldr
4 replies
6h48m

> fucking kelvin?

I mean yes, if you want the ratios of different temperatures to be meaningful, then that's where you'd need to set the zero point. You could argue that a graph that makes 25C look "25% hotter" than 20C is misleading in this sense. (Not that this justifies global warming denialism.)
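
To put numbers on that, here is a minimal sketch of the arithmetic. It only assumes the standard 273.15 offset between Celsius and kelvin; the helper name `percent_increase` is made up for illustration.

```python
# Compare the relative increase from 20C to 25C on the Celsius and kelvin scales.
ZERO_CELSIUS_IN_KELVIN = 273.15  # offset between the two scales


def percent_increase(low, high):
    """Relative increase from low to high, in percent (hypothetical helper)."""
    return (high - low) / low * 100.0


cold_c, warm_c = 20.0, 25.0
cold_k = cold_c + ZERO_CELSIUS_IN_KELVIN
warm_k = warm_c + ZERO_CELSIUS_IN_KELVIN

celsius_pct = percent_increase(cold_c, warm_c)  # 25%, an artifact of the Celsius zero
kelvin_pct = percent_increase(cold_k, warm_k)   # about 1.7% on the absolute scale

print(f"Celsius: {celsius_pct:.1f}%  kelvin: {kelvin_pct:.1f}%")
```

The same 5-degree difference is "25%" or "about 1.7%" depending only on where the zero sits, which is the sense in which a ratio read off a Celsius axis can mislead.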

kergonath
3 replies
4h55m

Significant changes are not necessarily visible on a scale from 0 K to 400 K. I mean, if you show up at the hospital with a temperature of 315 K instead of your base line around 310 K, that’s fucking significant even though you would not see anything on a scale that starts at 0 K.

> You could argue that a graph that makes 25C look "25% hotter" than 20C is misleading in this sense.

That’s meaningless in any sense. The origin of the Celsius scale is arbitrary, “25% hotter” has no meaning whatsoever.

foldr
2 replies
4h26m

> The origin of the Celsius scale is arbitrary,

True – but the origin of the Kelvin scale is not.

> “25% hotter” has no meaning whatsoever.

It absolutely does have a physical meaning. It means that the system has 25% more energy at the microscopic level. (Or, you know, substitute in a more precise physical definition of temperature – it will be some kind of measure of energy, even if it's not exactly that.)

It's not necessarily wrong to suppress the zero in a graph of temperature changes, but by doing so you are making bars in the graph proportionally larger or smaller relative to other bars by an arbitrary amount. That could potentially be misleading, depending on what point you are making using the graph.

kergonath
1 replies
3h26m

> It absolutely does have a physical meaning.

No, not really. Heat is a poorly defined concept to which we are saddled for historical reasons. For example:

> It means that the system has 25% more energy at the microscopic level.

It does not. This definition only works in a frame of reference at rest compared to the thing you are observing. Imagine a piece of matter that is travelling at a velocity v in your implicit frame of reference. Its temperature does not depend on v, even though its kinetic energy does. We are back to the choice of scale.

And then there are negative absolute temperatures, which do not make any sense at all if heat is kinetic energy.

The actual definition of thermodynamic temperature is the derivative of the energy with respect to the entropy (equivalently, the inverse of the derivative of the entropy with respect to the energy). This is highly non-intuitive and we cannot extrapolate our intuitive concept of heat too much.

> It's not necessarily wrong to suppress the zero in a graph of temperature changes, but by doing so you are making bars in the graph proportionally larger or smaller relative to other bars by an arbitrary amount.

Right. This is the point that was made in the story and I entirely agree with that. A bar chart communicates a surface area. Changing the scale artificially changes the surface area and is misleading. The logical conclusion is that bar graphs make no sense for temperatures, or to show the relative change of a variable.

Personally I would go further and say that bar graphs are inappropriate in the vast majority of cases, but that’s just my opinion.

> That could potentially be misleading, depending on what point you are making using the graph.

Indeed.

foldr
0 replies
2h52m

If we're talking about global warming (as ppqqrr was), then it's surely some kind of objective physical notion of 'getting hotter' that we're interested in. The problem with global warming is not that we all feel subjectively hotter!

And I did say:

> substitute in a more precise physical definition of temperature

ctxc
2 replies
12h38m

You should hand him a chart with "years" on the x axis with it starting from 0AD. :P

SiempreViernes
0 replies
6h12m

The only sensible start for any time axis is clearly Modified Julian Day 0 which puts the x=0 of any truly god fearing years axis in November 1858, as it should!

Alternatively Fermi Mission Elapsed Time is also an acceptably cool zero point, which puts the zero in January 2001. The zero of the unix time is tolerable only in truly desperate circumstances.

Moru
0 replies
9h54m

Why some random starting point like when a religion started counting? Of course you have to start from the beginning of the universe. THAT will put things in perspective!

Beijinger
2 replies
5h59m

"that any graph where 0 is not the minimum value of all the axes is misleading."

I partly agree with him.

Take this example, first graph I could find:

https://religionnews.com/wp-content/uploads/2014/08/61Years-...

To me it looks at first glance like a 2/3 decline, and this is misleading. Often these declining graphs give a visual impression that does not reflect the real decline.

kergonath
1 replies
5h3m

> I partly agree with him.

It is not helpful in general. Magnitude and relative changes are different things. Sometimes you need one, and sometimes you need the other.

Global average temperatures are a good example: where is the zero? Is it significant? The effects of an increase or decrease of 0.5°C are massive, so the appropriate way of presenting this is to show the temperature anomaly, not the absolute temperature. Also, this way the information is conveyed regardless of the temperature scale in use.

The religiosity graph is interesting. If you want to show a sudden change at some point, then showing the relative change is appropriate. If you want to show that people are not religious anymore using this graph, then you are dishonest. It is all about the narrative and the point you want to make.

On its face, “scales must go to zero” is not good advice, because you can always change the variable so you can make anything go to zero without changing the shape of the curve and our perception. However, when we see a graph, then we always need to understand why it goes to zero or not, what the author is trying to show, and whether they are being honest about it.

Beijinger
0 replies
3h9m

"global average temperatures are a good example: where is the zero?"

I don't know.

"The average surface temperature on Earth is approximately 59 degrees Fahrenheit (15 degrees Celsius), according to NASA"

But I know if the average temperature increased 0.5 degree C and I show a graph with the scale 14 to 16 degrees over time and the headline "world average surface temperature exploding" then this is excellent clickbait and a nice graph, but it is misleading on the first look.
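
As a rough sketch of how much the axis range drives that first impression: the 0.5 degree change and the 14 to 16 axis are the numbers above, while the 0 to 16 axis is just an assumed alternative for comparison, and `fraction_of_axis` is a made-up helper.

```python
def fraction_of_axis(change, axis_min, axis_max):
    """Share of the vertical axis that a given change spans (hypothetical helper)."""
    return change / (axis_max - axis_min)


change = 0.5  # degrees C

zoomed = fraction_of_axis(change, 14.0, 16.0)     # a quarter of the plot height
from_zero = fraction_of_axis(change, 0.0, 16.0)   # about 3% of the plot height

print(f"14-16 axis: {zoomed:.0%} of the height, 0-16 axis: {from_zero:.1%} of the height")
```

Neither number says which axis is the honest one. It only shows that the visual size of the change is a choice made by whoever draws the plot.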

bobbylarrybobby
1 replies
12h29m

Easy solution: just plot the delta in °C since some fixed date. (Or for any graph, just subtract y(x_0) from every point. Tada!)
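
A minimal sketch of that subtraction, for what it's worth. The temperatures below are invented placeholder values, not real measurements, and `delta_from_baseline` is a name made up for this example.

```python
def delta_from_baseline(values, baseline_index=0):
    """Subtract the value at the baseline index, y(x_0), from every point."""
    baseline = values[baseline_index]
    return [v - baseline for v in values]


# Placeholder series: NOT real data, only here to show the transformation.
years = [1980, 1990, 2000, 2010, 2020]
temps_c = [14.2, 14.4, 14.5, 14.7, 15.0]

deltas = delta_from_baseline(temps_c)

for year, delta in zip(years, deltas):
    print(f"{year}: {delta:+.1f} C relative to {years[0]}")
```

After the subtraction the axis naturally includes zero, and the shape of the curve is unchanged.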

lelanthran
0 replies
11h14m

To add, that's actually a better chart anyway, because you aren't trying to show the temperature over time; you want to show the change in temperature over time.

Completely different things.

globular-toast
0 replies
8h47m

Classic example of someone taking away an overly simplified rule from a problem they don't fully understand. A little knowledge is a dangerous thing.

jyunwai
7 replies
14h57m

A great reference for further reading about data visualization is "The Visual Display of Quantitative Information" by Edward Tufte, a classic book originally published in 1983 with enduring relevance today.

quietbritishjim
3 replies
11h10m

It's an interesting book with some really interesting examples (good and bad). I also recommend looking at it.

But the central premise the text presents is "maximise the information to ink ratio", which sounds very reasonable but is fundamentally flawed. The problem is that quantity of ink (or, on a screen, number of black pixels) is not the same as visual complexity. By the time your brain is interpreting something visual, it has done edge detection, grouping, and other preprocessing.

He gives an example of shortening the axes on a scatter plot to show the range of data, rather than intersecting at the corner. This is win-win because it uses less ink but shows more information. But, when I look at the comparison, it's just obvious that the modified version is visually more complex. It would be especially worse on a complex page with text and a few plots next to each other – the fragments would all visually bleed together.

One mitigation to that sort of complexity is to put boxes around large distinct elements, like an entire plot. But boxes are Tufte's absolute nemesis, in that book and elsewhere. It's surprising that after so many years looking at visual displays he still has that attitude.

3abiton
2 replies
10h23m

Do you have alternative book recommendations?

quietbritishjim
1 replies
8h50m

Afraid not. I only found out about Tufte's book due to a mention in xkcd. Until Randall mentions another book about plots I'm not likely to find one :-)

It would be nice to find someone like Tufte that gathered some actual evidence. Like, trying different plots on groups of people to see which find information faster, maybe using eye tracking. Even just subjective surveys might be an improvement on one person's opinion.

stiff
0 replies
8h16m

William Cleveland did some research along those lines and he wrote two good books:

    - The Elements of Graphing Data

    - Visualizing Data

The first one is of general interest, the second one more specialized for statisticians.

rnburn
0 replies
14h33m

That's a good book. I'd also recommend John Tukey's classic paper "Some Graphic and Semigraphic Displays", https://www.edwardtufte.com/tufte/tukey

Tukey was one of Tufte's mentors.

kergonath
0 replies
4h24m

It’s solid advice, but bear in mind that neuroscience and our understanding of perception have improved since then.

It’s very interesting and relevant, just not the final word on the subject.

contrarian1234
0 replies
11h6m

I think he has an insightful framework for thinking about what makes good data visualization

However, I feel a lot of people miss the logic behind it all and just straight up copy the "Tufte style", which is often too stylized and iconoclastic. A good example is the plotting defaults in R's ggplot2.

seanhunter
5 replies
12h1m

The example for "Friends don't let friends make heatmaps without maxing out outliers"[1] is so common. It's also very frequent in stats visualisations in videogames. If you play strategy or simulation games they often have visualisations to help the player understand what's going on/wrong, but for heatmaps because of the effect of outliers the heatmap gradient is often pretty useless. For example in the game oxygen not included, frequently if you do the temperature viz, everything just turns to either blue or some shade of pinky-red, because if you have a volcano or other heat source all the other colours seem cool. So you can't distinguish between a 1000C volcano and your slightly overheating 270C steam room for instance, they'll just be a pretty uniform shade of pinky red, and your overheating 60C base will be blue because it's pretty cold relative to them. Meaning that heatmap is pretty much useless for diagnosing a bunch of temperature problems.

[1] Can't remember the exact wording, but I think that one is number 6 or 7.

nichevo
1 replies
7h14m

Wouldn't log-transforming the data be a suitable solution here? Unless monitoring the temperature of the volcano is not of interest, in which case excluding it or marking it as an outlier would indeed be appropriate.
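
A quick sketch of what the log transform would do to the colour positions. The 60C, 270C and 1000C values are the ones from the parent comment; the 20C "cool room" is an assumption added so the scale has a low end. Both helpers are made up for illustration, and the log version only works for strictly positive values, so real Celsius data with readings at or below zero would need converting to kelvin first.

```python
import math


def linear_position(value, lo, hi):
    """Position of value on a 0..1 colour scale with linear spacing."""
    return (value - lo) / (hi - lo)


def log_position(value, lo, hi):
    """Position of value on a 0..1 colour scale with logarithmic spacing.

    Requires value, lo and hi to be strictly positive.
    """
    return (math.log(value) - math.log(lo)) / (math.log(hi) - math.log(lo))


lo, hi = 20.0, 1000.0  # assumed coolest tile and the volcano

for temp in (60.0, 270.0, 1000.0):
    lin = linear_position(temp, lo, hi)
    log = log_position(temp, lo, hi)
    print(f"{temp:6.0f} C  linear={lin:.2f}  log={log:.2f}")
```

On the linear scale the 60C base sits at about 0.04 and the 270C steam room at about 0.26, so both are crushed towards the cold end. On the log scale they land at about 0.28 and 0.67, which leaves room to tell them apart.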

seanhunter
0 replies
7h10m

Oh sure, there's lots of ways you can do it. I'm just saying that people making viz for games often do it in a way that isn't actually helpful, and that's an example.

mcv
1 replies
8h17m

If specific temperatures represent a meaningful problem, your colours should be standardised to show those temperatures, and not be automatically generated out of the whole spectrum of temperatures you currently have. In fact, having the meaning of colours change depending on temperature changes sounds like a pretty bad idea.
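
Something like the following, as a sketch. The thresholds and colour names are entirely hypothetical (they are not taken from any game); the point is only that a given temperature maps to the same colour no matter what else is on screen.

```python
import bisect

# Hypothetical fixed thresholds in degrees C and the colour band above each one.
THRESHOLDS_C = [30.0, 75.0, 125.0, 500.0]
BANDS = ["blue", "green", "yellow", "orange", "red"]


def band_for(temp_c):
    """Return the colour band for a temperature, independent of the other data."""
    return BANDS[bisect.bisect_right(THRESHOLDS_C, temp_c)]


for temp in (20.0, 60.0, 270.0, 1000.0):
    print(f"{temp:6.0f} C -> {band_for(temp)}")
```

With these made-up cut-offs, the overheating 60C base, the 270C steam room and the 1000C volcano all end up in different bands, and adding or removing the volcano changes nothing about how the base is coloured.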

kergonath
0 replies
5h10m

More nuances in the spectrum sounds like a good idea, however.

faceplanted
0 replies
2h0m

A friend of mine has a literal heatmap, i.e. an infrared thermal camera, and that actually displays obvious outliers in striped red and white like digital camcorders show overexposed areas. It's really useful because it works both as an alert for hotspots, and also as a reminder to ignore them.

jrauser
4 replies
15h1m

I wrote a talk entitled How Humans See Data that puts several of these ideas, among others, into a coherent framework based on research by Bill Cleveland.

https://www.youtube.com/watch?v=fSgEeI2Xpdc

jsweojtj
0 replies
11h37m

I've recommended this exact talk many times! It's excellent.

jbay808
0 replies
11h57m

Great talk and very well presented.

andrei_says_
0 replies
14h42m

Thank you for this

alexpetralia
0 replies
14h52m

I'm very excited to watch this!

I've seen your other talks, which were all fantastic, and highly recommend them to others.

airstrike
4 replies
15h41m

> 3. Friends Don't Let Friends Use Bidirectional Color Scales for Unidirectional Data

why use colors at all in those examples?

slhck
0 replies
9h1m

Precisely this! There's no need to code the same information twice; it just means you have to think more about what the graph shows.

kergonath
0 replies
4h8m

In the examples, the colour scale improves nothing. The advice is still good in general. A single variable representation can be improved by using a colour scale compared to just gray levels in a lot of cases. And in these cases, using a bidirectional scale for unidirectional data is bad, and vice versa.

bertil
0 replies
3h55m

He is explaining the situation using the length of bars. He doesn't recommend using both at the same time.

SiempreViernes
0 replies
4h54m

I think the "examples" are better understood as displays of the properties of the colour scales, not examples of how they are used in the wild.

That is to say they aren't examples of when you would use a colour scale at all.

giraffe_lady
3 replies
15h15m

These are still all bad to my eye. Way too much "chartjunk", too many colors for most of them. You simply don't need all those lines. They are easier to read if you put less on them, but carefully. Any Edward Tufte book is about this and a handful of basic techniques go a long way.

minimaxir
2 replies
14h41m

These charts are made with ggplot2, which is based on the grammar of graphics.

EDIT: I thought Tufte did it, I was wrong. https://www.amazon.com/Grammar-Graphics-Statistics-Computing...

giraffe_lady
1 replies
14h33m

Grammar of graphics is a different guy I think but it doesn't matter what it's based on because they still aren't good!

Tufte's whole like, thing is essentially that charts need to be intentionally designed to convey a specific concept to a specific audience via careful choice about what information to present. A library can help you with the shapes but can't automate the decisions.

minimaxir
0 replies
14h29m

You're correct, I edited my comment.

tedunangst
2 replies
15h37m

Ironically, I was about ten examples in before I even noticed the tiny good/bad labels.

splonk
0 replies
15h30m

I also appreciate how some of the examples randomly put the bad examples at the end instead of the beginning.

I'm not going to say people with glass houses shouldn't throw stones, but maybe you should walk outside first.

fastasucan
0 replies
15h9m

Yeah, the figures are too cluttered and have too little space between them. In addition, it's impossible to scroll through that text and figure out the content by just glancing at the charts, as there is no clear marking of which is good and which is bad.

motohagiography
2 replies
15h46m

I wish friends would call them charts and not graphs.

jader201
0 replies
14h2m

It seems like the distinction is subjective and/or ambiguous at worst, and confusing at best [1]. Some people use them interchangeably (not saying this is correct, but that some people think they mean the same thing).

[1] https://english.stackexchange.com/a/43029

dcl
0 replies
14h0m

I use the word 'plot'. Graph is overloaded in the maths/stats/comp sci/ML/etc space.

tomgp
1 replies
6h2m

This is a great overview of common mistakes in data viz. I will be sharing it with my colleagues. As a good supplement I highly recommend Kennedy Elliott's "39 studies about human perception in 30 mins" https://medium.com/@kennelliott/39-studies-about-human-perce...

... a whistle stop tour around the research basis for a lot of claims around data viz best practice (esp interesting re the dogma around not using pie charts which seems to be a consistent bugbear of designers going back to the 1930's but around which the research is inconclusive at best)

[edit: fixed a typo]

uolmir
0 replies
1h57m

I hadn't seen that, but I was going to recommend this recent paper: https://journals.sagepub.com/doi/full/10.1177/15291006211051...

At a glance of the medium article, they seem to have the same perspective on these issues, so the one should complement the other.

paradox460
1 replies
14h9m

I still generally attempt to follow the rules established by Edward Tufte in The Visual Display of Quantitative Information.

Basically, Tufte used the idea of "ink", classified into two groups, data ink and useless ink. The goal is to have a graph with as little useless ink as possible; where every bit of information visible in a graph (or table) is relevant to the end output. To this extent, he recommended dumping axis lines where unneeded, labels, keys, gridlines, and many more things.

LaTeX tables, by default, tend to look like what Tufte proposed, which is probably why LaTeX tables look so damn good compared to the HTML defaults

TehShrike
0 replies
13h47m

While reading chapter 6 of that book, I was inspired to make this digital scatter graph trying to implement his "data-ink maximization" principle: https://tehshrike.github.io/classy-graph/

It was a good exercise.

I really like that book.

baazaa
1 replies
12h28m

Never understood the hate for pie charts when stacked bar charts have a similar problem (hard to compare groups when they're in different positions). If a pie chart isn't good enough, probably switch to a paired bar chart or some other pair-wise comparison.

kergonath
0 replies
4h7m

Stacked bar charts are terrible as well, there is no inconsistency :)

tunesmith
0 replies
11h21m

I remember reading a guide a while back, just a web page... from someone who just seemed like they were a master at picking the right graph/chart style for different types of data. He approached it in an exhaustive way by breaking down data into categories and explaining, for each data style, exactly why each chart style and not another one should be picked. I really wish I had bookmarked that guide because it just made it seem so straightforward.

stared
0 replies
7h58m

Quite a lot of these lessons are not new, see Willard C. Brinton "Graphic presentation" - a book (1939), freely accessible https://archive.org/details/graphicpresentat00brinrich/mode/....

rnburn
0 replies
14h39m

One other rule I've found helpful is banking to 45 degrees -- it's an easy way to make relationships easier to perceive: http://vis.stanford.edu/files/2006-Banking-InfoVis.pdf
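
For anyone who wants to try it, here is a rough sketch of one common variant, median-absolute-slope banking: pick the plot's height/width ratio so that the median line segment is drawn at 45 degrees. This is my own simplified reading of the idea, not code from the linked paper. The function name is made up, and skipping flat and vertical segments is an assumption of this sketch.

```python
from statistics import median


def bank_to_45(xs, ys):
    """Return a height/width ratio that draws the median segment at 45 degrees.

    Slopes are computed after scaling x and y by their data ranges, so the
    result does not depend on the units of either axis. Flat and vertical
    segments are skipped (a simplification made for this sketch).
    """
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    slopes = []
    for i in range(len(xs) - 1):
        dx = (xs[i + 1] - xs[i]) / x_range
        dy = (ys[i + 1] - ys[i]) / y_range
        if dx != 0 and dy != 0:
            slopes.append(abs(dy / dx))
    if not slopes:
        raise ValueError("need at least one sloped segment")
    # A segment with scaled slope s is drawn at slope s * (height / width),
    # so setting height / width = 1 / median(s) puts the median at 45 degrees.
    return 1.0 / median(slopes)


# A zigzag: every segment has the same steepness, so the answer is easy to check.
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 0, 1, 0]
ratio = bank_to_45(xs, ys)
print(f"height/width = {ratio:.2f}")
```

For the zigzag above this gives a plot four times wider than it is tall, which is what you would expect for a series that swings through its full range on every step.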

power_fart
0 replies
7h21m

Writing an "opinionated essay" in a github readme is the most 2023 dev thing I've seen

gullywhumper
0 replies
4h26m

Related - Kaiser Fung has a long-running blog on bad graphs:

https://junkcharts.typepad.com/

firecraker
0 replies
9h41m

Nice.

But I'd add getting rid of heatmaps on large datasets. They are information dense and pretty, but I can't see how anyone interprets them. Better to do clustering and plot the data for each relevant cluster in a more meaningful way.

Just a thought

bjoli
0 replies
5h8m

I have never liked violin plots, but I am very much not in the field of data visualisation. Just a week ago I stumbled on a video titled "violin plots should not exist" which made things fall into place: https://youtu.be/_0QMKFzW9fw?feature=shared

beloch
0 replies
13h24m

A more basic page might serve as a valuable introduction for some.

e.g. How many graphs have you seen on evening news programs that don't even have axis labels?