thank you for this. i was watching it to procrastinate on my materials final, but now i realize i misread the log graph and consequently found incorrect strain rates. maybe procrastination is good!
You could turn these anecdotes into data and then visualize it! That would probably help you with that final
Maybe procrastination is the friends we made along the way
Is your profile picture american
I'm not active on youtube, but several friends sent this to me so I decided to watch it. This was a lovely video to watch. It is especially great to see so many papers from my research community (data visualization). Hilariously, my oldest talks (from back when I was at Northwestern) are about how astrophysicists are terrible at dataviz and should get better (and fun fact: one of my visualizations is featured in Barry Barish's 2017 Nobel lecture on physics, uncited of course). A few notes:
1. Tufte's "data-to-ink ratio" and "chart junk" are relatively controversial concepts in the research community because they aren't really observable and super hard to measure. At what point is too much ink actually observed? What is considered ornamentation? Etc. Empirical research hasn't concluded that these are actually observable principles in practice. They're not bad as handwavy concepts and loose principles (especially for beginners), but the real things that matter are capturing people's attention+pre-attention, keeping that attention, and then making people remember what they've learned. And we've observed that some ornamentation can be great for memory! Also, some visualizations are really slow and actually still effective. So speed (especially focusing on pre-attentive stuff) isn't always the correct ideal for design. Some ornamentation (ink and junk) is actually good. Parsons and my good friend Akbaba both have a few pieces (research, position papers, etc) on these things worth checking out.
2. Pie charts are probably fine. They really aren't that bad most of the time. I used to be really against them, like feral almost. But then I realized that research just isn't even remotely conclusive on how bad they actually are. Saying that a pie chart is "worse" than another choice sounds dramatic, but the measured accuracy of people using pie charts hardly drops off compared to other visualization types. Kosara and others have explored this, if you care to dig in. Kosara has a youtube channel (eagereyes) and a great blog too. Anyway, my conjecture is that people dislike pies largely because of Tufte and Few (the latter is an actual villain in our community). It's a bit over-dramatized.
3. Amazing to see you mention accessibility! That's actually my whole research area: accessible data interaction. Made me so happy to hear someone with a big following mention it. This is probably why my friends told me I should watch this. Thanks a ton for that. Things have shifted a lot in recent years on that topic, which has been awesome to see.
Folks + books to check out (for anyone reading my comments):
- (book) Alberto Cairo's "How Charts Lie" (good friend but also considered one of the real shepherds of our field of practice today, a wonderful person to learn from)
- (theory paper) Akbaba's "Entanglements"
- (research summary) Franconeri's "The science of visual data communication: what works" (Franconeri is just outstanding and this is one of the best scientifically focused papers in our field)
- (guidelines) And for anyone looking to get into making more accessible data visualizations, check out my guidelines workbook and research paper: "Chartability"
What a comment! It should be pinned. Thanks for writing this out!
But violin plots are actually bad right? Right?!?
@@ummon YES! They are indeed bad. :) Angela Collier (@acollierastro) has a nice video on that topic, should you desire additional validation on that point. :)
And Frank: Yeah, thanks for a super interesting comment!
Fatima: Human vision is fallible.
Me: I can't see why?
BahahaHA love this, thanks for the chuckle
im annoyed at how funny this is, great work
lol, nice. :)
I am living for that Angela Collier crossover (she mentioned the O-rings in her Feynman video)
That would be a podcast duo to watch !
@@Vinylectric why?
@@comicbrandon do you watch their videos? they're both science talkers and have broadly similar vibes
@@kaiserruhsam Angela Collier & Dr. Fatima have similar vibes and should do a podcast together? I have opinions on both of these assertions, but please, go on.
@@kaiserruhsam I don't think they have similar vibes at all. Maybe they're both women, and that's it. Still, I would totally watch a collab video by them
As a STEM person, I suspect that STEM people are bad at grasping the intrinsic assumptions and biases that come into play when visualizing data, and that they think of graphs not as communication devices meant to convince readers of a specific idea, but rather as a way to display all the data so that viewers can come to their own conclusions. And since many of us don't realize that there is never one true fact one can deduce from observations, we expect the conclusion readers reach to coincide with the author's.
Not first, but spiritually ascending
33:32 "red green" is kind of a misnomer. It affects the cells that distinguish between red and green *spectral color* but the actual practical confusions are far more complex. Many red-green pairings are far easier to see than green-orange, green-yellow, or blue-purple.
Use as few colors as possible, use redundant systems like shapes, and ask someone to check your work. Most of us will do it for free.
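(A quick toy sketch of the "redundant systems" idea in Python/matplotlib, my own made-up example rather than anything from the video: each group gets both a distinct marker shape and a color from a perceptually-friendly palette, so the grouping survives even if the colors don't.)

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
markers = {"group A": "o", "group B": "s", "group C": "^"}      # shape encodes group, redundantly with color
colors = plt.cm.viridis(np.linspace(0.15, 0.85, len(markers)))  # sequential, colorblind-friendlier palette

fig, ax = plt.subplots()
for (label, marker), color in zip(markers.items(), colors):
    x = rng.normal(size=30)
    y = x + rng.normal(scale=0.5, size=30)
    ax.scatter(x, y, marker=marker, color=color, label=label)
ax.legend()
plt.show()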
One interesting thing to note is that the miscommunication of the Florida gun deaths graph was an accident. The author was mimicking the famous "Iraq's Bloody Toll" graph. The creator agrees that it's very misleading though.
Starting the chat for the algorithm.
We've been waiting and welcome back Doctor!
I feel like I hear people talk about how important data visualization is a lot more than this kind of explanation of what to be mindful of. So thank you for finally explaining these basic ideas
One of my favorite unwieldy visualizations is a venn diagram.
They can be useful in teaching certain aspects of mathematics where any overlap at all can imply infinitely many things within that overlapping region, so the size of the overlap doesn't matter.
But if they're representing things from the real, finite, world, they get so unmanageable so fast. People are bad at handling area as a form of proportionality, especially if the areas are different shapes. And so most venn diagrams used for real things just ignore how big the categories they're representing actually are. Which changes how people perceive how much overlap there actually is.
And if you're representing 3 categories, it often becomes a mess of hues, lines, and labels. And 4 categories requires creative shapes like ovals, or cutting out certain overlaps, but most people wouldn't notice whether those were intentionally or accidentally left out. 5 or more is often outright impossible without completely unintuitive shapes.
Just really good, frequently less-than-useful visualizations.
I love the meme Venn diagrams that "explain" the commonalities and differences between things like bank robbers, DJs and preachers.
As an economist I'm usually just an ignorant spectator enjoying learning some new and cool things. Turns out it's also very fun to watch a Dr. Fatima video covering something I have to engage with almost daily (albeit on very different subject matter). A little surprised to see someone saying you should avoid grid lines. On bar graphs they certainly add little and shouldn't be dense nor boldly colored, but on line graphs and scatter plots humans are way better at estimating data point values with grid lines.
The graphs about the benefits of supporting your Patreon were quite compelling and convinced me to join.
My favorite science YouTuber!
That eta on the x axis story, sounds like the type of guy that sees himself so above others... he thinks everyone else has read all his previous research
From Betty Edwards’ Drawing on the Artist Within,
Perceive The Edges;
Perceive The Negative Spaces;
Perceive The Relationships and Proportions;
Perceive the Chiaroscuro; and
Perceive the Gestalt.
This along with the exercises and design points Andrew Loomis makes throughout Creative Illustration strike me as the way forward.
This is why I love your channel. Who knew that a 43:35 minute video on graphs would be so informative and hilarious at the same time. Thank you.
@6:56 up to here this is sufficient data visualization. What Morton Thiokol engineers ignored was not data viz but politics. They should have just written a letter to NASA saying they'd sue them for murder and "here is our case." No need for pretty graphs. "This analysis is ready to go to a journo at all the major newspapers when you launch, successful flight or not." (It might even be that the more obscure the data presentation, the better that strategy works.) At least that's what I'd have done. But then I'm off the charts autistic and have no filter when it comes to moral outrage and injustice.
Why would normie engineers not do this? I think your answer there is something like "capitalism": "It'll cost NASA some "taxpayer dollars" if they don't stick to schedule, blah, blah." Well boo-hoo. The taxpayer is never funding a currency-issuing government (it's the other way around - the tax return is a redemption operation, not a funding operation), so that'd have been wrong. But really, it's just boring, banal greed, fear, doubt, uncertainty: the engineers don't want to risk cutting off the hand that feeds them.
I think that people generally care less about outcomes when they see themselves as not the ultimate point of responsibility as well. Possibly, the engineers earnestly felt that they had done all they could, should, and were expected to do by presenting this information to the ultimate decision-makers. Unfortunately, disasters rarely come down to a single point of failure. Everyone could have done more, exercised more caution, communicated with greater clarity. In extreme environments, errors snowball. Also, hindsight is 20-20, and the engineers didn't *know* it was going to fail, they just had some data that suggested that *if* a failure was going to happen, it was more likely at lower temps. Again, this dilutes responsibility and allows room for other factors to take precedence (e.g., budget, time constraints, odds of success v. failure). Responsibility gets diffused when you can't say with absolute certainty that an outcome is likely, only that an outcome is possible.
As someone who has dyscalculia, I struggle with reading graphs. Knowing what to look for really helps
Altogether excellent! Love how you brought together the importance of data visualization with a historical example. A great way to promote critical discussion and scientific literacy. For all the intellect on clear display my favorite moment in your presentation is undoubtedly @33:47
“You don’t need to worry about η”
This reminded me of an experience I had early in my BA when one of my professors, speaking about his own research, said "70% of participants experienced the effect, which is an amazing number. It doesn't even matter what the sample was when you have a number like 70%" and in that instant I decided he was a fool of the highest order and deserved nothing but contempt.
He also turned out to be a s*x predator, and while I'm not saying those things are correlated, I am saying that sometimes you just *know* about people.
28:35 ironically, those graphs are also a bit misleading because they aren't showing equivalent information. In the control there's a difference of what looks like about 10 but in the deceptive graph it's clearly a difference of around 25-30.
The placement of the graphs side by side implies they're the same information just formatted differently, but they're actually quite dramatically different in a way that implies the perception issue is bigger than it is.
Fair enough. I hear 'dada' every time someone wants to say 'data' anyway, which might come in handy in your field as well. Mush love!
Bless Michael Smith (last words: uh oh) for continuing to pilot that thing to minimize damage to innocents on crash even knowing he was going to die.
I utterly LOVED this video. I took a Modeling Ecology class in college, which I might have actually enjoyed if it was in excel and not R. I would have LOVED the class, really. Thank you for this! So cool so useful
My sympathies - all my friends who have dealt with R have hated it. I'm fortunate to not have had to learn R (biochemistry tends to use a lot of python, which seems easier).
The class sounds interesting though. I feel like finding the right models and visualisations in ecology is especially important and also tricky. I'm reminded of a paper I read last year that gave many different research groups (246 of them) one of two data sets, and a corresponding question: “To what extent is the growth of nestling blue tits (Cyanistes caeruleus) influenced by competition with siblings?” or “How does grass cover influence Eucalyptus spp. seedling recruitment?”. I think it was drawing inspiration from similar meta-style studies in psychology and other social sciences. Overall, I think they found that the different research groups generally came to the same conclusions, but that there was significant disagreement on the magnitude of the effects, and the statistical significance.
The paper got a lot of news coverage at the time because it makes interesting points about what "reproducibility" means when the same data can produce such different results, which leads to bigger questions about objectivity/subjectivity in science. The paper was titled "Same data, different analysts: variation in effect sizes due to analytical decisions in ecology and evolutionary biology". Heads up that it doesn't seem to have cleared peer review yet (which does seem odd to me, but that could be because of the huge number of contributing researchers), but the preprint is readily available online if that's something you'd enjoy.
(Tangent point: press coverage of preprint papers is really weird and Science/Journalism/Society needs to figure out how to handle this, because it feels like we all concluded that science by press conference was bad (see: Andrew Wakefield's vaccine nonsense, or the Cold Fusion debacle, which Bobby Broccoli has recently made excellent videos about on YouTube; more recently than that, I remember Rosie Redfield documenting and debunking the arsenic-based life stuff in the early 2010s). In this case, I am not suspicious of the paper, because I have read it in full and it seems pretty good, but the rise of preprint papers seems to have led to an increasing amount of journalistic coverage for non-reviewed papers, for good or for ill.)
I teach several college courses that heavily employ R, and this is an experience I often hear about from my students who've had previous experiences with R programming in class. The thing is, R skills are a huge asset to give your students, but an instructor has to be ready to give students massive support so that they can actually learn how to do it. This is where many instructors fail. They do not help students through the learning curve enough to get them to where they are capable with it. Also, I think many instructors fail to actually demonstrate _why_ R programming is valuable and eventually saves you tons of time and effort. Sorry you had a (typically) bad experience.
I have a love-hate relationship with R.
The bit about recognizing the importance of not treating humans as if we are faulty computers is really funny to me. Of course, computers are very technical and precise, and understanding how to use them well at a deep level is unintuitive for most people, so we think that those who do are therefore uniquely valuable. On the other hand, computers are also necessarily simplified compared to humans, both because of the standardization constraints of industrial manufacturing and because of the conceptual constraints of mathematics as the system underpinning computing. So while computers are a lot to understand, they still use a simplified model of cognition and perception compared to humans. Treating the fact that humans are more varied than computers as a fault of humans, rather than as a limitation of computers (and, by extension, of our general inability to cognitively handle complexity), is an astounding bit of projection.
Data Viz and Sci Com sitting in a tree.. 😂 always love your passion and skill on both aspects ! Also justice for pie charts, arguably the tastiest charts. 🥧
Great stuff! I had a professor (circa 1987) who said, visual communication should be instant and correct. I think he may have been quoting someone, but I don't know who. Thanks for the video.
I love this discussion; miscommunication via misuse of data is one of the most common issues I have to tackle in my day job. That said, bad data visualization wasn't the problem here. Good project management processes and empowering the right experts stop failures like this from happening; they don't rely on data visualization... good or bad. The company and experts who built the systems you're relying on call you the night before your launch to tell you they've found a problem. Before you even look at the graphs they brought with them you should be leaning towards calling the launch off. When they show you graphs that don't seem to support their claims you don't say, "Whew! The launch is back on!" Instead you say, "Hey, the seriousness of your words isn't backed up by this graph you're showing us, can you help us understand why that's the case?"
If a graph was actually a significant decision point in this process then there were some systemic process failures already happening.
I'm in my bachelors, studying Earth & Environmental science. My maths anxiety has morphed into statistics anxiety. I really don't like messing with stats. Good thing my digital illustration interests help keep my graphs pretty & grades afloat! Ha! Ha ha ha... ha... ha.......ha...
Numbers are scary, I just wanna be a professional dirt know-it-all that can help people...
this feels almost cathartic after watching the 'well there's your problem: military power points' episode
I strongly disagree : the data visualizations I made along the way were inside me all along.
(Also, sick tune, straight to my funk playlist! Thanks)
May the Algo and its angles bless this content.
Hoping my friend Daniel, who I sent the URL of your channel to, is also watching. He's hoping to get work with his environmental science (which kind? IDK), so I'd think this one would be good for him. I hope he gets a job, but I'd miss him as a bartender.
Love your story telling. Thanks!
I knew graphs could be misleading, but I hadn't seen how awful some of those were. Thank you!
To give an example of a good use case for log scales: they are useful in applied mathematics when showing convergence rates for algorithms or numerical methods. For example, y = x^2 looks like a line with slope 2 on a log-log plot (because log y = 2 log x). The difference between a line of slope 2 and a line of slope 3 is much easier to spot than the difference between a quadratic and a cubic.
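(If it helps to see it, here's a minimal matplotlib sketch, just my own toy illustration: on linear axes a quadratic and a cubic both look like "curves that go up fast", but on log-log axes they separate into straight lines with slopes 2 and 3.)

import numpy as np
import matplotlib.pyplot as plt

x = np.logspace(0, 3, 50)                 # x from 1 to 1000

fig, (ax_lin, ax_log) = plt.subplots(1, 2)

# Linear axes: hard to tell the growth rates apart by eye
ax_lin.plot(x, x**2, label="x^2")
ax_lin.plot(x, x**3, label="x^3")
ax_lin.set_title("linear axes")

# Log-log axes: straight lines whose slopes are the exponents (2 and 3)
ax_log.loglog(x, x**2, label="x^2")
ax_log.loglog(x, x**3, label="x^3")
ax_log.set_title("log-log axes")

for ax in (ax_lin, ax_log):
    ax.legend()
plt.show()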
Great video, the η story does a nice job tying data visualisation into broader science communication.
Oh mighty algorithm, I offer these keystrokes onto you, for this channel is good. Grant it your favor.
33:40 Fun fact, you can tell a data viz person was in the room when California came up with their color coding system for COVID risk, because they used colors from the plasma palette. I was so proud of my state for doing that accessible ass shit.
For the first two weeks (one fifth of the class) of Statistical Analysis (not Methods), we talked about data visualization and how not to make figures. It was extremely useful in a different class 3 years later reviewing published scientific papers: we found more than one (of the six we studied) that had really poor data visualization and/or bad statistics, e.g. "5 plus or minus 13" when any number less than 0 is not possible in the real world.
5±13 isn't necessarily an error; a long-tailed distribution can have a mean of 5 and a standard deviation of 13 even when it can only give non-negative numbers (for example, the distribution of x^4 when x follows a normal distribution with mean 0.830587 and standard deviation 0.847734).
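(For anyone skeptical, a quick Monte Carlo sanity check in numpy using the numbers above; a simulation, not a proof.)

import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=0.830587, scale=0.847734, size=10_000_000)
y = x**4                      # strictly non-negative, heavy right tail

print(y.mean(), y.std())      # both come out close to 5 and 13
print((y < 0).any())          # False: no negative values, even though mean - sd < 0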
The information towards the end of the video about how presenting objective evidence doesn't automatically mean people will understand and take it into account made something click in my brain. Thank you 💕
Very informative, I enjoyed. I haven't had much cause to use graphs lately, but this is good to know.
29:42 Yeah the caveat here is important. In ecology for instance, we very often deal with incredibly skewed distributions where you literally can't make any pattern out without a log transformation (or other transformation).
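(A made-up illustration of that in Python, nothing ecological about the actual numbers: a lognormal-ish predictor piles everything up near zero on a raw axis, and the relationship only becomes readable once the x-axis is logged.)

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
abundance = rng.lognormal(mean=2.0, sigma=1.5, size=500)               # heavily right-skewed predictor
response = 10 + 3*np.log(abundance) + rng.normal(scale=1.0, size=500)

fig, (ax_raw, ax_log) = plt.subplots(1, 2)
ax_raw.scatter(abundance, response, s=8)
ax_raw.set_title("raw x-axis: points pile up near zero")
ax_log.scatter(abundance, response, s=8)
ax_log.set_xscale("log")
ax_log.set_title("log x-axis: the trend is visible")
plt.show()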
It’s not the pre-attentive destination but the visualization journey… lol the friends we made along the way
as a chemical engineer i am painfully aware of the pain of bad graphs.
SUCH an important video, and amazing (as usual). Too, that graph legit made me subscribe to your Patreon 😂
now i'm wondering if the paper actually said what the x axis stood for
I was counting down the minutes until you brought up Tufte after I saw the title.
Happy to see you again
oh my gosh I am going to have such fun thinking later, thanks!
this video triggered my ptsd of scientific method classes.
good job!
@21:28 Checks out, the numbers are not even there, so they can't be lying.
I'm glad I stuck through to learn about the most offended. Greek letter eta? Damn near killed her!
Yay! Thanks a ton!! I am in the middle of editing a poster for finals, bye bye pie graph!
May the algorithm bless this content
You rule Dr. Fatima!
Awesome music, as always!
This is how I get my intellectual workout post college.
Why did I think you were from Canada this whole time lmao. Awesome video like all the others thank you 🙏🏽
I wish I could hire the guy who made the charts and graphs for Enron. I guarantee you he could find a way to flip anything.
was time actually on the vertical axis over eta? I feel that just increases the need to define it tenfold - gotta have a very good reason to not put time on the horizontal
I think I just unlocked a new special interest.🤓
A data analyst loving the vid!
excellent as always thank you
16:48 is sadly so true even on a more mundane & smaller scale
I work in fast food & I'm sure others have observed the same:
When customers or delivery drivers come in to look for their orders, 70% of the time they miss it & come directly to us for it
Even though the customer name is right there on the bag! They see the server name first & stop there 😅
I will have to share this with my class.
What is η (eta)?
The world.... may never know.
Engagement for the engagement god!
Logarithmic scales are very popular with people trying to misrepresent data. Crypto-bros love them.
A child's drawing of an exploding rocket would have been more effective
as a graphic designer I loooooooved this video! let's be friends scientists and information designers please
Papa's got a brand new bag, arabic version... Yes, yes, data visualization and all... but... papa's got a brand new bag... in arabic... Let's goooooooo!
Oh boy, new video
The version I have seen of this "bad graph caused the disaster" story was the one with the scatterplot of only the O-ring failures: doing a linear regression, the slope was not significant, so "no significant relationship between temperature and failures" was supposedly the excuse to launch. I have since christened the specific selection bias of omitting zeroes from a dataset the "Challenger disaster bias", and it does happen regularly when people don't pay attention to selection bias in general.
I was a Ronald E McNair scholar during my undergrad years ago, and without that experience I wouldn't have gone on to do my PhD. And I often think about how if the Challenger disaster never happened, that scholarship program wouldn't have been created/funded in quite the same way, and I probably would be doing something very different today.
This is so intersting and enriching, good job
it is so interesting but my head is busy with luigi mangione so i will rewatch it later
... tbf, if Dr. Fatima told me "we shouldn't launch because it's too cold" I would not even look at the graph... I swear, she has the *exact* same tone my mom always had when she was just "disappointed" in me. PTSD triggered, lives saved 😅😉
Super interesting, thank you!
On not using log scales if the graph isn't intended exclusively for a technical audience, I remember there was a lot of Discourse about this during COVID. I'm super biased, but it felt like the two sides were Scientists(TM) who were pro- log scale for exponentially increasing data, and Science Communicators (many of whom were also scientists) arguing against log scales (making largely the same points as in this video).
I'll see if I can find who it was, but I remember someone arguing against log scales captured the essence of the debate beautifully: a log scale would be a better way of communicating the *data*, but the thing these graphs were trying to communicate was *information* of exponential increase, which most people can understand intuitively to some level. The piece that framed it in this way was compelling because it aimed to explore why there was this big divide in the Discourse, and why Scientists(TM) were getting so angry at curvy graphs.
(terrible memey joke: "real graphs have curves")
My favourite tip for analysing graphs is to let it take some time! You have to make the time to read the axes and look at the details to have any chance of spotting misleading data visualisation. At this point I simply refuse to acknowledge graphs that people show me quickly to try to convince me of anything.
From what I've read there was pressure to launch from the White House. The mission was supposed to be a crowning achievement for Reagan and the US: to put a teacher into space! If I remember correctly, the day they decided to launch was the second time they had fueled the rocket for launch, and the cost of fueling may have also played a factor. I met a guy who was a Budget Analyst on the SRB program, who provided some documents to the investigation. He wrote a book about it, "Challenger Revealed". It's because of him we know about the O-ring failure and that NASA and the Reagan admin tried to whitewash it as an accident.
Really interesting. Good video!
One of the algorithmic things is being done now.
Join a tenants' union, yes, absolutely, I am immediately angry on your behalf.
i'll be honest if you're honest, i learned the phrase 'aesthetic terrorism' from a cj the x video like last month.
Hold on. I have to re-watch this. I think I learned something, but I didn't take notes. Quality of this video was stellar.
Ok, so, I've saved every single song you put under your videos in my favourites playlist by now, they are fakking amazing. Do you maybe have a playlist for me with a bunch more from your amazing taste in music?
11:03 Graphs were made popular in a book written by William Playfair... "Will I Play Fair?" Given how easily graphs can say whatever we want, it all seems like some kind of cosmic joke.😮🌠😅
That unexplained scatter graph at the end was indeed very offensive
9:56 oh shit yeah it was way way too cold, that's a no brainer
ISTR that Tufte has also made the argument that bad powerpoint design contributed to the decision to have Columbia re-enter after the foam strike, with the most critical information listed in the smallest bullet points.
It's almost like NASA needs someone to take the findings from the engineers and present them to the managers so the engineers don't have to. Someone with people skills. A bit like the guy from Office Space.
The visual at 6:50 could at least have been sorted by ambient temperature. Then, the booster figures would get junkier as one would go from one side of the graph to the other.
Excellent video!
That preattentive list is really interesting.
Very helpful, thank you! :)
The intro song slaps.
5:29 Tufte has some very good ideas, but for me his ideas sorta go into "Good flag bad flag" type thinking at times, and aren't always productive. I think his stuff, plus the work of William S Cleveland make for a good basis to set a person up for success with good data viz, and also trusting your gut enough to ignore their guidance when it seems merited to do so.
Log scales are ok when graphing the frequency content of a sound signal I think. Humans perceive pitch more or less logarithmically so a log scale really helps with making the visualization match with what the signal actually sounds like. Same kinda goes for loudness really.
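(Sketch of what that looks like for a toy two-tone signal, using my own made-up numbers: on a log frequency axis, octaves are evenly spaced, which matches how we hear pitch, and plotting magnitude in dB does the same thing for loudness.)

import numpy as np
import matplotlib.pyplot as plt

fs = 44100                                 # sample rate in Hz
t = np.arange(fs) / fs                     # one second of samples
signal = np.sin(2*np.pi*440*t) + 0.5*np.sin(2*np.pi*880*t) + 0.01*np.random.randn(fs)

freqs = np.fft.rfftfreq(len(signal), d=1/fs)
magnitude = np.abs(np.fft.rfft(signal))

plt.semilogx(freqs[1:], 20*np.log10(magnitude[1:]))   # skip the 0 Hz bin to avoid log(0)
plt.xlabel("frequency (Hz, log scale)")
plt.ylabel("magnitude (dB)")
plt.show()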
good habibi funk
I looked this up recently: the highest point in kansas is a little privately maintained park on a biiig hill.