How Not to Fall for Bad Statistics
There's a famous quotation whose origin is uncertain, but it was popularised by Mark Twain:

Quote: There are three kinds of lies: lies, damned lies, and statistics.

Perhaps this is a bit unfair: after all, statistics can be very valuable when used and understood correctly. However, it's very easy to cherry-pick, misuse, and misinterpret them to suit a particular point or narrative. As a result, anybody who blindly believes any and all statistics put in front of them is going to be misled around 75% of the time (I don't have the exact figures, so I'll admit I completely made that one up :P ).

So, how do you avoid falling for bad statistics, like the one I just posted? The following Royal Institution video (presented by Jennifer Rogers, Associate Professor in the University of Oxford Department of Statistics) offers some advice:

[Embedded video: Royal Institution lecture by Jennifer Rogers]

The video's a whopping 42 minutes long, so I don't expect many people to watch the whole thing without skipping any parts :P . Most of it is just examples, and there are only a few basic statistical principles being discussed - so, here's a summary of some of those principles (I've also put together a quick, made-up Python sketch of each one after the list):

  • It's important to keep in mind the difference between absolute risks and relative risks. I'll illustrate with an example: suppose there's a hair-eating bug which affects 3 in 1,000 hat-wearers, and 2 in 1,000 non-hat-wearers. That's an increase of 50% for hat-wearers (the relative risk); however, it's also an increase of only 1 in 1,000 (the absolute risk). A headline of "Wearing hats increases risk of hair-eating bugs by 50%" would sound much scarier than "Wearing hats increases risk of hair-eating bugs by 1 in 1,000" - and yet, both would technically be correct. (The first sketch after the list works through this arithmetic.)

  • Correlation does not always imply causation. This point is probably worth a thread in and of itself, but it's well worth noting here. To go back to my hat example: we know that hat-wearers are more likely to be suffering from hair-eating bugs than non-hat-wearers are - but this doesn't mean that the hats are causing the bugs. Quite the opposite could be true: if hats are believed to ward off hair-eating bugs, then areas with a high prevalence of hair-eating bugs will also have a high prevalence of hat-wearing. (To drive the point home: you'll find way more sick people in hospital than out in the supermarket, but this doesn't mean the hospital is making people sick. Those people are at the hospital because they're sick: not the other way around!) The second sketch after the list simulates exactly this kind of confounding.

  • Random fluctuations often produce extreme results that don't last. For example, there are typically several hundred air crash deaths every year: the exact number varies a lot, but in an 'average' year it's pretty close to 500. However, 2017 turned out to be an extremely good year: the official statistics counted only 44 deaths. Sadly, this didn't last, and in 2018 the number increased again, to 556. This was reported in the media as "Sharp increase in air crash deaths in 2018" (and we had a topic about it here :P ), but in reality, it wasn't a cause for alarm or concern: it was simply a case of the figures reverting to the mean. (The third sketch after the list simulates this effect.)

  • Beware of self-selection bias. This is common in opt-in surveys - where, for example, a "Were you satisfied with this product?" survey will be filled out mostly by people who thought the product was very good or very poor (people who merely thought it was OK probably won't bother filling it out). Bonus points if your survey only offers a range of options from "Excellent" to "OK"... in which case, the people who hated the product probably won't bother with the survey either! (At that point, it goes beyond self-selection: the survey itself is being used to manipulate the selection, so that only the satisfied customers bother to respond to it!!!) The fourth sketch after the list shows how badly an opt-in survey can distort the picture.

  • Beware of small sample sizes. To go back to the hat example... what was the sample size in the study? Perhaps they studied 100,000 hat-wearers and 100,000 non-hat-wearers - and found 300 hat-wearers and 200 non-hat-wearers with hair-eating bugs. However, what if they only studied 1,000 of each - and found 3 hat-wearers and 2 non-hat-wearers with the bugs? It's impossible to draw any meaningful conclusion from a sample containing just 5 people with the bugs, because the sample size is so small. All random samples are subject to a certain amount of random variation ('noise') - and the smaller your sample size, the more likely it is that the noise will drown out the signal. (Or, put another way: those figures of '3 in 1,000' and '2 in 1,000' should have a confidence interval attached to them, to indicate the plausible range of random variation. The larger the sample size, the smaller that confidence interval will be - the last sketch after this list shows just how much difference it makes.)
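
First, the relative-vs-absolute risk point. Here's a minimal Python sketch of the arithmetic, using the made-up hat-bug figures from above:

Code:
# Relative vs. absolute risk for the (made-up) hair-eating bug example
hat_risk = 3 / 1000      # risk among hat-wearers
no_hat_risk = 2 / 1000   # risk among non-hat-wearers

relative_increase = (hat_risk - no_hat_risk) / no_hat_risk  # 0.5, i.e. "50% higher risk!"
absolute_increase = hat_risk - no_hat_risk                  # 0.001, i.e. "1 extra case per 1,000"

print(f"Relative increase: {relative_increase:.0%}")
print(f"Absolute increase: {absolute_increase * 1000:.0f} in 1,000")

Same numbers, two very different-sounding headlines.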
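
Second, correlation without causation. In this completely invented scenario, living in a 'buggy area' raises both the chance of wearing a hat and the chance of catching the bug - the hats themselves do nothing, yet they still end up correlated with the bugs:

Code:
import random

random.seed(42)

# Confounder sketch: buggy areas cause BOTH hat-wearing and infestations
# (all rates invented for illustration)
people = []
for _ in range(100_000):
    buggy_area = random.random() < 0.2          # 20% live in buggy areas
    bug_rate = 0.01 if buggy_area else 0.001    # bugs are 10x more common there
    hat_rate = 0.6 if buggy_area else 0.1       # ...and so is hat-wearing
    people.append((random.random() < hat_rate, random.random() < bug_rate))

hat = [bug for wears_hat, bug in people if wears_hat]
no_hat = [bug for wears_hat, bug in people if not wears_hat]
print(f"Bug rate among hat-wearers:     {sum(hat) / len(hat):.4f}")
print(f"Bug rate among non-hat-wearers: {sum(no_hat) / len(no_hat):.4f}")
# Hat-wearers show a clearly higher bug rate, even though hats have
# zero causal effect in this model.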
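
Third, regression to the mean. This sketch just draws yearly 'death counts' from an invented noise model centred on 500 (it's not real aviation data), and then looks at what happens the year after an unusually good one:

Code:
import random

random.seed(0)

# Invented noise model: yearly counts fluctuating around a mean of 500
years = [max(0, round(random.gauss(500, 150))) for _ in range(10_000)]

# The year AFTER an unusually good (low) year:
followers = [years[i + 1] for i in range(len(years) - 1) if years[i] < 200]
print(f"Average year overall:           {sum(years) / len(years):.0f}")
print(f"Average year after a <200 year: {sum(followers) / len(followers):.0f}")
# The 'sharp increase' after an extreme year is just the noise settling
# back towards the mean.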
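
Fourth, self-selection. Here's an invented opt-in survey where everyone has a true satisfaction score from 1 to 5, but only the delighted and the furious usually bother to respond (the response rates are made up):

Code:
from collections import Counter
import random

random.seed(1)

# True satisfaction is uniform on 1-5; response rates are invented
true_scores = [random.randint(1, 5) for _ in range(100_000)]
response_rate = {1: 0.8, 2: 0.05, 3: 0.02, 4: 0.05, 5: 0.8}
responses = [s for s in true_scores if random.random() < response_rate[s]]

print("True score counts:   ", sorted(Counter(true_scores).items()))
print("Survey score counts: ", sorted(Counter(responses).items()))
# The middle of the scale all but vanishes: the survey 'sees' a polarised
# customer base that doesn't actually exist.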
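
Finally, confidence intervals and sample size. This sketch uses the crude normal-approximation interval (not ideal for rare events, but good enough to show how the interval shrinks as the sample grows):

Code:
import math

def approx_ci(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), p + half

# Same 3-in-1,000 rate, from a small study and a big one
for successes, n in [(3, 1_000), (300, 100_000)]:
    lo, hi = approx_ci(successes, n)
    print(f"{successes}/{n}: rate {successes / n:.4f}, 95% CI ({lo:.4f}, {hi:.4f})")
# With n=1,000 the interval runs from ~0 to ~0.0064 - easily wide enough to
# contain the non-hat-wearers' 0.002, so the '50% increase' could be pure noise.
# With n=100,000 it tightens to roughly (0.0027, 0.0033).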

Still, I think that even in 42 minutes, this video merely scratched the surface: there's bound to be way more to say about those topics :P . So, if you've got anything to add, go ahead!