
Some time ago I shared some of my thoughts on science communication, including the importance of learning “to approach issues critically, to question and to reason, [so they would] have the wherewithal to challenge fixed beliefs and undermine authority”.  I’ve also written about things like cognitive dissonance and how our social environment can shape the way we conduct research or interpret the results.  In this post, I’d like to highlight a few of the flaws you might come across in scientific research and what you should look out for when you hear about a new study.  I’ve picked out just three things you can check to help reassure yourself that a science story is on sound footing, but I hope people will chime in with more suggestions in the comments!

Double-blind: what it is and why it matters

Double-blind studies are important because humans can be biased, often without knowing it.  Scientists aren’t immune to this kind of thing.  A wonderful paper published last year showed an unconscious bias (held by both men and women) against women when evaluating candidates for a lab manager position.  This isn’t a question of honesty or integrity; it’s about things like confirmation bias and implicit associations.

Unconscious biases can shape our behaviour in lots of ways, and sometimes that can interfere with the outcome of an experiment.  Clever Hans, the horse who could count, is a famous example.  Hans amazed crowds in early 20th century Berlin with his ability to count, spell, remember names and perform other feats.  Hans couldn’t actually do any of these things, of course, but it took some effort to demonstrate that.  There wasn’t any trickery involved.  Hans’ keeper hadn’t trained him to respond to any special cues; anybody could ask a question and Hans would tap out an answer with his hoof.  A professor eventually showed that Hans could only answer a question if the person asking it knew the answer.  It turned out that Hans was picking up on the unconscious responses of the person asking the question (say, a slight shift in gaze when he’d tapped out the right number) and using those as cues.  If the questioner didn’t know the answer, they couldn’t accidentally give it away and Hans had no cues to guide his behaviour.

A “double-blind” study is one that takes just such an approach to eliminate (or minimize) the risk of unconscious biases affecting the results.  If you want to find out whether someone can tell the difference between Coke and Coke Zero, it’s obvious that the taster shouldn’t know which glass is which.  A double-blind study goes further by ensuring that the person handing over the glasses also doesn’t know what’s in them.  That way, there’s no risk that the server will accidentally give away which glass has Coke Zero through an unconscious difference in behaviour (say, how they handle the glasses or how intently they watch the taster drinking).  To make this work, you need a third person who randomly labels the glasses A and B and is the only person who knows which is which.  The actual experiment and data gathering should be done completely independently of this knowledge, and the results should be combined with the key only afterwards.  In some cases, people even choose to analyse the data without knowing which is which and reveal the real labels only after everything is done.
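For anyone who finds a procedure easier to follow as code, here is a minimal sketch of that labelling step in Python.  It is only an illustration: the function names `make_blinded_trials` and `unblind` are my own inventions, and I’m assuming a taster who names one glass per trial as the Coke Zero.

```python
import random

def make_blinded_trials(n_trials, seed=None):
    """Third person's job: randomly assign the drinks to labels A and B.

    Returns the sealed key, one dict per trial. Only the third person
    sees this until the tasting is finished.
    """
    rng = random.Random(seed)
    sealed_key = []
    for _ in range(n_trials):
        drinks = ["Coke", "Coke Zero"]
        rng.shuffle(drinks)  # random order, so the labels carry no information
        sealed_key.append({"A": drinks[0], "B": drinks[1]})
    return sealed_key

def unblind(sealed_key, guesses):
    """Combine the key with the recorded guesses only after data gathering.

    guesses[i] is the label ("A" or "B") the taster named as Coke Zero.
    Returns the number of correct guesses.
    """
    return sum(key[guess] == "Coke Zero" for key, guess in zip(sealed_key, guesses))

key = make_blinded_trials(10, seed=42)
recorded_guesses = ["A"] * 10  # what the server wrote down, labels only
print(unblind(key, recorded_guesses), "correct out of", len(key))
```

The point of the design is that the server and the taster only ever handle the labels; the dictionary mapping labels to drinks stays with the third person until the guesses are locked in.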

Sample size and statistical significance

It’s always important to make sure that a study is based on enough data to support the results.  What does that actually mean?  Basically, it means that there should be enough observations to convince us that the outcome isn’t just due to dumb luck.  If I flip a coin once and it comes up heads, that’s hardly going to convince you that the coin is weighted and always lands heads-up.  You probably still wouldn’t be convinced if I pulled it off three times in a row or maybe even five times.  But ten consecutive flips, all heads? Twenty?  That can’t just be dumb luck, right?  Something must be going on.
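To put rough numbers on that intuition, here is a small Python sketch that works out how likely a run of heads is if the coin is perfectly fair.  The helper name `prob_all_heads` is just something I made up for the illustration.

```python
from fractions import Fraction

def prob_all_heads(n_flips):
    """Chance that a fair coin lands heads on every one of n_flips tosses."""
    return Fraction(1, 2) ** n_flips

for n in (1, 3, 5, 10, 20):
    p = prob_all_heads(n)
    print(f"{n:>2} heads in a row: 1 in {p.denominator:,}")
```

Five heads in a row happens by luck about once in 32 tries, which is unremarkable.  Ten in a row is a 1 in 1,024 event, and twenty in a row is about one in a million.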

That’s really all there is to it.  Say you want to test the effect of a new fertilizer on plant growth.  Individual plants are always going to grow differently, so you couldn’t just use two plants, one with fertilizer and one without.  Any differences you find might have nothing to do with the fertilizer; they might be because of the way the plants were positioned, or inherent differences between them, or anything else.  You need to use more plants (a larger “sample size”) to know whether any effects are due to the fertilizer.  How many plants you need depends on how strong the effect of the fertilizer is: the stronger the effect, the fewer plants it will take to be convinced.
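A quick simulation makes the same point.  The sketch below imagines a fertilizer that does nothing at all and asks how big a gap between the two groups you would see anyway, purely by chance.  All the numbers are made up for illustration: I’m assuming plant heights that average 30 cm with a spread of 5 cm, and `chance_gap_spread` is a hypothetical helper, not a standard statistical routine.

```python
import random
import statistics

def chance_gap_spread(plants_per_group, trials=2000, seed=0):
    """Typical gap between group means when the fertilizer has no effect.

    Both groups are drawn from the same distribution (assumed: mean 30 cm,
    standard deviation 5 cm), so any gap between them is pure chance.
    Returns the standard deviation of that gap across many repeat experiments.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        treated = [rng.gauss(30.0, 5.0) for _ in range(plants_per_group)]
        control = [rng.gauss(30.0, 5.0) for _ in range(plants_per_group)]
        gaps.append(statistics.fmean(treated) - statistics.fmean(control))
    return statistics.pstdev(gaps)

for n in (1, 10, 100):
    print(f"{n:>3} plants per group: typical chance gap of about {chance_gap_spread(n):.1f} cm")
```

With one plant per group, gaps of several centimetres turn up from luck alone, so a real 2 cm effect would be invisible.  The typical chance gap shrinks roughly with the square root of the number of plants, which is why a weak effect needs a large sample and a strong one does not.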

Scientific research should always be done with a large enough sample to make it unlikely that the results are due to chance alone, and proper statistics should be used to confirm the findings.  In science, such results are called “significant”, which is different from the word’s normal meaning (something like “a great deal”); significant results are just results that are unlikely to arise by chance alone.  Look out for phrases like “statistically significant”, “significantly more” or “significantly fewer” when reading about research.  If you don’t see them, be cautious, especially in the face of extraordinary claims.  An example of recent research that epically failed to do this is the dreadful paper from last year falsely claiming there was a link between GMO food and tumour incidence.  (Incidentally, the observations in that paper weren’t double-blind either, which raised another warning flag.)

Correlation or causation?

It’s been said a thousand times, but correlation isn’t the same thing as causation.  The fact that two things happened to the same group of people or at the same time (or one after another) doesn’t mean that one causes the other.  It’s not just that it might be a coincidence — there are statistical tests to check that.  Even if two things always happen together and only happen together, there’s no guarantee that one caused the other.  There might be another factor underlying them that causes both to happen.  Nearly everyone who regularly drank water has died and nearly everyone who is dead regularly drank water.  Of course, it would be absurd to suggest that drinking water causes death.  Rather, the fact that we’re alive means that we have to drink water regularly and that we will eventually die.  This example is intentionally absurd, but the same logic applies when a study shows that eating X or doing Y is linked with higher rates of cancer or heart disease.  The two observations might co-occur, but that doesn’t necessarily mean that one causes the other.
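Here is a toy simulation of that “other factor” in Python.  Everything in it is hypothetical: two measurements, x and y, never influence each other, but both are nudged by the same hidden factor.  The function names `pearson` and `simulate` are my own, and the correlation is computed by hand so the sketch doesn’t depend on any particular library.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n_people, hidden_varies, seed=0):
    """Correlation between x and y when neither one causes the other."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_people):
        # The hidden factor feeds into both measurements.
        hidden = rng.gauss(0, 1) if hidden_varies else 0.0
        xs.append(hidden + rng.gauss(0, 1))  # x never looks at y
        ys.append(hidden + rng.gauss(0, 1))  # y never looks at x
    return pearson(xs, ys)

print("hidden factor varies:    ", round(simulate(5000, hidden_varies=True), 2))
print("hidden factor held fixed:", round(simulate(5000, hidden_varies=False), 2))
```

When the hidden factor varies from person to person, x and y show a clear correlation (close to 0.5 with these made-up settings) even though neither affects the other.  Hold the hidden factor constant and the correlation drops to roughly zero.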

It can be hard to show that one thing causes another, particularly since causality can be quite a thorny philosophical issue.  The ideal way would be to rewind the Universe and redo the observation with just one thing changed to see if the outcome is different.  Of course, that’s impossible to actually do, but scientists try to get as close as possible.  The idea is to repeat an observation many times and try to hold as many things constant as possible, so the only factor that changes is the one you think is causing an effect.  Since it’s impossible to make sure everything in the Universe is exactly the same, it’s always possible that some hidden cause will get missed, but these sorts of experiments are still more convincing than a simple correlation would be.

Keep an eye out while you’re reading and see whether there are convincing experiments showing that X caused Y or if there’s just a correlation between the two.  Correlations are important — they can be useful markers of risk, for example, or even be the first step in getting researchers to do experiments and show a causal link.  By itself, though, correlation simply isn’t good evidence of causation, so don’t let anyone get away with pretending it is.

Isn’t that what science writers should be doing?

Yes, science writers have a responsibility to evaluate research and present it to the public with an explanation of its shortcomings.  It’s far better, though, to have a critical and thoughtful public that can evaluate the information they get, whether from science writers, pundits, media figures, or politicians.  Ultimately, you are the only gatekeeper of your mind and the person responsible for what you do or don’t believe.  Science writers should do their job, but all of us should learn to read and listen actively and critically, especially in the face of extraordinary claims.

The main thing is to read and listen critically and to always ask questions and challenge ideas.  There are some subtleties I’ve glossed over and lots of other things to look out for, but I’ll stop there.  What about you? How do you decide whether or not to believe what you’re reading?