There’s been a great furor recently about a study which purports to show that rats fed GM corn develop more tumors than rats fed regular corn. I’m actually a bit late to this party; scientists and science writers across the web have already picked apart the flaws in this study, from shoddy statistics to poor design, and Carl Zimmer has called the whole thing “a rancid, corrupt way to report about science”. I don’t have much to add to the chorus; what I’d like to do with this post is make clear to the layperson what we mean by “bad statistics” and why they make the study unconvincing.
In case you haven’t heard about this study, I’ll provide a quick summary. The aim was to assess the long-term impact of genetically modified (GM) food; to do this, the researchers fed rats diets with or without GM corn for their entire lives (2 years) rather than the shorter period (90 days) used in previous studies. The GM corn used was engineered by Monsanto to be resistant to the herbicide Roundup, so the researchers also tested the effect of a diet of GM corn which had been sprayed with Roundup. The rats being fed GM corn were further divided into three groups: one with a low dose of GM corn (11% of the corn in their food), one with a medium dose (22%) and one with a high dose (33%). The researchers monitored the health of the rats in various ways, including blood and urine tests and regular checks for tumors. Their main finding was that the GM-fed rats died earlier and had more tumors (and other health problems) than the rats fed regular corn (the “control” rats).
I was intrigued when I heard about this study, and I read the paper without any particular hostility to the idea. While reading, though, I was quite surprised to realize that the team had used only 20 rats (10 of each sex) in each group. I’m not a statistician, but based on my experience as an evolutionary biologist, that struck me as too low a number to draw any strong conclusions, especially when there were so many different treatment groups. It just seemed like the study had been too ambitious. I was also concerned because the study wasn’t double-blind, a practice I was trained to consider de rigueur in such studies. These doubts were enough to prompt a search to see what others had said. My misgivings were confirmed when I came across a post by Andrew Kniss at Control Freaks, where he elegantly demonstrates the problems with using so few rats.
I’m not going to go into the argument in depth but rather try to make it accessible to the lay reader. Basically, when you do an experiment you’re trying to test an idea you have about the world: a hypothesis. You design an experiment and then gather some observations. The crucial issue is whether your observations (the results) actually support the hypothesis or whether they could just be due to dumb luck. For example, I might claim that I could predict whether a coin would come down heads or tails. If I tried to convince you by flipping a coin once and guessing correctly, you’d be underwhelmed by my argument (and rightly so). On the other hand, if I could guess correctly 5, 10 or 15 times in a row, you might start to wonder. That intuition is the crux of statistics. When we say that results are “statistically significant”, we mean that they would be unlikely to look that way by chance alone. Statistics is how we can formally, accurately and quantitatively answer the question of whether they could.
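To put rough numbers on the coin-guessing intuition, here is a minimal Python sketch. It is only an illustration and assumes a fair coin and independent guesses; the function name is made up for this example.

```python
from fractions import Fraction


def chance_of_lucky_streak(n_flips: int) -> Fraction:
    """Probability of guessing n fair coin flips in a row purely by luck.

    Assumes a fair coin and independent guesses (an idealization).
    """
    return Fraction(1, 2) ** n_flips


for n in (1, 5, 10, 15):
    p = chance_of_lucky_streak(n)
    print(f"{n:>2} correct guesses in a row: 1 in {p.denominator} ({float(p):.4%})")
```

A single lucky guess happens half the time, but five in a row happens by luck only 1 time in 32, ten in a row 1 time in 1,024, and fifteen in a row 1 time in 32,768. The longer the streak, the harder it is to explain away as chance.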
Of course, science experiments are usually much more complicated than flipping a coin, so the statistical analysis is more complicated. The basic idea, however, remains the same. Just like in the example of flipping a coin, it’s important to repeat tests and do them on large groups so we can be sure that the results aren’t just due to luck. That’s what this study failed to do. In his post, Andrew Kniss claims that it’s possible to randomly pick several groups of 10 rats and get results similar to the study even without treating the rats differently. In fact, he doesn’t just claim so — he provides computer code to demonstrate his point. Since the argument might be a bit technical for some folks to follow, I decided to use his code and present the results graphically to help make the point clear. The article presents lots of health-related data, but for simplicity’s sake (and because the data aren’t entirely clear) I’m going to focus on the part that seems to have gotten the most attention: the tumors in female rats. The other data suffer from the same problem; the fact that the rats were sick in lots of ways doesn’t make much difference since those aren’t independent observations — rats that develop one problem might be likely to develop others, too.
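To see why groups of 10 are so noisy, it helps to look at how much the number of tumors in a single group can vary by luck alone. The sketch below is in Python and is not Kniss's code. It assumes, purely for illustration, that every rat has the same 70% chance of developing a tumor regardless of diet. That figure is a made-up placeholder, not a value taken from the paper.

```python
from math import comb


def tumor_count_distribution(group_size: int, p_tumor: float) -> list[float]:
    """Binomial probabilities of seeing 0..group_size rats with tumors.

    Each rat is assumed to develop a tumor independently with the
    same probability p_tumor.
    """
    return [
        comb(group_size, k) * p_tumor**k * (1 - p_tumor) ** (group_size - k)
        for k in range(group_size + 1)
    ]


GROUP_SIZE = 10  # female rats per group, as in the study
P_TUMOR = 0.7    # hypothetical tumor probability, chosen only for illustration

dist = tumor_count_distribution(GROUP_SIZE, P_TUMOR)
for k, prob in enumerate(dist):
    # One '#' per two percentage points, as a crude text histogram.
    print(f"{k:>2} rats with tumors: {prob:6.1%} {'#' * round(prob * 50)}")
```

Under these assumptions the most likely outcome is 7 rats with tumors, but neighboring counts are also quite probable, so two groups treated identically can easily differ by several tumors.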
Basically, I told the computer to make a population of rats, each of which has a certain chance of getting a tumor. Then I simulated the experiment by randomly creating seven groups (no GM corn, three doses of GM corn, three doses of GM corn plus Roundup) from this population, each with ten rats. The crucial thing is that, since this is a computer simulation, we know that there is absolutely no difference between the seven groups. I’ve labeled them the same way as in the experiment to make things look similar, but in fact the seven groups in the simulation are identical in every way, so any patterns cannot be due to the effect of GMO food (or anything else). I repeated this simulated experiment 10 times and made graphs from the resulting data. Here’s how the data from the real experiment (in green) compare with the simulation results (in blue):
The point isn’t that any of these graphs are exactly the same as the data from the experiment. Rather, the point is to show that those results aren’t particularly exceptional. In four of the simulations (#1, #2, #3, #5), the control rats have fewer tumors than most of the GMO-fed rats. Again, this is what we would observe even if there were absolutely no difference between the “treatments”. In general, there’s enough variability that we can’t rule out the possibility that the actual results are simply due to chance; in other words, we can’t say that they’re significant. The experiment uses too few rats to draw any conclusions. In fact, I was struck by the fact that the paper doesn’t seem to make any claims that the results are significant, which is something that would normally be done in this kind of study. Perhaps the authors thought the results obviously supported their hypothesis. If so, they were wrong.
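For readers who want to try this themselves, the simulated experiment described above can be sketched in a few lines of Python. This is not the R code linked at the end of this post, nor Kniss's code. The 70% tumor probability, the group labels and the function names are all assumptions made for illustration.

```python
import random

# Seven groups labeled as in the post; in the simulation they are identical.
GROUP_LABELS = [
    "control",
    "GM 11%", "GM 22%", "GM 33%",
    "GM 11% + R", "GM 22% + R", "GM 33% + R",
]
RATS_PER_GROUP = 10
P_TUMOR = 0.7  # hypothetical tumor probability, the same for every rat


def simulate_experiment(rng: random.Random) -> dict[str, int]:
    """Count tumors per group when no group is treated differently."""
    return {
        label: sum(rng.random() < P_TUMOR for _ in range(RATS_PER_GROUP))
        for label in GROUP_LABELS
    }


def control_looks_healthier(counts: dict[str, int]) -> bool:
    """True if the control has fewer tumors than more than half of the other groups."""
    control = counts["control"]
    others = [n for label, n in counts.items() if label != "control"]
    return sum(control < n for n in others) > len(others) / 2


rng = random.Random(2012)  # fixed seed so repeated runs give the same numbers
n_runs = 10_000
hits = sum(control_looks_healthier(simulate_experiment(rng)) for _ in range(n_runs))
print(f"Control looked healthier than most groups in {hits / n_runs:.1%} of runs")
```

Every run in which the control group "looks healthier" arises with no real difference between the groups, which is exactly the sort of pattern a ten-rat-per-group experiment cannot distinguish from a genuine effect.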
Does this mean that eating GM food won’t cause tumors? No, that’s absolutely not what it means. All it means is that this experiment wasn’t good enough to tell us anything about the relationship between GM food and tumors. In my opinion, GM foods should be no less safe than other foods; however, I do have a distrust of large corporations like Monsanto and I think regulatory capture is a real and very troubling problem. If there really haven’t been any studies on the long-term effects of GM food, I think they should be conducted. Unfortunately, though, this paper hasn’t added very much to the debate. As sensational as it might seem, it hasn’t actually advanced our scientific knowledge of the effect of GMOs. It has, however, provided a powerful rallying point for partisans on both sides of the argument.
Andrew Kniss on Control Freaks: Why I think the Seralini GM feeding trial is bogus
Séralini GE, Clair E, Mesnage R, Gress S, Defarge N, Malatesta M, Hennequin D, & de Vendômois JS (2012). Long term toxicity of a Roundup herbicide and a Roundup-tolerant genetically modified maize. Food and Chemical Toxicology. DOI: 10.1016/j.fct.2012.08.005
The R code I used and the output.