A recent discussion on Biofortified raised the question of how well small-scale research plots represent real-world results. For reasons of experimental control, practicality, and economy, the majority of agricultural research is carried out at smaller scales, i.e., in growth chambers, greenhouses, and small field plots. Almost uniformly, the results of such studies are extrapolated to larger “field” level scales for reporting purposes. While this translation may seem like a straightforward conversion, it can have considerable effects on the interpretation and inference made from the research. Specifically, it is important to understand how error rates at the small scale carry over and affect the larger-scale results.
In this post, I will use a journal article cited in the discussion above as a working example (Elmore, et al. Glyphosate-Resistant Soybean Cultivar Yields Compared with Sister Lines, Agron. J. 93:408–412, 2001; Accessed from: http://digitalcommons.unl.edu/agronomyfacpub/29/). This article examined the effects of genetically engineered herbicide resistance on production characteristics of several soybean varieties. While multiple varieties and herbicides were considered in the research, those of interest here are the lines genetically altered for glyphosate resistance (GR) and their corresponding “sister” lines, which were genetically similar except that they had no glyphosate resistance (Non-GR).
Table 5 in the article presents average comparisons for these two varietal groups. The yield of Non-GR lines is given as 3.68 megagrams (Mg) per hectare while that of GR lines is 3.48 Mg per hectare. Statistical significance between these two groups is not explicitly shown due to an apparent typographic error in the table (no letter designation is given for the GR group). A standard error (SE) of 0.08 is reported; however, it is not clear whether this represents the error of the means themselves or the error of the mean comparison, 3.68 – 3.48 = 0.20 Mg per hectare. In order to proceed with this discussion I will assume that the standard error is for the contrast itself. This appears to be consistent with the presentation of other tables in the paper and is the “best case” scenario for the researchers with respect to the variability of the data. From here, we can see that an approximate 95% confidence interval on the difference in means (roughly the difference ± 2*SE, or 0.04 to 0.36 Mg per hectare) would almost, but not quite, cover zero. This implies some significance, although marginal. Still, the difference of 0.20 Mg or 200 kg per hectare would not be negligible to a producer, especially when compounded over several hectares. Perhaps this is a case where statistical tests belie a real practical difference, as recently covered on Biofortified by David Tribe in “GMO statistics Part 10: the King of Hearts is NOT equivalent to the King of England”.
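The interval arithmetic above can be reproduced in a few lines. This is only a sketch under the same assumption made in the text, namely that the reported SE of 0.08 applies to the difference between groups; the multiplier of 2 is the rough 95% rule of thumb, not an exact t-value, and the variable names are mine.

```python
# Values taken from Table 5 of Elmore et al. (2001), in Mg per hectare.
non_gr_yield = 3.68
gr_yield = 3.48

# ASSUMPTION (as in the text): the reported SE is the SE of the contrast.
se_contrast = 0.08

diff = non_gr_yield - gr_yield
half_width = 2 * se_contrast          # rough 95% multiplier used in the text
lower = diff - half_width
upper = diff + half_width

print(f"difference      = {diff:.2f} Mg/ha")
print(f"approx. 95% CI  = ({lower:.2f}, {upper:.2f}) Mg/ha")
print(f"excludes zero   = {lower > 0}")
```

If the SE instead described the individual means, the SE of the difference would be larger and the interval wider, which is why the contrast interpretation is the best case for the researchers.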
At this point, it is useful to consider how the experiments were carried out, what was actually measured, and how the data were collected. In the methods section we find that the soybeans were grown in typical field plots measuring 4 rows wide and 9.1 m (30 feet) long. The rows were on a standard soybean production spacing of 0.76 m (30 inches). In order to provide a buffer from adjacent plots, only the center two rows were harvested. It is not mentioned whether a buffer or border zone was used between adjacent plots within a row; however, I will assume a 5-foot border here, as this is similar to standard practice in such studies and does not greatly affect the demonstration given here. This leaves us with an effective plot size of 60 inches x 25 feet, or 11.613 m². Through the magic of metric conversion, it turns out that the yield numbers reported in Table 5 can also be read in units of 100 g per square meter (3.68 Mg per hectare is 368 g per square meter). This, of course, leads us to plot-level yields. For the example above, the difference in varietal groups, 20 g per square meter, is equivalent to 232.3 g per plot. Conveniently, the authors have also supplied us with seed size information in Table 5. Thus, assuming an average seed size of 0.144 g per seed, the difference observed was approximately 1613 seeds.
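For readers who want to check the conversion chain, here is a minimal sketch. The plot dimensions rest on the 5-foot border assumed above, which the paper does not state, and the function and constant names are my own.

```python
M2_PER_HECTARE = 10_000
G_PER_MG = 1_000_000       # 1 megagram = 1,000,000 g
M_PER_INCH = 0.0254
M_PER_FOOT = 0.3048

# ASSUMPTION: two harvested rows (60 inches) by 25 feet of harvested length.
plot_area_m2 = (60 * M_PER_INCH) * (25 * M_PER_FOOT)

# Average seed mass taken from Table 5, in grams per seed.
seed_mass_g = 0.144


def grams_per_plot(mg_per_ha):
    """Convert a per-hectare quantity (Mg/ha) to grams per harvested plot."""
    g_per_m2 = mg_per_ha * G_PER_MG / M2_PER_HECTARE
    return g_per_m2 * plot_area_m2


def seeds_per_plot(mg_per_ha):
    """Convert a per-hectare quantity (Mg/ha) to a seed count per plot."""
    return grams_per_plot(mg_per_ha) / seed_mass_g


print(f"plot area        = {plot_area_m2:.3f} m^2")
print(f"0.20 Mg/ha       = {grams_per_plot(0.20):.1f} g per plot")
print(f"                 = {seeds_per_plot(0.20):.0f} seeds per plot")
```

The same two functions convert any other per-hectare quantity, such as a standard error, to plot-level grams or seeds.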
This still seemed fairly substantial, but was difficult for me to visualize. To satisfy my curiosity, I took a trip to the local co-op and picked up some soybeans. From these I determined that 1600 seeds is equivalent to approximately 420 ml (~1.7 cups) in volume, or about what you could hold in two hands. Using similar computations, the 2*SE used in the confidence interval above translates to about 1300 seeds or 340 ml (~1.4 cups). A cup and a half of seed per plot is translating to a potential difference of metric tons between varietal groups at the production level. How is this happening? First consider the process of plot harvesting. The paper states that a small plot harvester was used for this purpose. For those not familiar with these machines, they are usually scaled-down, car-sized versions of full-size combines. They are complete replicas of the larger machines, with a sickle bar cutter and reel to collect plant material, which is then passed through the machine where the debris and chaff are separated from the seed. An operator sits on top controlling the direction, speed, cutter height, etc. Often, a second person will ride or walk alongside the harvester, catching the seed from each plot in a bucket or grocery sack. Harvesting is a dirty, dusty business subject to human failures. It is not hard to imagine the loss of a cup or two of seed over 25+ feet of plot during this process. Seeds can be dropped, missed, shattered to the ground by the cutter/reel, or simply blown out the back with the chaff if the settings on the sieves are incorrect. Care must also be taken to pause the combine between plots in the border zone (typically mowed down prior to harvest) in order to allow the combine to finish threshing and processing the plot material. Matters can be further exacerbated if the seed from each plot is run through a seed cleaner prior to weighing. Of course, on top of all this, there is variation due to spatial location and arrangement, micro-climates, etc.
These are all sources of variation that skilled researchers strive to minimize. The problem with scaling small plot results to full-scale production levels is that the errors encountered in plot harvesting either do not occur in full-scale scenarios or, when they do occur, do not scale up proportionally. The proportion of seed missed relative to the total amount taken in, for example, can be much higher for a small machine than for a full-size machine. Small-scale spatial variation is much more influential on small plot measurements than on those taken across a wide area. A common way to measure these differences is the coefficient of variation (CV), which is the ratio of the standard deviation of the response to its mean. In full-scale production, this ratio is typically much smaller than the corresponding values from small-scale research. In other words, small variations can have a large influence in research data, but similarly scaled errors are not likely to occur at the field level.
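To make the scaling point concrete, the sketch below compares one loss of fixed size against a plot harvest and against a one-hectare harvest. The fixed-size loss is purely a hypothetical for illustration (real harvest losses are neither fixed nor known from the paper), and the plot area is the 11.613 m² assumed earlier in this post.

```python
# Non-GR mean yield from Table 5: 3.68 Mg/ha, i.e. 368 g per square meter.
yield_g_per_m2 = 368.0

# ASSUMPTION: effective harvested plot area used in this post.
plot_area_m2 = 11.613

# HYPOTHETICAL: a single loss equal in size to the observed plot-level
# difference between groups (about a cup and a half of seed).
lost_seed_g = 232.3

plot_harvest_g = yield_g_per_m2 * plot_area_m2
hectare_harvest_g = yield_g_per_m2 * 10_000

plot_share = lost_seed_g / plot_harvest_g
hectare_share = lost_seed_g / hectare_harvest_g

print(f"share of a plot harvest    = {plot_share:.2%}")
print(f"share of a hectare harvest = {hectare_share:.4%}")
```

Under these assumptions the same handful of seed is roughly 5% of what a plot produces but a vanishing fraction of a hectare's production, which is the sense in which plot-level errors do not scale up proportionally.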
To be clear, I in no way mean to be critical of these particular researchers. By all accounts they have carried out a set of designed studies to the best of their abilities. I picked this article because of its relevance, convenience, and reported information. It should be generally noted, however, that research methods have limits in resolution, and these limitations can translate into apparently large real-world differences. The interpretation of such differences should be approached with caution.
So, given these difficulties, of what use are small-scale studies? How should we interpret their findings? While I hope I have shown that we should use caution and common sense when extrapolating to field-scale levels, small-scale studies have much more value than that. Interpretation of results within a study, whether re-scaled or not, is always important. Ranking and comparison of treatments, varieties, or other experimental effects are usually unaffected by these problems (see for example http://www.colostate.edu/depts/prc/pubs/ComparisonOfLargePlot_KL.pdf). Small-scale trials also allow researchers to control for outside influences, such as environmental conditions, in order to more accurately measure experimental treatment effects. They are indispensable for “proof of concept” experiments where the objective is to isolate a given process or test a specified hypothesis. As part of the scientific method, small-scale studies play an important role in helping researchers refine and define their research problems. Often these results are then expanded to larger-scale trials, thereby encompassing a wider array of potential variation and allowing better assessment of their viability in the real world.
Too often, however, the initial small-scale results are picked up, extrapolated, and used by interested parties without consideration of potential problems. Consider the conclusion drawn here by the authors: “Glyphosate resistant sister lines yielded 5% (200 kg ha⁻¹) less than the non-GR sisters (GR effect).” This conclusion has also been cited as definitive evidence of yield drag in the widely distributed report “Failure to Yield” by Doug Gurian-Sherman (Union of Concerned Scientists). Yet we have seen that this difference could have been as low as 40 kg per hectare, the lower end of the approximate confidence interval. Reporting or interpreting the field-level extrapolations without acknowledging the variability is misleading. A slight increase in the variability (an additional 0.02 Mg per hectare in the SE, or ~160 seeds per plot) would have led to a conclusion of non-significance. Stating that a difference was found is fine, but stating unequivocally that a loss of 200 kg per hectare can be expected is not. As consumers of research results, we must be aware of the limitations of small-scale trials and carefully assess the interpretations and extrapolations we make from them.
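The sensitivity to the standard error is easy to check. The sketch below again assumes the reported SE of 0.08 belongs to the contrast and uses the rough ± 2*SE interval; the SE of 0.10 is a hypothetical value (0.08 plus the 0.02 mentioned above), not a figure from the paper.

```python
def approx_ci(diff, se, multiplier=2.0):
    """Rough 95% interval for a difference: diff +/- multiplier * SE."""
    half_width = multiplier * se
    return diff - half_width, diff + half_width


diff = 0.20  # Non-GR minus GR yield, Mg per hectare (Table 5)

# 0.08 is the reported SE; 0.10 is HYPOTHETICAL (0.08 + 0.02).
for se in (0.08, 0.10):
    lower, upper = approx_ci(diff, se)
    verdict = "excludes zero" if lower > 0 else "reaches zero"
    print(f"SE = {se:.2f}: ({lower:.2f}, {upper:.2f}) Mg/ha, {verdict}")
```

With the SE at 0.10 the lower bound falls to zero, so by this rule of thumb the difference would no longer be called significant.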