In evolutionary biology, questions of general importance are rarely amenable to answers from single experiments or studies. Because we deal with living organisms, and because evolution is fundamentally about change and diversity, answers come from aggregating information from many experiments or studies. Meta-analysis is a formal way to do such aggregating. Over the past 25 years, meta-analysis methodology in general, and its use in ecology and evolution in particular, have made great strides (Koricheva et al., 2013), resulting in a substantial increase in the rigour of synthetic studies. That work, however, has focused on a narrow class of parameters (e.g. means and regression slopes) and a specific methodology. Morrissey (2016) shows that for some types of parameters the standard methodology produces biased estimates. These other parameters include ones that are central to evolutionary theory, such as the magnitude of selection coefficients and the shape of reaction norms. The standard methodology was developed with the goal of improving the precision of parameter estimates; Morrissey broadens that to considerations of estimation bias. His results in no way invalidate the previous work on meta-analysis methods; they merely show that such methods are not appropriate for all questions. The outcome should be a renewed focus on meta-analysis methods, one that expands our conception of what counts as meta-analysis and of its toolbox of available techniques.

Meta-analysis is a way to formalize information aggregation. It can be defined broadly, as by Glass (1976), who coined the term: ‘the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings’. Or it can be defined more narrowly, as by Koricheva & Gurevitch (2013) in a recent handbook on meta-analysis: ‘a set of statistical methods for combining the magnitudes of the outcomes (effect sizes) across different data sets addressing the same research question’. The problem with the latter definition is that not all questions are about effect sizes, nor are all data aggregated from data sets addressing the same research question. I agree with Morrissey that meta-analysis should be considered broader than just a specific set of statistical methods.

In aggregating data, we can ask several types of questions. The meta-analysis tradition represented by Koricheva and Gurevitch is primarily about hypothesis testing: Does the mean effect size of some sort of treatment differ from zero? The meta-analysis of Morrissey is more about parameter estimation: What is the mean magnitude of the directional selection gradient in natural populations? A third type of question is about the frequency of occurrence of some phenomenon, such as the one we (Palacio-López et al., 2015) asked in a meta-analysis of adaptive plasticity: How often do plant traits show adaptive plasticity? That question is not about mean effect sizes. We do not expect reaction norm slopes to centre around some particular parameter value; just the opposite, we expect them to vary widely. The question is about the percentage of times that reaction norms are greater than an (admittedly arbitrary) threshold and show a particular pattern when compared among populations. Such questions are sometimes derided as ‘vote counting’, but that label is a mischaracterization. Meta-analysis was developed, in part, as a way of advancing research syntheses beyond simply counting up the number of studies that reached statistical significance.
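To make the contrast concrete, here is a minimal sketch in Python (using NumPy, with invented numbers; none of the values come from the studies cited here) of the difference between a parameter-estimation question and a frequency-of-occurrence question, together with a bootstrap band of the kind that lets readers choose their own threshold:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical standardized reaction-norm slopes from 200 studies;
# by construction they vary widely rather than centring on one value.
slopes = rng.normal(loc=0.0, scale=1.0, size=200)

# Parameter-estimation question: what is the mean slope?
print(f"mean slope: {slopes.mean():.3f}")  # near zero, and not the point

# Frequency question: how often does |slope| exceed a threshold?
for threshold in (0.25, 0.5, 1.0):
    frac = np.mean(np.abs(slopes) > threshold)
    print(f"fraction |slope| > {threshold}: {frac:.2f}")

# Bootstrapping the exceedance curve over a grid of thresholds gives
# readers the uncertainty around whatever threshold they prefer
# (cf. the bootstrapped cumulative distribution function mentioned
# below for Palacio-Lopez et al., 2015).
thresholds = np.linspace(0.0, 2.0, 21)
boot = np.empty((1000, thresholds.size))
for b in range(boot.shape[0]):
    resample = rng.choice(slopes, size=slopes.size, replace=True)
    boot[b] = [np.mean(np.abs(resample) > t) for t in thresholds]
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
print(f"fraction |slope| > 0.5: 95% band [{lo[5]:.2f}, {hi[5]:.2f}]")
```

The mean is a reasonable answer to the first question but nearly useless for the second; the exceedance fractions, with their bootstrap bands, answer the second directly.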
To see what genuine vote counting looks like, suppose one were interested in the importance of competition as a structuring process in plant communities. One could survey the literature for all studies that experimentally manipulated competition (e.g. growth alone vs. growth with another individual) and count the ones that found a statistically significant effect. If, say, 25% were statistically significant, one would then declare that competition was an important process approximately a quarter of the time. That would be wrong, and Koricheva and Gurevitch correctly condemn such an approach because it fails to account for variation in statistical power among the studies. Instead, they properly advocate that an effect size be calculated for each study that accounts for such variation, and that those corrected effect sizes then be aggregated and tested for a departure from zero.

But standardized effect sizes can be used in other ways. In our analysis (Palacio-López et al., 2015), we determined the mean effect size that represented a one standard deviation difference among treatments and used that to determine the frequency of traits that fell into various categories (e.g. not plastic, adaptively plastic, nonadaptively plastic). Importantly, we also looked at the effect of changing that threshold and provided the bootstrapped cumulative distribution function so that readers could choose their own threshold. This procedure is a different, but equally valid, use of effect sizes. I emphasize that our study followed the other advice in Koricheva et al. (2013) about gathering and appraising data.

All syntheses, whether testing hypotheses, estimating parameters or determining frequencies, are only as good as the data. Syntheses in evolutionary biology are subject to an aspect of data quality that differs from the most common uses of meta-analysis, such as those in medical research. In those studies there is a single phenomenon being analysed (e.g. the efficacy of various treatments for some disease), and studies are chosen for inclusion in a meta-analysis depending on the appropriateness of the research design. Performing a meta-analysis of the magnitude of directional selection is of a different nature: not only must the study be properly designed, but the very existence of the data also depends on what system the researchers decided to measure. Morrissey raised this issue in his comparison of the results of Kingsolver et al. (2001) and Siepielski et al. (2009). The latter analysis focused specifically on temporal dynamics, and Morrissey points out that long-term studies are more likely to be done when selection has already been shown, or is strongly suspected, to be occurring.

In general, meta-analyses in ecology and evolution are liable to researcher selection bias. Scientists tend to look in places where they expect to find interesting results. If you are interested in natural selection, you tend to start with traits that you think are under selection. If you are interested in phenotypic plasticity, you tend to start with traits that you think are plastic. One correction for this bias within individual studies is to measure many traits beyond the focal one(s); the nonfocal traits then act as a control for the focal traits. For studies of antagonistic sexual selection, for example, one should measure traits that are not expected to be under sexual selection. Applying Morrissey's suggested techniques to those data would show whether the putative sexually selected traits show differences as great as or greater than the nonfocal traits.
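One simple way to formalize that focal-versus-nonfocal comparison is a permutation test on the trait labels; the sketch below is an illustrative stand-in under invented data, not Morrissey's actual techniques:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical absolute selection gradients: focal traits picked because
# sexual selection was suspected, plus nonfocal traits as controls.
focal = np.abs(rng.normal(loc=0.25, scale=0.15, size=40))
nonfocal = np.abs(rng.normal(loc=0.10, scale=0.15, size=60))

observed = focal.mean() - nonfocal.mean()

# Permutation test: if the focal/nonfocal labels were exchangeable, how
# often would a random relabelling give a difference at least this large?
pooled = np.concatenate([focal, nonfocal])
n_perm = 10_000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)  # in-place relabelling of the pooled gradients
    count += pooled[:focal.size].mean() - pooled[focal.size:].mean() >= observed

print(f"observed focal - nonfocal difference: {observed:.3f}")
print(f"one-sided permutation p ~ {count / n_perm:.4f}")
```

If the focal traits turn out to be no more extreme than the controls, an apparent abundance of selection starts to look like an artefact of where researchers chose to look.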
For our meta-analysis of the ubiquity of plasticity, we used two approaches to correct for such bias: first, we included all measured traits from each study; second, we included studies that were not originally focused on trait plasticity. Any conclusions from a meta-analysis in evolutionary biology must be tempered by considering to what extent the data are a representative sample.

Morrissey's re-analysis of the data of Murren et al. (2014) solidifies our own conclusions. He found that, with respect to reaction norms, most trait evolution occurs on mean values across environments rather than on the parameters of the reaction norm. This result is consistent with our finding that trait plasticity is less ubiquitous than local adaptation. My informal polling of evolutionary biologists finds that most researchers think that phenotypic plasticity is very common. That misconception likely has two sources. First, nearly all organisms probably have at least one trait that is phenotypically plastic. That does not mean that all, or even most, traits are plastic, nor that the plasticity is adaptive; but the existence of some plasticity leads to recall bias. Second, published studies of plasticity are almost always focused on traits that the researcher knew were plastic before starting the study (otherwise, why bother to study it?), resulting in study selection bias. Meta-analyses in particular, and scientists in general, need to guard against such biases.

This manuscript is based on work done while serving at the U.S. National Science Foundation. The views expressed in this manuscript do not necessarily reflect those of the National Science Foundation or the United States Government.
Published in: Journal of Evolutionary Biology
Volume 29, Issue 10, pp. 1912-1913
DOI: 10.1111/jeb.12944