A common fallacy prevails in biology, economics, and other sciences that use statistical techniques such as regression analysis or analysis of variance in order to determine the presence or absence of correlations between variables. This error is the belief that it is unnecessary to determine the measurement error of individual data points, since the inclusion of error bars would, if anything, strengthen the correlation. Not only is this opinion wildly counterintuitive, implying as it does that larger measurement error would make a better data set, but it is at odds with demonstrable principles of error analysis widely known and acknowledged among mathematicians and physicists. I hope to make plain the fallacy in graphical terms, contrasting the “correlation coefficient” valued by biologists and social scientists with the chi-squared analysis and resulting levels of confidence computed by physicists. From this exposition, we will see that quantitative analyses in most scientific disciplines are of dubious legitimacy, and that their genuine worth cannot be determined without a serious endeavor to estimate measurement error and random statistical error.
In experimental physics, it is axiomatic that an experimental value is meaningless if it is not accompanied by a measurement of its uncertainty. If the error is not specified, for all we know, the error might be larger than the value itself, meaning the experimental result would be consistent with a “true” value of zero. Even for smaller uncertainties, specifying the error is essential to determine how likely it is that any measured correlation is not an accident of random statistical error. We shall explore this concept in detail in due course, but for now let us consider the perspective of the less mathematical sciences.
Consider two variables, X and Y, where X can take integer values and Y is a continuous variable. For simplicity, we assume X, the “independent” or “predictor” variable, to be known precisely without error. For each value of X from 1 through 8, we measure the value of Y for ten objects, for a total of eighty measurements. We plot the mean value and statistical standard deviation error for each of the eight groups of ten measurements:
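A data set of this shape can be simulated in a few lines of Python. The generating model used here, a linear trend in X plus unit Gaussian noise, is purely a hypothetical illustration, not a claim about the data in the original figure:

```python
import random
import statistics

random.seed(1)  # reproducible illustration

# Hypothetical generating model: Y rises linearly with X, plus unit Gaussian noise.
# Eight groups (X = 1..8) of ten Y measurements each, eighty measurements in all.
groups = {x: [0.5 * x + random.gauss(0, 1) for _ in range(10)] for x in range(1, 9)}

for x, ys in sorted(groups.items()):
    mean = statistics.mean(ys)
    sd = statistics.stdev(ys)        # statistical standard deviation within the group
    sem = sd / len(ys) ** 0.5        # standard error of the group mean
    print(f"X={x}: mean Y = {mean:5.2f}, within-group SD = {sd:4.2f}, SE of mean = {sem:4.2f}")
```

The printed means and standard deviations are what the plot described above would display as points and error bars.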
Typically, a social scientist or biologist will attempt to determine from these data whether the two variables are statistically correlated. In particular, he will want to know whether variation in the “dependent” or “response” variable Y is attributable to the value of X. This is determined by an analysis of variance (ANOVA) test, which compares the variance of the observed mean of Y between data groups with the variance of the observed values of Y within each of the (eight) data groups. If the “between” variance is sufficiently larger than the variance “within”, as judged by an F-test, then we can reject the null hypothesis, namely that the mean of Y is independent of X.
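The one-way ANOVA comparison can be written directly from its definitions. The data below are hypothetical, generated from the same illustrative linear-trend model as before; only the F-statistic computation itself is standard:

```python
import random
import statistics

random.seed(2)
k, n = 8, 10  # eight groups of ten measurements each

# Hypothetical data: group means rise with X, unit Gaussian noise within groups.
groups = [[0.5 * x + random.gauss(0, 1) for _ in range(n)] for x in range(1, k + 1)]

grand_mean = statistics.mean(y for g in groups for y in g)
group_means = [statistics.mean(g) for g in groups]

# "Between" sum of squares: spread of the group means about the grand mean.
ss_between = sum(n * (m - grand_mean) ** 2 for m in group_means)
# "Within" sum of squares: spread of measurements about their own group mean.
ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

F = (ss_between / (k - 1)) / (ss_within / (k * n - k))
print(f"F({k - 1}, {k * n - k}) = {F:.1f}")  # a large F rejects the null hypothesis
```

With the critical value of F(7, 72) near 2.1 at the 5% level, an F well above that rejects the hypothesis that the mean of Y is independent of X.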
The above analysis does not take into account experimental measurement error in the values of Y. We are implicitly assuming that such error is negligible compared to the observed variation within each data group. Our rejection of the null hypothesis is based on the elimination of random statistical variation as a possible source of the variance in the mean of Y between data groups. If our analysis is to be a reliable statement about physical reality, we need the further assurance that experimental measurement error is not a possible source of the observed variance between groups.
Researchers who use ANOVA without specifying the experimental error sometimes pretend to justify this omission by saying that the inclusion of any nonzero error bars would actually strengthen the correlation. This assertion is derived from regression analysis, which computes a “correlation coefficient” that is the square root of the fraction of the variance in the dependent variable attributable to variance in the independent variable. (Here we are dealing with a simple X versus Y plot, with one measurement at each X value, rather than averaging groups of measurements, as above.) For example, if the correlation coefficient is 0.8, then 64% of the variance of one variable is statistically correlated to variation in the other variable. It is true that in regression analysis, adding error bars strengthens the correlation. This is because such analysis does not demonstrate a correlation between the real physical properties behind the data; it merely considers the data divorced from any latent reality, and is thus unconcerned with experimental accuracy. The idea that having greater measurement error would improve the regression fit and weaken the null hypothesis defies common sense to a physicist, who associates large error bars with bad, meaningless data. Let us look at our previous data set, and apply extremely large error bars:
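The point that the correlation coefficient is computed from the measured values alone can be seen from its definition. The single X-Y measurements below are hypothetical numbers chosen to lie roughly on a line:

```python
import math

# Hypothetical single measurement of Y at each X.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.1, 1.9, 2.4, 3.2, 3.3, 4.1, 4.2, 5.0]

mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
r = cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")  # → r = 0.989, r^2 = 0.978
```

Note that nothing in this computation accepts a measurement error for any point: whatever error bars we attach to these eight values, the coefficient is unchanged, because it is a function of the measured values alone.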
We will consider this graph two different ways: first, as a simple X-Y plot of a single data point for each value of X, with bars showing the measurement error in Y; and second, as representing the mean values of Y over eight samples of ten Y measurements for each value of X, with bars showing the error in the mean, as magnified by the measurement error of each data point in the sample.
In the first case, we can apply regression analysis and obtain a high correlation coefficient. Since regression analysis assumes the error bars depict purely random (normal distribution) variation in the data points, the existence of a positive correlation among wildly varying quantities is a priori less likely; therefore, the observed presence of such a correlation is all the more remarkable and thus less likely to be the result of random error. Of course, a physicist would take one look at this data and pronounce it worthless, as it is utterly indistinguishable from the null hypothesis, since we can easily draw a flat trend line that remains well within the measurement error of all data points. Both perspectives are correct, but they are addressing different questions. The social scientist or biologist who thinks large measurement error is not a liability is addressing a mathematical question while believing himself to be addressing a physical one, whereas the physicist is appropriately treating a genuinely physical question.
The physicist takes seriously the possibility that his data may be wrong, and accounts for this with measurement error bars that typically encompass a full standard deviation of probability. The “true” measurement that would have resulted if life were perfect is probably within that error bar, though the probability is not necessarily uniformly distributed through the error bar, as there can be asymmetric error. Thus, for the first data point above, it is possible that the true value of Y is twice as likely to lie in the interval from 4 to 6 as in the interval from 0 to 2. Such a determination would require more sophisticated analysis, and may not be possible in many cases. For the most part, physicists are content to represent a standard deviation error above and below the data point, with unequal lengths above and below the point if the error is asymmetric.
A physicist would next perform a chi-squared analysis to determine the likelihood that the curve fit of the data points reflects a real correlation between the physical variables. In order to know this, we must be sure that our data points are close to the real values of the physical variables, so each point’s squared deviation from the fitted curve is weighted by the inverse square of its measurement error. Thus a greater error results in a weaker constraint on the fit, since many more possible curves could describe the data. In our extreme case above, even a flat trend line would be consistent with the data.
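This can be sketched with the standard chi-squared statistic, in which each residual is divided by that point’s error bar before squaring. The data and error bars below are hypothetical; the flat trend line stands in for the null hypothesis:

```python
# Hypothetical single measurement of Y at each X.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.1, 1.9, 2.4, 3.2, 3.3, 4.1, 4.2, 5.0]

def chi_squared(errors, model):
    # Each point contributes its squared deviation from the model curve,
    # weighted by the inverse square of its measurement error.
    return sum(((y - model(x)) / e) ** 2 for x, y, e in zip(xs, ys, errors))

def flat(x):
    return sum(ys) / len(ys)  # null hypothesis: Y independent of X

small = [0.2] * 8  # careful measurements
large = [3.0] * 8  # extremely large error bars, as in the graph above

print(f"{chi_squared(small, flat):.1f}")  # far above the 8 degrees of freedom:
                                          # the flat line is strongly rejected
print(f"{chi_squared(large, flat):.1f}")  # of order 1: the flat line is
                                          # entirely consistent with the data
```

The same measured values thus reject or fail to reject the null hypothesis depending solely on the size of the error bars, which is precisely the information that regression analysis discards.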
The less critical biologist or social scientist might bypass all this error analysis, accepting the data points as given, and mathematically analyzing the correlation of variances. The variance of each variable is computed simply by averaging the squared deviations of the data points from their mean. This is purely abstract mathematical analysis, and takes no account of the possibility of physical measurement error. Regression analysis addresses a purely statistical question: What is the likelihood that the correlation in variances is attributable to random variation in the values of the data points? This random variation is not a comparison of the measured data point to its real value, but a comparison among the measured values of the data points. A strong correlation coefficient simply means that it is unlikely that the measured correlation between values is the result of random statistical noise among the measured values of the data points. This takes no account of any possible measurement error (random or otherwise) within each data point.
The biologist’s analysis, from an empirical perspective, is not only incomplete for ignoring measurement error, but it also answers a different mathematical question from the one the physicist addresses with chi-squared analysis. The physicist is concerned with the “absolute” (within the framework of the experiment) values of the variables and their formula of correlation, while the biologist or sociologist is perhaps only concerned with showing the existence of a correlation between variables, irrespective of whether the absolute values are meaningful. In cases where the absolute value does matter, the use of regression analysis as a substitute for error analysis would be thoroughly inexcusable. Yet even when our concern is restricted to demonstrating the mere presence of a correlation, we have found that this analysis is empirically inadequate, as a potentially large measurement error would obliterate any measured correlation.
The observation that large error bars actually increase the correlation coefficient is a reflection of the correlation coefficient’s abstraction from empirical reality. Assuming that the data points are correct, regression analysis is a useful tool for determining what fraction of the variance of one variable is correlated to variance in another. Introducing error bars will improve the correlation coefficient because the correlation coefficient was never intended to be a measure of how likely it is that the correlation corresponds to reality. It simply correlates the measured variance of one variable to the measured variance of another, without considering whether these are “correct” with respect to physical reality, a determination that only error analysis can make.
Now, let us return to our last graph and this time treat it as representing the mean values of Y over eight samples of ten Y measurements for each value of X, with the error bars showing a large error in the mean of each sample, reflecting the large measurement error of each data point within each sample.
Here, if we apply the ANOVA test, we will fail to eliminate the null hypothesis, since the variance between groups is less than the modified variance within groups. When we increase the variance of each sample from the simple statistical variance to the full mean-square uncertainty including measurement error, the “within” variance may come to exceed the “between” variance, and the test will fail. There is nothing unsound about the ANOVA test, as here it correctly retains the null hypothesis, but it can be misapplied when researchers presume that they are free to use the statistical sample variance without reference to the measurement error.
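The effect can be sketched numerically by folding a hypothetical, independently estimated per-point measurement error into the within-group mean square, on the principle that variances of independent error sources add. The data and the size of the measurement error are illustrative assumptions:

```python
import random
import statistics

random.seed(3)
k, n = 8, 10                # eight groups of ten measurements each
sigma_meas = 5.0            # hypothetical large per-point measurement error

# Hypothetical data: the same linear-trend model with unit statistical noise.
groups = [[0.5 * x + random.gauss(0, 1) for _ in range(n)] for x in range(1, k + 1)]
group_means = [statistics.mean(g) for g in groups]
grand_mean = statistics.mean(group_means)  # equal group sizes, so this is the grand mean

ms_between = sum(n * (m - grand_mean) ** 2 for m in group_means) / (k - 1)
ms_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g) / (k * n - k)

# Fold the measurement error into the within-group mean square:
# variances of independent error sources add.
ms_within_total = ms_within + sigma_meas ** 2

print(f"F using sample variance alone:    {ms_between / ms_within:.1f}")
print(f"F including measurement error:    {ms_between / ms_within_total:.1f}")
```

The first F-ratio comfortably rejects the null hypothesis; the second, computed from exactly the same measured values, falls below the critical value near 2.1 and retains it.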
If the error analysis used in much of the biological and social sciences is so slipshod, and in some cases, practically nonexistent, we may ask why these fields of research are able to produce fruitful results. First, we must note that there is in fact a serious problem with quality and accuracy among peer-reviewed articles, though experienced researchers have the astuteness to distinguish the wheat from the chaff. Second, in many cases, the measurement error truly is negligible compared to the statistical variance, so that there is little penalty in accuracy for neglecting the measurement error, especially when the researcher is content to simply eliminate the null hypothesis, rather than to quantify the correlation with great accuracy. Still, it is bad practice to assume that the measurement error is negligible, and researchers should take care to quantify all potential sources of error as well as possible, and to incorporate this error analysis into the variances subjected to statistical tests. Although this can be mathematically cumbersome, biological and social scientists have increasing access to the expertise and software necessary to apply more sophisticated mathematical analysis to their research, so there is no reason why these fields cannot uphold the same level of rigor as physicists in the area of error analysis.
© 2007 Daniel J. Castellano. All rights reserved. http://www.arcaneknowledge.org