Studies Show Most Do Not Understand Statistics

In this election season, repeated citations of polls provide reminders of how little even most educated people understand about statistics. I should like to review a few basic errors that cause most people to overvalue the accuracy of polls and other studies based on statistical samplings and correlations.

Journalistic polls often state a “sampling error” of 3 or 4 percentage points. This sampling error measures the statistical error that results from drawing a random sample of several hundred or several thousand people from the entire population represented. It does not include other sources of error, such as systematic sampling bias from favoring, say, urban over rural respondents or women over men. Thus the total error of a poll is usually larger than the stated sampling error. This is why voter exit polls turn out to be inaccurate more often than their sampling error would indicate. Pollsters conventionally report the margin of error at 95 percent confidence, about two standard errors under a normal approximation, so if the total error were truly 3 percent, we would expect the poll to fall within that margin about 19 times out of 20.
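To make the gap between stated and total error concrete, here is a minimal simulation sketch. It assumes a simple random sample of yes/no answers and the conventional 95 percent margin of error, 1.96 × sqrt(p(1 − p)/n), which for p near one-half and n = 1000 comes to about 3 points. The function names and the 2-point "bias" (each respondent leaning toward "yes" slightly more often than the true population does, as an unbalanced sampling frame might cause) are hypothetical illustrations, not any pollster's method.

```python
import math
import random

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the conventional 95% confidence interval for a
    proportion estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

def coverage(true_p, bias, n, trials, seed=0):
    """Fraction of simulated polls whose estimate lands within the stated
    margin of error of the true population value.

    `bias` is a hypothetical systematic shift: each respondent says "yes"
    with probability true_p + bias instead of true_p, as if the sample
    over-represented one group. The stated margin ignores this shift.
    """
    rng = random.Random(seed)
    moe = margin_of_error(n, true_p)
    hits = 0
    for _ in range(trials):
        yes = sum(rng.random() < true_p + bias for _ in range(n))
        if abs(yes / n - true_p) <= moe:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    n = 1000
    print(f"stated margin for n={n}: +/-{100 * margin_of_error(n):.1f} points")
    print("coverage, unbiased sample:", coverage(0.5, 0.00, n, 1000))
    print("coverage, 2-point bias:   ", coverage(0.5, 0.02, n, 1000))
```

With no bias, the poll should land inside its stated margin roughly 95 percent of the time; a systematic lean of just 2 points, invisible in the reported margin, should pull that down to roughly three-quarters.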

Understatement of error is also common in economics. Recently, former Treasury Secretary Robert Rubin opined that the current financial crisis was a “low probability” event according to conventional economic models. However, as Benoit Mandelbrot has pointed out, conventional models of asset valuation substantially underestimate risk, since they assume a normal (Gaussian) distribution of price changes when a fat-tailed distribution, such as the Cauchy or the broader family of stable distributions that Mandelbrot proposed, would be more accurate. Higher mathematics aside, we could gather as much from the fact that “low probability” events occur with remarkable regularity. Rubin’s understatement of error leads to a tragic failure to appreciate that there may be systemic reasons for our propensity for bubbles and busts; instead, he regards the crisis as a freak occurrence.
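How much a Gaussian assumption shrinks the odds of an extreme move can be seen by comparing tail probabilities directly. The sketch below compares a standard normal with a standard Cauchy, the simplest fat-tailed stable distribution with a closed-form distribution function. It is only illustrative: the Cauchy has no standard deviation, so its "scale units" are not strictly comparable to normal standard deviations, and Mandelbrot's own fits used other members of the stable family.

```python
import math

def normal_two_sided_tail(k):
    """P(|X| > k) for a standard normal variable X."""
    return math.erfc(k / math.sqrt(2))

def cauchy_two_sided_tail(k):
    """P(|X| > k) for a standard Cauchy variable X (location 0, scale 1).
    The Cauchy CDF is 1/2 + atan(x)/pi, so the two-sided tail is
    1 - (2/pi) * atan(k)."""
    return 1 - (2 / math.pi) * math.atan(k)

if __name__ == "__main__":
    for k in (2, 5, 10, 20):
        g = normal_two_sided_tail(k)
        c = cauchy_two_sided_tail(k)
        print(f"|move| > {k:>2} units: Gaussian {g:.3g}, Cauchy {c:.3g}")
```

Under the Gaussian model a 5-unit move is a roughly one-in-two-million event; under the Cauchy model it happens about one time in eight. A model that assumes the former will keep being surprised by "low probability" events.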

Worse still is when polls are advanced to support claims for which they may have little relevance. Telling us that a majority of economists support Candidate X is not an economic argument for Candidate X, any more than a majority of physicists supporting Candidate X would prove the candidate is good for physics. If anything, it tells us about the political affiliations of economists or physicists, which is sociological data, not a scientific argument. Hard science does not work by taking polls of scientists, but demands that reasons be produced for a position.

Medical studies are often interpreted by journalists to prove causality when they show only statistical correlation. A good rule of thumb is never to assume causality unless a clear aetiology can be shown. Here, common sense may serve as an adequate substitute for mathematical expertise. When medical consensus changes repeatedly within a few decades, we can be sure that the facts were never as firmly established as originally claimed. Medical studies often understate their errors by failing to account for measurement error and systematic error in their statistical analysis. Further, they usually show correlations or “risk factors” without demonstrating causality. For these reasons, the certitude of medical wisdom should be viewed skeptically. Lastly, the claim “there is no evidence that X is dangerous” can simply mean that no adequate study of the matter has been done.
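One way to see why “no evidence of danger” can be empty is statistical power: the chance that a study of a given size detects a real effect at all. The sketch below uses the standard normal-approximation power formula for a two-sided two-proportion z-test at the 5 percent level. The risk figures (10 percent baseline, 15 percent among the exposed) and group sizes are hypothetical, chosen only to illustrate the point.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def approx_power(p_control, p_exposed, n_per_group, z_alpha=1.96):
    """Approximate power of a two-sided two-proportion z-test (alpha = 0.05)
    to detect the difference between p_exposed and p_control, ignoring the
    negligible chance of rejecting in the wrong direction."""
    p_bar = (p_control + p_exposed) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = math.sqrt(
        (p_control * (1 - p_control) + p_exposed * (1 - p_exposed)) / n_per_group
    )
    diff = abs(p_exposed - p_control)
    return normal_cdf((diff - z_alpha * se_null) / se_alt)

if __name__ == "__main__":
    # Hypothetical: exposure raises risk from 10% to 15%.
    for n in (100, 300, 1000):
        print(f"{n:>5} per group: power = {approx_power(0.10, 0.15, n):.2f}")
```

With 100 subjects per group, such a study would miss a 50 percent increase in risk most of the time and report "no significant effect"; only with around a thousand per group does it become likely to detect it.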

In all these cases, a healthy skepticism combined with common sense can guard against most statistical fallacies, even when mathematical sophistication is lacking. Mathematics, after all, is wholly derived from intuitive, rational principles, so it cannot yield absurd results. When a presentation of statistical results seems completely contrary to reality, it is usually a safe inference that there is a wrong assumption underlying the analysis. Even sophisticated statisticians can err, though they calculate impeccably, if they misconstrue the assumptions or conditions of the question they believe they are answering. When studies claiming 90 or 95 percent accuracy prove to be inaccurate more than 10 percent of the time, it doesn’t take a mathematician to realize that there is a lot of overclaiming in the soft sciences.

Update: 29 December 2008

To give a current example of misleading statistics, a new study claims that teens who make abstinence pledges are no less likely to have premarital sex than those who do not. If that sounds counterintuitive, it is because it is not true. Pledgers are indeed less likely to fornicate, but the study controlled for factors such as conservatism, religion, and attitudes about sex, comparing pledgers and non-pledgers with similar characteristics. Unsurprisingly, this yielded no difference, since the pledge itself does not magically cause abstinence; rather, the underlying attitudes and values are the cause. This is a far cry from showing that abstinence programs are ineffective. It would be like concluding that education is ineffective because it is really knowledge that changes behavior. Once again, competent scientists blinded by their biases can make inapt choices of groups to compare and draw interpretations that do not follow.
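The effect of controlling for the underlying attitudes can be shown with a toy calculation. In the hypothetical population below (invented numbers, not the study's data), attitude drives both the decision to pledge and the decision to abstain, while the pledge itself has no direct effect. Exact fractions are used so that the within-group comparison comes out exactly equal.

```python
from fractions import Fraction as F

# Hypothetical toy population, NOT data from the study. Attitude labels are
# illustrative. Attitude influences both pledging and abstaining; within an
# attitude group, pledgers and non-pledgers abstain at the same rate.
share = {"traditional": F(1, 2), "permissive": F(1, 2)}       # P(attitude)
p_pledge = {"traditional": F(3, 5), "permissive": F(1, 10)}   # P(pledge | attitude)
p_abstain = {"traditional": F(7, 10), "permissive": F(3, 10)} # P(abstain | attitude)

def abstain_rate(pledged, strata=None):
    """P(abstain | pledge status), optionally restricted to the given
    attitude strata (i.e. 'controlling for' attitude)."""
    strata = strata or list(share)
    num = den = F(0)
    for a in strata:
        p_group = share[a] * (p_pledge[a] if pledged else 1 - p_pledge[a])
        num += p_group * p_abstain[a]
        den += p_group
    return num / den

if __name__ == "__main__":
    print("overall:", abstain_rate(True), "vs", abstain_rate(False))  # 9/14 vs 11/26
    for a in share:
        print(a + ":", abstain_rate(True, [a]), "vs", abstain_rate(False, [a]))
```

Overall, pledgers abstain markedly more often (about 64 versus 42 percent), yet within each attitude group the rates are identical. Matching on attitude erases the difference precisely because attitude is what produces it, which says nothing about whether efforts that shape those attitudes work.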

Why Feminists Are Bad at Math

The recent push to promote the notion of gender equality in mathematical aptitude, contrary to the overwhelming bulk of psychometric data, is itself ironically a demonstration of mathematical illiteracy. Just as it is horrible scientific practice to cherry pick studies indicating the desired result while ignoring all others, so it is terrible mathematics to make inferences about statistical variance from facts about the mean. Allow me to clarify.

Large-scale psychometric analyses have consistently found a stable discrepancy between males and females in cognitive test performance. The difference in the statistical mean or average is small, favoring men by about 2.5 IQ points according to the best analysis, but the most marked difference is not in the mean, but in the variance. There is significantly greater variance among males than among females. This means males outnumber females at both ends of the spectrum, so there are more male dullards and geniuses, a fact consistent with most people’s recollection of their classmates.

Figure: Distribution of the general intelligence factor by sex. For math ability, the gender disparity is slightly greater, in both mean and variance.

The current Science study touting gender equality actually confirms that male variability in math ability is greater, by a variance ratio of 1.11 to 1.21, consistent with results from 1960 (variance ratio 1.20, mean difference 0.12 standard deviations, N = 73,000 15-year-olds). This aspect of the study is downplayed by the media, even in scientific journals, since it points to some politically undesirable facts.

As the authors of the Science paper admit, the gender discrepancy in variance means that at about two standard deviations above the male mean, there should be twice as many males as females. So if the threshold for performance in a science or engineering program was at two standard deviations above the mean, we should expect there to be twice as many males as females, based on mathematical aptitude alone. In reality, the most demanding physics and engineering programs only accept people three to four standard deviations above the mean, which would make the male to female ratio even greater, consistent with the 85%-15% male-female split in most top science and engineering programs. Harvard President Larry Summers made precisely this point in the infamous speech that cost him his job, at the instigation of feminist faculty who ironically displayed their own mathematical ineptitude.
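The way a modest difference in variance grows into a large ratio at high thresholds can be computed directly from two normal distributions. The sketch below assumes normality, equal group sizes, and measures the threshold and mean difference in female standard deviations; it plugs in the 1960 figures quoted above (variance ratio 1.20, mean difference 0.12) purely as illustrative inputs. The function names are hypothetical.

```python
import math

def upper_tail(threshold, mean=0.0, sd=1.0):
    """P(X > threshold) for X ~ Normal(mean, sd)."""
    return 0.5 * math.erfc((threshold - mean) / (sd * math.sqrt(2)))

def male_female_ratio(threshold, variance_ratio, mean_diff=0.0):
    """Expected number of males per female above `threshold`.

    Assumes both groups are normal and equal in size, the female
    distribution is Normal(0, 1), and the male distribution has mean
    `mean_diff` and variance `variance_ratio` (in female SD units)."""
    male = upper_tail(threshold, mean=mean_diff, sd=math.sqrt(variance_ratio))
    female = upper_tail(threshold)
    return male / female

if __name__ == "__main__":
    for t in (2, 3, 4):
        r = male_female_ratio(t, variance_ratio=1.20, mean_diff=0.12)
        print(f"above {t} SD: about {r:.1f} males per female")
```

The ratio depends on both the variance ratio and the mean difference, and it climbs steeply as the threshold rises. With the mean difference set to zero, the same variance ratio also produces an equal excess of males below the corresponding low threshold, by symmetry.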

All of this contradicts contemporary social dogma, yet is entirely consistent with common sense. It is far more credible that the consistent discrepancy in variability, seen across cultures and time periods, is the result of a real difference in aptitude rather than the product of discrimination. Indeed, gender stereotyping is more likely to take place in the home than at liberal institutions of learning, especially at the highest levels. If social factors were the cause of gender disparity, we should expect this to diminish as students progress to higher levels of university education, becoming further removed from their family’s influence and more engaged with socially liberal university culture. In fact, we see the exact opposite, as the gender disparity becomes more pronounced as students progress to higher degrees. Thus feminists are left with the absurd accusation that science and engineering departments are biased against females. Anyone familiar with university life should know better than to believe such nonsense, as faculty and administrators take special care to offer opportunities to females and minorities.

The disparity between male and female mathematical aptitude is barely noticeable at the median level. A little extra industriousness would be enough for females in the middle of the pack to perform equally with males in math class or even a bit better. In fact, in cultures where girls are academically industrious, we do see slightly better average grades for females in primary and secondary math class, probably owing to better study discipline, again consistent with common observation. However, at the highest levels, the aptitude disparity is too great for very many females to compensate with greater effort. The gifts of mental quickness and astute intuition are needed in order to do math at a high level with the facility required in a fast-paced working environment. The difference in gender variability may be used to calculate the expected male-female split among mathematicians in the National Academy, Fields medalists, and Putnam competition top performers. The computed values correspond closely with reality, comporting with the hypothesis that membership in these categories is indeed merit-based.

It is striking that the Science study still shows the same discrepancy in variability by gender, despite the fact that it uses the SAT, which was redesigned in 2002 to be less of an aptitude test and more of an achievement test. Verbal analogies were eliminated, since minorities performed poorly on these, and the math section now places less emphasis on speed and intuition, instead focusing on mastery of course material. This emphasis on achievement gives industrious females an advantage at the middle of the pack, accounting for the disappearance of the difference in mean, which still persists on true cognitive tests. Yet extra study is no substitute for genius, so there is still a pronounced gender discrepancy at the high end of SAT math performance, as the mathematically gifted can breeze through the exam with ease.

The only reason to protest these findings is political, not scientific. The gender disparity in math performance is no less well established than a reverse disparity favoring females in reading. No one questions the latter finding; in fact, many feminists are proud to point to it, evincing their strange notion of equality. Similarly, race-based cognitive differences, which are even more pronounced than gender differences even after controlling for socioeconomic status, are strictly taboo, unless perchance they favor the supposedly oppressed minority.

This deep hostility to any finding that contradicts the contemporary myth of equality of aptitude across demographic categories is misguided. Gender- or race-based disparity in math and science aptitude is no cause for dismay or bigotry, if we understand what statistical statements about groups signify. We are making general statements about groups via statistical averages and variances; there will still be many individual women who do well in math and science, and even some geniuses. It would be a mistake to judge an individual based on that person’s demographic group; individuals are the basis of statistics, not the product of statistics. However, it is disastrous social policy to try to “correct” aptitude-based inequalities, not only because it results in unjustified accusations of discrimination, but because it may direct individuals away from the field they would have chosen for themselves based on their aptitude and inclination. Once outside of academia, graduates will find that performance is what matters, and they will be ill served by having been protected or coddled by grade inflation or some other esteem-building measure to impose a false equality.

I understand that the desire to prove equality of aptitude by race and gender is motivated by the belief in the moral equality of all people, yet neither of these equalities implies the other. Even if we admitted that all races and genders were equal in every aptitude, with the same mean and standard deviation for all groups, we would still be faced with a real variation of aptitude among individuals within each group. What then of human equality? Does a genius have greater moral, social, or political rights than a dullard? If not, then it is clear that moral equality does not depend on equality of aptitude.

The confusion between equality of human worth and equality of ability can only come about in a society that values people primarily for their abilities. This instrumentalist notion of humanity, so unworthy of human dignity, can be a constant temptation for capitalist societies, where people are valued based on what they can produce. This perverse moral philosophy can be given a social Darwinist rationale, declaring that the only attributes with value are those that have some adaptive advantage. Only when we move beyond this crass instrumentalism will people be able to face their congenital inequalities with maturity and not be perturbed by them, nor use them as an excuse to lord it over their fellow human beings, for we have human worth for who we are, not what we can do.

How Not to Save the Planet

With an unseasonably cold spring comes a new wave of alarmism regarding global warming, of all things. The science of climate change is rife with political interest on both sides of the issue, so it should be useful to cut through the veneer of “science” (which is often just the ideologically charged opinions of scientists) to the actual facts that are known. Doing so, we will arrive at a picture that is quite different from either of the standard views on global warming.

First, we should observe that global climate and weather form a complex, non-linear system with far too many feedback parameters to solve analytically. All models of climate and weather involve probabilistic guesses and estimates of parameters based on past observations in similar conditions. This is why long-range weather forecasting is mathematically impossible, and even short-range forecasts are often wildly inaccurate. “Global” climate can itself be a misleading construct, as it is just a mathematical synthesis or average of quite disparate regional climates. A polar climate may become warmer while the climate of a temperate zone becomes cooler. Net global increase or decrease in temperature may or may not affect items of interest such as sea level rise or the greenhouse effect. The geographic distribution of climate is every bit as important as the overall average, arguably more so.
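The sensitivity of non-linear systems to their starting conditions, which is what defeats long-range forecasting, can be seen in even the simplest example. The sketch below iterates the logistic map x → r·x·(1 − x) at r = 4, a textbook chaotic system; it is in no sense a model of weather or climate, only an illustration of how an initial difference of one part in ten billion grows until the two trajectories bear no relation to each other.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0,
    and return the list of all visited values (including x0)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

if __name__ == "__main__":
    a = logistic_trajectory(0.2, 60)
    b = logistic_trajectory(0.2 + 1e-10, 60)  # tiny "measurement error"
    for step in (0, 10, 20, 30, 40, 50, 60):
        print(f"step {step:>2}: difference {abs(a[step] - b[step]):.2e}")
```

At r = 4 the gap roughly doubles each step on average, so after a few dozen iterations the initial error has grown to the full size of the system, and no refinement of the equations, only an impossibly exact measurement of the starting state, would restore predictability.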

Since climate and weather models depend on previous data in order to estimate parameters, the science of global climate is essentially limited to data from the last half century. Earlier data is not truly global, and anything older than 200 years is almost certainly confined to Eurasia and America. Our estimates of long-term historical trends are largely based on qualitative European accounts of rivers freezing, malaria outbreaks, and other indications of climate. On the geological scale, we can measure the carbon dioxide content of the ancient atmosphere, and though this is correlated with temperature, it is not the sole determinant of temperature.

Our knowledge of the historical and ancient past provides some apparently conflicting information, which often gets lost in the all-or-none approach to anthropogenic climate change. On the one hand, the carbon dioxide level of our atmosphere is higher than it has been in at least 800,000 years, according to ice-core records. Yet, despite this fact, the current global temperature is not dramatically hotter than at other times in recorded history; indeed, Europe was probably warmer during the medieval period. Even in the past century and a half of accelerated industrialization, global temperature has crept up slowly, in fact decreasing in the mid-twentieth century, before increasing only a fraction of a degree Celsius per decade. Indeed, it is debatable whether global temperature has risen by so much as a full degree this century, such is the measurement error involved in a global average.

The synthesis of these observations points to a strange conclusion. Although human industrial activity is certainly correlated with a dramatic rise in carbon dioxide levels, this has not sufficed to raise global temperature as much as we might expect. This suggests that, were it not for human activity, we might still be in the “Little Ice Age” that extended from the sixteenth to nineteenth centuries. Global warming, what little there may be, is actually helping to keep the temperature reasonable. This is not necessarily a reason to be blasé about future environmental change.

There remain serious concerns about ozone depletion and polar cap melting, which in theory could have catastrophic effects. Ozone depletion has already been addressed by a ban on chlorofluorocarbons such as Freon, which are chemically capable of depleting ozone, assuming they actually rise to the ozone layer in sufficient quantities. Polar cap melting is projected to raise sea levels by tens of centimeters, not meters, over the next century. This can cause serious crises in some regions, but there will not be any global cataclysm.

Built into the notion of “saving the planet” from some fictional calamity, be it a rogue asteroid or a global tsunami, is the fantasy that man can be his own savior, along with the equally vain belief that man is capable of destroying the planet. Global industrial activities have negligible climatic effects compared to, say, Krakatoa-scale eruptions that can cover a third of the world in darkness and cold. Man is but a caretaker of this planet, not its savior, and if he really wishes to improve the environment, he should try a different approach than the policy-wonking regulation of carbon emissions. Such regulations have only yielded the farce of exchanging carbon credits, and the ridiculously energy-wasteful enterprise of corn-based biofuels, which has driven up the cost of agricultural products in the United States.

Instead of creating a new industry of planet-saving, people should recall their role as caretakers and conservators and learn to consume less, not because they will “save the planet,” but because they ought to make good use of natural goods rather than waste them on frivolities. Yet as long as consumerism and frivolity are considered virtues, environmental do-gooders might as well try to empty the ocean a bucket at a time. Our wastefulness may bring no danger of destroying the planet, but it does threaten our existence as serious human beings.