Methodological Problems in Epidemiology – Arcane Knowledge

As much of the world looks to slowly ramp down COVID-19 isolation measures, it remains unclear whether this global social experiment should be considered wise or foolish. The prevalence of infections is below 1% in every country in the world except the microstate of San Marino. This is better than most models projected, and could be interpreted as a success for isolation, an overestimation of the virus's infectiousness, or a natural seasonal effect. The question cannot be resolved, insofar as it depends on the counterfactual of what would have happened had isolation not been imposed. As mentioned in the last post, spread to 60% of the population with millions of deaths was never realistic. That alarmist scenario relied on a naive application of epidemiological models that have poor predictive ability. Using an SEIR model with the estimated parameters for COVID-19, one indeed gets a grim picture. Yet if one were to insert the parameters for seasonal influenza (R₀ = 1.3, average incubation period = 2 days, average duration of infectiousness = 5 days, mortality rate = 0.1%) into the same model, one would find over 40% of the population infected and 150,000 fatalities in the first year, far more than actually occur. The reproduction number of a disease depends not only on the duration of contagiousness, but also on the likelihood of infection per contact (the secondary attack rate) and the contact rate. These last two are highly variable by region, social structure, and perhaps even individual physical susceptibility.
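To make the influenza comparison concrete, here is a minimal forward-Euler SEIR sketch using the flu parameters quoted above. The population of 330 million (roughly the US), the 1,000 initial infections, the 1,500-day horizon, and the step size are assumptions not given in the post. The model assumes homogeneous mixing and ignores seasonality, births, and waning immunity.

```python
def simulate_seir(r0, incubation_days, infectious_days, population,
                  initial_infectious=1_000, days=1_500, dt=0.1):
    """Integrate a basic SEIR model with forward Euler steps.

    Assumes homogeneous mixing, a fully susceptible population, and no
    births, background deaths, seasonality, or waning immunity.
    Returns the final (S, E, I, R) compartment sizes.
    """
    sigma = 1.0 / incubation_days   # rate of moving from exposed to infectious
    gamma = 1.0 / infectious_days   # rate of leaving the infectious state
    beta = r0 * gamma               # transmission rate, since R0 = beta / gamma here
    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    for _ in range(round(days / dt)):
        new_exposed = beta * s * i / population * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
    return s, e, i, r


# Seasonal-influenza parameters from the text; the population is an assumption.
POPULATION = 330_000_000
MORTALITY = 0.001  # 0.1% of those infected

s, e, i, r = simulate_seir(r0=1.3, incubation_days=2, infectious_days=5,
                           population=POPULATION)
ever_infected = POPULATION - s
attack_rate = ever_infected / POPULATION
print(f"ever infected: {attack_rate:.1%}")
print(f"deaths at 0.1% mortality: {ever_infected * MORTALITY:,.0f}")
```

Run until the outbreak burns out, the simulated attack rate lands a little above 40%, in line with the figure above; the death total scales linearly with whatever population is assumed.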

Conventional compartmental models have poor predictive ability for seasonal influenza, as they do not account for factors other than herd immunity and isolation that could slow the spread of disease. A Los Alamos study built a model whose parameters were fitted to past seasonal data, which should, one hopes, give it predictive power for future seasons. Such an approach, however, is useless for novel pandemics. As the authors note, these models are all highly sensitive to the choice of prior parameters, but we cannot know these until after the epidemic has run its course.
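The sensitivity to parameters is visible even in the simplest model. For a homogeneous-mixing SIR or SEIR model with a fully susceptible population, the fraction eventually infected, z, satisfies the standard final-size relation z = 1 - exp(-R₀z). The sketch below solves it by fixed-point iteration for a range of R₀ values; it illustrates that textbook relation only and is not the Los Alamos model.

```python
import math


def final_size(r0, tol=1e-12, max_iter=100_000):
    """Fraction of a fully susceptible population eventually infected in a
    homogeneous-mixing SIR/SEIR model: the positive root of z = 1 - exp(-R0 * z).
    Returns 0 when R0 <= 1, since no major outbreak occurs."""
    if r0 <= 1.0:
        return 0.0
    z = 1.0  # starting above the root makes the iteration converge to the positive solution
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


for r0 in (0.9, 1.1, 1.2, 1.3, 1.5, 2.0, 2.5):
    print(f"R0 = {r0:.1f}: {final_size(r0):.0%} eventually infected")
```

Moving R₀ from 1.1 to 1.5 more than triples the projected attack rate, and below R₀ = 1 the model predicts no major outbreak at all.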

The problem of predictive modeling is exacerbated by the poor quality of public health data, which is often woefully incomplete or inconsistent, with categories frequently driven by policy or other unscientific criteria. Public health systems do a better job of recording the number of infected than the numbers exposed or recovered. Even the infected count is limited to those who seek medical treatment, and diagnoses are often made by symptoms rather than definitive tests. Cause of death on death certificates is determined by bureaucratically imposed standards. Even in scientific studies, researchers classify subjects according to a single cause of death and treat comorbidities as risk factors increasing the chance of death by the primary cause. It would be more rigorous to acknowledge that there is not always a single cause of death, and instead to treat comorbidities as contributing causes, estimated by factor analysis. This would tell us the mortality contribution of each disease to the population, though it would remain generally impossible to give a single “cause of death” for each individual.

Some parameters of COVID-19 are fairly well known at this point. The infected are contagious from about 48 hours before showing symptoms until 3 days afterward. The secondary attack rate is surprisingly low, only 0.45% (compared with 5–15% for seasonal flu). Thus the relatively high R₀ is attributable not so much to high contagiousness as to the longer duration of contagiousness, especially while presymptomatic, so that infected people have more contacts while contagious than seasonal flu victims would. The 2009-10 H1N1 pandemic, by contrast, had a secondary attack rate of 14.5%, yet it infected only 61 million of the 307 million people in the US, just under 20%. It is implausible that COVID-19, with its much lower attack rate, could ever attain a comparable prevalence.
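This argument rests on a common back-of-the-envelope decomposition, also implicit in the first paragraph: R₀ ≈ c × p × D, where c is the contact rate per day, p the probability of transmission per contact, and D the duration of contagiousness in days. The sketch below treats the secondary attack rate as p, which is a simplifying assumption (attack rates are usually measured per close contact over a whole exposure, not per contact-day), and inverts the formula to ask what contact rate a given R₀ requires. The COVID-19 R₀ of 2.5 and the 10% flu attack rate (midpoint of the quoted range) are hypothetical placeholders; the function names are illustrative only.

```python
def r0_estimate(contacts_per_day, p_transmission, contagious_days):
    """Back-of-the-envelope R0 = contact rate x per-contact transmission probability x duration."""
    return contacts_per_day * p_transmission * contagious_days


def required_contact_rate(r0, p_transmission, contagious_days):
    """Contacts per day needed to sustain a given R0, by inverting r0_estimate."""
    return r0 / (p_transmission * contagious_days)


# Attack rates and durations follow the text; R0 = 2.5 for COVID-19 is a hypothetical
# value, and flu's attack rate is taken at 10%, the midpoint of the quoted 5-15% range.
scenarios = {
    "seasonal flu": dict(r0=1.3, p_transmission=0.10, contagious_days=5),
    "COVID-19": dict(r0=2.5, p_transmission=0.0045, contagious_days=2 + 3),
}
for name, params in scenarios.items():
    c = required_contact_rate(**params)
    print(f"{name}: ~{c:.1f} contacts/day over {params['contagious_days']} contagious days")
```

Under this decomposition, for a fixed R₀, halving the per-contact probability must be offset by doubling either the contact rate or the contagious period, which is the trade-off the paragraph describes.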

Why, then, are the death statistics so much higher than the low infectiousness and low prevalence would suggest? On the one hand, many jurisdictions, notably New York, have decided to include so-called “probable” COVID-19-related deaths, and most public health data makes no serious attempt to account for comorbidities as causal factors, though they are present in well over 90% of fatal cases. On the other hand, the increase in deaths over last year in many areas greatly exceeds even this high count, so it could be argued that we are undercounting COVID-19 fatalities. The problem here is that many of the excess deaths could be caused not by COVID-19 per se, but by the overloading of medical facilities, which delays critical care. Some of these excess deaths might even be caused by the quarantine measures themselves, as diagnostic and non-emergency medical visits have been cancelled.
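The accounting problem can be made explicit with a few lines of arithmetic. Excess deaths are observed all-cause deaths minus a baseline (here, as in the text, the same period last year); subtracting the official COVID-19 count leaves a residual that the numbers alone cannot assign to undercounting, overloaded hospitals, or deferred care. The class name and all figures below are hypothetical, chosen only to illustrate the calculation.

```python
from dataclasses import dataclass


@dataclass
class MortalityPeriod:
    """Hypothetical container for one reporting period in one jurisdiction."""
    observed_deaths: int          # all-cause deaths recorded in the period
    baseline_deaths: float        # expected deaths, e.g. the same period last year
    reported_covid_deaths: int    # deaths officially attributed to COVID-19 (confirmed + probable)

    @property
    def excess_deaths(self) -> float:
        return self.observed_deaths - self.baseline_deaths

    @property
    def unattributed_excess(self) -> float:
        """Excess deaths not covered by the official COVID-19 count. This residual
        could reflect undercounting, overloaded hospitals, or cancelled care;
        the arithmetic alone cannot distinguish these."""
        return self.excess_deaths - self.reported_covid_deaths


# Illustrative numbers only; not real data for any jurisdiction.
period = MortalityPeriod(observed_deaths=30_000, baseline_deaths=18_000.0,
                         reported_covid_deaths=9_000)
print(f"excess: {period.excess_deaths:,.0f}, unattributed: {period.unattributed_excess:,.0f}")
```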

It is not uncommon for death counts to be revised upward or downward by a large factor in retrospect. A year after the H1N1 pandemic, one study suggested that the deaths attributed to H1N1 should be revised upward by a factor of 15. Whether H1N1 deaths were undercounted or COVID-19 deaths are overcounted remains to be seen, and is unlikely ever to be resolved, given the problems of data and methodology touched on above.

The truly frightening thing is that major public health policy decisions are made on woefully inadequate data and modeling, which will likely be radically revised after each pandemic has passed, when the moment for decision-making is long gone. Public health officials will always err on the side of caution, but as we noted in the previous post, this is not practicable indefinitely. At some point we must be willing to poke our heads out of our caves and assume the risk of living.

After all, as recently as the early twentieth century, people went about their business even while living under the threat of smallpox, polio, and measles, any one of which had higher infectiousness and fatality rates than the current pandemic. By objective criteria, there is nothing exceptional about COVID-19 as an infectious disease. What is exceptional is the post-WWII belief that life should be free from deadly risk, a belief made actionable by technology that lets many service-economy jobs be performed remotely.