Common Pitfalls in the Interpretation of COVID-19 Data and Statistics
Policymakers, experts and the public rely heavily on the data that are being reported in the context of the coronavirus pandemic. Together with the proliferation of data, however, a number of pitfalls have arisen with regard to the interpretation of the data and the conclusions that can be drawn from them. Whether exploited intentionally or committed unintentionally, these pitfalls can mislead the public debate and thereby the course of future policy actions. Here I list the most common pitfalls:
- The misuse of concepts that reflect the deadliness of SARS-CoV-2, which are the case fatality rate (CFR), the infection fatality rate (IFR), and the mortality rate (MR). Unfortunately, these different concepts are sometimes used interchangeably. In a nutshell, the CFR divides the confirmed deaths from COVID-19 by the confirmed cases of SARS-CoV-2 infections, while the IFR divides the same confirmed deaths by the total number of infections, regardless of whether these infections have been confirmed by testing. Given that many infections are asymptomatic and remain unconfirmed, the denominator of the IFR is much larger than the denominator of the CFR, resulting in a smaller estimated IFR compared to the CFR. The MR, in turn, divides the confirmed deaths from COVID-19 by the total population, resulting in an even smaller estimate. The CFR is easy to compute from the publicly available data but is heavily influenced by testing policies and capacities that differ substantially between countries. The IFR is less affected by these problems but requires reliable estimates of the total number of infections. These estimates can be derived from representative serological surveys. The MR provides more of a retrospective look once a new disease has run its course.
- Comparing incomparable data and statistics across countries. For example, the CFR of Italy has at virtually every point during the coronavirus pandemic exceeded the CFR of South Korea. A naive interpretation of this persistent difference could be that the virus has somehow been deadlier in Italy than in South Korea for unknown reasons. Such an interpretation overlooks the limited comparability of CFRs across countries. The age of the confirmed cases is the most important characteristic for comparability, given the overwhelming evidence that the likelihood of surviving COVID-19 is substantially lower for older patients. In Italy, cases are concentrated in the high-age and hence high-risk groups: 38% of all confirmed Italian cases are at least 70 years old. By contrast, the confirmed cases in South Korea are distributed more evenly across age groups, with only 10% at least 70 years old. Consequently, the confirmed cases that enter the calculation of the Italian CFR are much more likely to end in death than those in South Korea, resulting in more deaths per confirmed case and hence a higher CFR for Italy than for South Korea.
- The COVID-19 death count. In general, countries use different systems and classifications for recording deaths from COVID-19. This has led to concerns about whether officially published COVID-19 death statistics accurately reflect the impact of the pandemic. One way to address these concerns is to look for excess mortality in a country that is known to have experienced a major coronavirus outbreak, that is, deaths above the level that would normally be expected, which can plausibly be attributed to COVID-19. According to calculations by the National Statistical Agency of Italy (Istat), deaths in Italy increased by 39% or 25,354 in the first three months of 2020 compared to the average of the five previous years. However, only 13,710 deaths were recorded as COVID-19-related over the same period, which accounts for only 54% of the observed excess mortality. Hence, deaths from COVID-19 may have been severely undercounted in Italy despite Italy’s already high reported death toll.
- Lags in data reporting. Reporting lags occur, for example, when decentralized offices and institutions do not meet their deadlines for reporting their data to a national agency that then processes and publishes the collected data. Reasons for such non-compliance can be high workloads during an epidemic, or local bottlenecks in testing capacities. Reporting lags become visible only when updates and revisions to the data are published. Statistics Sweden has been very transparent with regard to such revisions: For example, their data release from 6 April reported a total daily death count of 157 for 1 April. However, the data release from the following week revised this initial death count for 1 April upwards by almost 100% to 308. The subsequent releases settled the total death count at 324. Hence, it is important to keep in mind that very recent data are often incomplete and subject to substantial revisions. They are therefore not adequate for immediate use in policy evaluation.
- Sample selection bias. Most often, the data collected and analyzed in the context of the coronavirus pandemic do not represent random samples of the overall population. A consequence of using selected samples is that the insights obtained by means of statistical analysis do not generalize to the overall population. This applies in particular to the various serological samples, as recruitment into the samples often raises concerns about selection bias. The over- or underrepresentation of certain risk groups together with the statistical uncertainty of the rather small serological samples may result in severe misjudgments about the true prevalence of antibodies in the population. Importantly, sample selection bias is not related to sample size. Hence, increasing the sample size by simply collecting more data will not eliminate the selection problem if the underlying mechanism that governs the selection into the sample is not addressed.
- Endogeneity of policy interventions. When attempting to evaluate the effectiveness of the various lockdown strategies implemented by governments across the globe, one question has to be considered beforehand: Why did a specific country choose a specific lockdown strategy in the first place? For example, Italy and Spain already had relatively high death tolls from COVID-19 when they decided to enact stringent lockdowns. Hence, the occurrence of stringent lockdowns is correlated with high early COVID-19 death tolls. High early death tolls, in turn, are correlated with even higher death tolls as the pandemic progresses within the country. In a simple cross-country comparison, a stringent lockdown policy will hence appear relatively ineffective because it was implemented in countries that were already in a critical situation. This endogeneity problem invalidates any simple comparison of lockdown strategies and outcomes.
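To make the first two pitfalls concrete, the following Python sketch computes the CFR, IFR and MR from the same death count, and then shows how the age mix of confirmed cases alone can drive the crude CFR. The shares of confirmed cases aged 70 and over (38% for Italy, 10% for South Korea) are taken from the text. Everything else is an assumption made purely for illustration: the counts passed to `fatality_measures` and the two age-specific fatality rates are hypothetical, not real data, and the function names are invented here.

```python
def fatality_measures(confirmed_deaths, confirmed_cases, total_infections, population):
    """Return (CFR, IFR, MR) as fractions, using the definitions given in the text."""
    cfr = confirmed_deaths / confirmed_cases    # deaths per confirmed case
    ifr = confirmed_deaths / total_infections   # deaths per infection, confirmed or not
    mr = confirmed_deaths / population          # deaths per inhabitant
    return cfr, ifr, mr


def crude_cfr(age_shares, age_specific_cfr):
    """Crude CFR as the case-share-weighted average of age-specific CFRs."""
    return sum(age_shares[group] * age_specific_cfr[group] for group in age_shares)


# Hypothetical counts, chosen only to show the ordering CFR > IFR > MR.
cfr, ifr, mr = fatality_measures(
    confirmed_deaths=1_000,
    confirmed_cases=20_000,
    total_infections=200_000,
    population=10_000_000,
)
print(f"CFR = {cfr:.2%}, IFR = {ifr:.2%}, MR = {mr:.3%}")

# Shares of confirmed cases by age group (70+ shares are from the text).
italy_shares = {"under_70": 0.62, "70_plus": 0.38}
korea_shares = {"under_70": 0.90, "70_plus": 0.10}

# Hypothetical age-specific CFRs, assumed IDENTICAL in both countries.
same_rates = {"under_70": 0.01, "70_plus": 0.20}

italy_cfr = crude_cfr(italy_shares, same_rates)
korea_cfr = crude_cfr(korea_shares, same_rates)
print(f"Crude CFR with Italy's age mix:       {italy_cfr:.2%}")
print(f"Crude CFR with South Korea's age mix: {korea_cfr:.2%}")
```

Under these assumed rates the virus is, by construction, equally deadly within each age group in both countries, yet the crude CFR differs by a wide margin because of the age composition of confirmed cases alone.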
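The arithmetic behind the Italian excess-mortality figures and the Swedish data revisions quoted above can be reproduced in a few lines of Python. All input numbers come from the text. The variable names are my own, and the implied five-year baseline is a derived quantity that the text does not state; it is only approximate because the 39% figure is rounded.

```python
# Italy (Istat figures quoted in the text): first three months of 2020.
excess_deaths = 25_354    # deaths above the five-year average
excess_share = 0.39       # the same excess as a fraction of that average (rounded)
covid_recorded = 13_710   # deaths recorded as COVID-19-related

share_explained = covid_recorded / excess_deaths
unexplained = excess_deaths - covid_recorded
# Derived here, not stated in the text; approximate because 39% is rounded.
implied_baseline = excess_deaths / excess_share

# Sweden (Statistics Sweden figures quoted in the text): count reported for 1 April.
releases = [157, 308, 324]   # 6 April release, following week, subsequent releases
first_revision = (releases[1] - releases[0]) / releases[0]
initial_completeness = releases[0] / releases[-1]

print(f"Recorded COVID-19 deaths as share of excess deaths: {share_explained:.0%}")
print(f"Excess deaths not recorded as COVID-19: {unexplained:,}")
print(f"Implied five-year average (approximate): {implied_baseline:,.0f}")
print(f"Upward revision after one week: {first_revision:.0%}")
print(f"Share of the settled count in the first release: {initial_completeness:.0%}")
```

The last figure illustrates why very recent data should be read with caution: the first release contained less than half of the count that was eventually settled for that day.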
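The claim that a larger sample does not cure selection bias can be illustrated with a small simulation. This is a sketch under loudly stated assumptions, not a model of any actual serological survey: the true antibody prevalence of 5%, the assumption that people with antibodies are three times as likely to participate, and the function name `biased_survey` are all hypothetical.

```python
import random


def biased_survey(n, true_prevalence, relative_participation, rng):
    """Estimate prevalence from n participants when antibody-positive people are
    `relative_participation` times as likely to take part as everyone else."""
    positives = 0
    participants = 0
    while participants < n:
        has_antibodies = rng.random() < true_prevalence
        # Positives always join; negatives join with a proportionally lower probability.
        p_join = 1.0 if has_antibodies else 1.0 / relative_participation
        if rng.random() < p_join:
            participants += 1
            positives += has_antibodies
    return positives / n


TRUE_PREVALENCE = 0.05          # hypothetical
RELATIVE_PARTICIPATION = 3.0    # hypothetical

rng = random.Random(0)  # fixed seed so the run is reproducible
estimates = {
    n: biased_survey(n, TRUE_PREVALENCE, RELATIVE_PARTICIPATION, rng)
    for n in (1_000, 100_000)
}

# Value the estimate converges to under this selection mechanism.
biased_limit = (TRUE_PREVALENCE * RELATIVE_PARTICIPATION) / (
    TRUE_PREVALENCE * RELATIVE_PARTICIPATION + (1 - TRUE_PREVALENCE)
)

print(f"True prevalence: {TRUE_PREVALENCE:.1%}")
print(f"Limit of the biased estimate: {biased_limit:.1%}")
for n, estimate in estimates.items():
    print(f"n = {n:>7,}: estimated prevalence {estimate:.1%}")
```

Increasing the sample size only reduces the random noise around the biased limit; the estimate becomes more precise but stays far from the true prevalence, because the selection mechanism itself is unchanged.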
To sum up: First, the different rates and measurement concepts must be understood, properly defined and appropriately distinguished before they are used. Second, when comparing even the same measure or rate across countries or contexts, one must ensure that the underlying data are sufficiently comparable. Third, if there are doubts about the accuracy of the data collected in the specific coronavirus context, other, independently collected data can serve as a tool for validation. Fourth, data releases should not be interpreted as final or even as real-time information, because they are frequently revised. Fifth, any interpretation of data and statistics must take into consideration whether selection bias might have affected the collection of the underlying sample. Finally, when comparing policy outcomes between groups, one must be aware of underlying factors that may have determined both the policy choices and the outcomes.
Link to full open-access publication: https://link.springer.com/article/10.1007/s10272-020-0893-1