C-19: A folly of trusting data quantity over quality [updated]?

How we may have unknowingly downplayed the impact of the virus through over-reliance on testing, while death stats told us everything we needed to know

Part 1 updated April 2020 | Part 2 updated November 2020 | Part 3 April 2021

Note: this article accompanies our daily COVID-19 projection report.

PART I / Apr 2020


  1. Using case detection data as the primary indicator of how countries are faring against COVID-19 infection growth has multiple problems, not least clearly insufficient and delayed testing, due in part to major constraints in the availability and cost of test resources. To elaborate:

  2. Case detection is delayed. A study in Hubei established a median testing delay of 10-12 days from initial infection [The Epidemiological Characteristics of an Outbreak of 2019, CDC China]

  3. Test kits are limited and, by the time they are available, case numbers have often grown exponentially, leaving health authorities to play a desperate game of catch-up

  4. The result is a misguided basis for decision making if case detection figures alone are used

  5. Reported death data is assumed to be more robust (more definitive and timely) than case detection figures, and therefore makes for a justified choice from which to computationally infer infection volume and growth

  6. Objectives of this analysis are therefore to:

  7. Utilise death data as an indicator of case infections and growth, via an empirically derived and verified mathematical model

  8. Determine how well current testing efforts have been performing in a given country in terms of timeliness (measuring infections as and when they happen) and sufficiency (measuring every infected person as opposed to only a subset)

  9. Develop forecast projections of COVID-19 spread and death for countries under study, assuming mitigating actions are taken, as an input to decision making on future pathways

  10. The forecast modelling is further complemented by recorded mobility data as a real-world indicator of the effects of social distancing and lock-downs. The data from Apple and Google’s COVID-19 Mobility Reports is used to verify the model fit, as well as to help refine input assumptions into our projections.

  11. We have found the models to provide reliable projections for each country up to 2-4 weeks into the future, in addition to enabling us to conduct forecast modeling of potential future interventions to inhibit COVID-19 spread
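
The idea of using deaths as an indicator of infections can be illustrated with a minimal Python sketch. This is not the model behind the reports, only a simple back-calculation under stated assumptions: the function name `infer_infections` is hypothetical, the infection-to-death lag is a free parameter that must be chosen by the user, and mortality defaults to the ~1.2% long-term figure quoted in the notes on analysis.

```python
def infer_infections(daily_deaths, lag_days, mortality=0.012):
    """Back-calculate daily new infections from daily reported deaths.

    Assumption (illustrative only): deaths reported on day t stem from
    infections on day t - lag_days, and a fixed share `mortality` of all
    infections ends in death. Element i of the result is the inferred
    number of infections on day i; the most recent lag_days cannot be
    inferred yet, so the result is shorter than the input.
    """
    if not 0 < mortality <= 1:
        raise ValueError("mortality must be in (0, 1]")
    if lag_days < 0:
        raise ValueError("lag_days must be non-negative")
    return [deaths / mortality for deaths in daily_deaths[lag_days:]]


# Hypothetical numbers: 12 deaths today imply roughly 1,000 infections
# lag_days ago at a mortality of 1.2%.
example = infer_infections([0, 0, 0, 12, 24], lag_days=3)
```

Comparing such an inferred series with detected cases for the same dates gives a rough sense of how much of the infected population testing is actually capturing.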

Discussion & conclusions

  1. European countries are generally far behind in testing. This is due in part to delayed action in implementing social distancing and lock-downs, which allowed infections to grow and outpace test resources (insufficient testing). Priority has also been placed on testing the sick, leading to detection at a mature stage of disease development (late testing). Deaths have now started to grow unchecked in multiple countries, indicating that infections have already surged and making it even harder for testing and detection to keep up. This is worsened by the fact that all infected persons will eventually need to be tested while they remain infectious if further spread is to be prevented. Deaths will continue where mitigation steps fail: the pipeline is filled with infected persons, and it is a matter of time before more deaths come

  2. Asian countries, across the South and Southeast, are generally faring better given earlier action to execute some form of social distancing and lock-down. As such, testing is more on track and potentially able to keep up with infections, which are growing at a slower pace and from a lower base. Some countries stand out as best in class in testing and, with that, in controlling infection growth: Guangdong (and in fact all Chinese provinces excluding Hubei), South Korea and Singapore are a few examples

  3. In general, blunt measures such as social distancing and movement control, i.e. ‘lock-downs’, seem to have been the only effective measures to taper infection and death. This is a worrying sign of things to come, as economies sit at a standstill waiting for the virus to pass or a vaccine to be introduced. We expect that two things, (1) faster and more comprehensive testing and (2) technology for tracking and tracing the spread of the virus, need to be used effectively in tandem if any country hopes to regain normalcy soon after the passing of these earlier waves. We sincerely hope that countries lucky enough to have averted a massive death crisis through early lockdown measures are at least actively thinking about this as a step forward

  4. Looking back at what has transpired over the past weeks, it is highly questionable whether testing (and track and trace) as the primary strategy to overcome the spread of COVID-19 was ever going to work. It is reasonable to have expected the general guidance from global health authorities (e.g. WHO) to have been for countries to lead with a lock-down and social distancing strategy before deploying testing in full force. On balance, this would arguably have led to a more palatable outcome over the long run, for the following reasons:

  5. COVID-19 spreads at tremendous speed. It has an R-naught above 2, asymptomatic and pre-symptomatic transmission, a long incubation period of up to 14 days, and infection concentrating in the upper respiratory tract. The virus itself, SARS-CoV-2, is hardy enough to last up to 3 days on solid surfaces like metal and plastic, while being easily aerosolised, with individual particle size in the order of ~100 nm. All these make COVID-19 extremely transmissible

  6. The mismatch between exponential growth of COVID-19 infection and the expected measurement lag further compounds the issue. It is well documented that even in Hubei, where the virus first broke out, the median time between infection and detection via testing was an estimated 10-12 days. All countries studied show similar results, as established in our analysis. With an infection doubling time of 2-4 days in most countries, this means that a count based on tests taken today would already be an underestimate by a factor of 2^2 (4x), 2^3 (8x) or 2^4 (16x) by the time the test results are tallied up. The result is thus a highly predictable failure to catch up

  7. Constraints in test kit availability as infections surge through the population make matters worse. Not all countries had created assays by the time testing was to start, and the required PCR test kits are well known to be in short supply. In addition, testing creates queues for patients to be swabbed and processed, further exacerbating the time lag. South Korea for example has, to date, administered some 461,000+ tests at a confirmation yield of ~2.2% (as at 5th April 2020); a highly ‘inefficient’ and time-consuming exercise against a backdrop of a fast-growing and potent virus spread. And this is a country that manufactures its own PCR test kits and had, comparatively speaking, ample supply at the ready as the pandemic took off. The authorities had also co-opted telco companies in the country early on to deploy mobile technology to help with surveillance, monitoring and contact tracing. The reality for the rest of us is perhaps that South Korea should be considered the exception rather than the rule in its ability to resist a total lock-down; emerging economies may not be so well resourced as to enjoy the same privilege

  8. As a result, and based on our analysis, reliance on testing data alone often leads to a vast underestimate of the actual total infections in each country, often by a factor of 5-20. While not factually incorrect, as these figures represent the actual detected number of cases, they understate the magnitude of infection spread and may lead the general public to perceive a smaller problem than actually exists. We believe it would be better if estimates of the actual figure were made and communicated transparently to the public from time to time, as this may encourage greater vigilance by each person in playing their individual part to reduce the spread of infections

  9. While it is unclear what the eventual and definitive median estimates for the duration of COVID-19 incubation, infectiousness, recovery and death are, it is plausible that the time from infection to death is of a similar length to the testing delay (i.e. the delay in verifying an infected person after initial exposure), given that most countries prioritise testing of the sick. As such, testing for infections may not yield an advantage over analysing death statistics; a perverse yet meaningful insight that should be an invaluable takeaway for those in decision making positions

  10. This is by no means a criticism of testing itself, or a proposal to do away with it. Testing remains of great use for authorities to identify cluster networks where infections are growing, so that interventions can be appropriately targeted and deployed, and for epidemiologists and scientists to study generational evolution and mutations of the virus strain. Rather, we simply believe it would have been appropriate for most countries to have enforced social distancing and lockdown earlier, to buy time for testing to be rolled out at a manageable pace. More draconian monitoring and control measures may perhaps follow in time, notwithstanding challenges arising from societal non-acceptance. The three strategies can then be synchronised with one another at a manageable pace, in line with the resources available

  11. Lastly, and as the core argument of this write-up: whereas analysis of death data yields strong predictability of future deaths and a fairly robust inference of infection growth, case detection data yields little of consequential value in the same regard. It is often riddled with insufficiency and time lags, resulting in poorly fitting trend lines. Case detection data has also been observed to be strongly correlated with testing volumes, i.e. the more persons tested on a given day, the higher the case detection volume. This is a perverse and illogical relationship that leads to bad decision making if the data is taken at face value; the phrase ‘garbage in, garbage out’ comes to mind. Those making key decisions should at least be made aware of the gaps and lags in data collection and, similarly, of the intricacies of modelling exponential growth where time sensitivities are involved. This is not a matter where mistakes can be afforded

  12. It should also be noted that reported death statistics are not without their own shortcomings. Several countries, namely the UK, France and Indonesia, have cited issues in reporting deaths occurring outside of the healthcare system, i.e. within communities such as care homes, where recorded causes of death are not fully robust. However, the results of curve fitting with current death data sets (daily and cumulative) are promising (see subsequent pages in this report) and yield good forecasting ability, suggesting that reporting errors are either systematic or low in volume relative to official records. Regardless, the data gathered to date across a range of countries suggests it is far more reliable for forecasting purposes than case detection data, as explained in the preceding paragraphs
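
The arithmetic behind the mismatch between exponential growth and measurement lag (point 6) can be made explicit with a short sketch. The function name is hypothetical; the only assumption is pure exponential growth with a constant doubling time over the lag period.

```python
def underestimate_factor(lag_days, doubling_time_days):
    """Factor by which true infections outgrow a count that is lag_days old.

    Assumes constant exponential growth: infections double every
    doubling_time_days, so after lag_days they have grown by
    2 ** (lag_days / doubling_time_days).
    """
    if doubling_time_days <= 0:
        raise ValueError("doubling_time_days must be positive")
    return 2 ** (lag_days / doubling_time_days)


# Two, three and four doublings during the lag:
print(underestimate_factor(8, 4))   # 4.0
print(underestimate_factor(12, 4))  # 8.0
print(underestimate_factor(12, 3))  # 16.0
```

Because the factor is exponential in the ratio of lag to doubling time, even a modest reduction in testing delay shrinks the gap considerably, while faster spread widens it just as quickly.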

Notes on analysis

  1. All analysis uses mortality of ~1.2% (long term, based on Chinese provincial data), except for Guangdong (known figure ~0.7%), Hubei (where detection arguably lagged, leading to higher mortality of 4-5%), Korea (assumed 1%), and Singapore (assumed 0.8%)

  2. Three generalised models (a symmetrical bell distribution, a Weibull distribution, and a numerical distribution based on a non-continuous transmission factor) were each assessed to seek an agreeable fit with recorded data. While the first two distributions provided a reasonably good fit at the rising edge of the growth exponent, the third was eventually chosen for its ability to represent specific interventions such as social distancing and lock-downs, and it ultimately provided the best fit to the data available. For modelling purposes, infection growth takes the form of three phases:

  3. a rising exponent formulated as a geometric progression utilising initial death growth to represent the transmission factor (and R-naught),

  4. a gradual decline in this exponent to a factor <1 representing a process of locking down and social distancing,

  5. a constant long term factor of <1 representing the latent eventual transmission in each country as it seeks a long term strategy to manage the virus spread. As an initial assumption, Hubei data is used (~0.91), but this will be unique to each country as time passes.

  6. Each day's newly infected cases are treated as a cohort which, on each subsequent day, has one of three outcomes: remaining infected, recovery or death. A median estimate for time from detection to death of 8 days is used across all country models

  7. All countries are assumed to enter lockdown or partial lockdown within March and April 2020 (excluding China); this is illustrated by a visible decline in growth of the daily infected bell curve. This may not represent all cases, but it has been the path taken in nearly every country, since the non-action path, modelled by a simple exponent, naturally leads to unthinkable infections and deaths. The specific assumptions on timing and effectiveness of these lockdown actions however need refinement based on actual results in each country, and as such we will be continuously updating our input assumptions. Given the initial emphasis of this analysis on testing effectiveness, future forecasts may not be perfectly modelled and should be read as indicative only

  8. Added data from Apple and Google’s COVID-19 Mobility Reports (which we started using from 17 April onwards) provides a direct comparison with the model transmission factor and is used to further refine assumptions. This data is available with a two-day lag (T-2) relative to the date of our reports, and hence gives an almost ‘present day’ view of physical movement by the population (i.e. mobility)

  9. While this approach is a departure from conventional SIR epidemiological mathematical modelling, the approximation is shown to be more than adequate for the purposes of computation where total infected << total population; with the added elegance of not having to solve several ODEs in the process

  10. COVID-19 statistics used are compiled from Johns Hopkins University. Link: https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data
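
The three-phase growth model described in points 2 to 5 can be sketched in a few lines of Python. This is an illustration under assumptions, not the implementation behind the reports: the function names are hypothetical, the factor is applied day to day, and the decline during lockdown is taken to be linear, which the notes above do not specify. Only the long-term factor of ~0.91 comes from the text.

```python
def transmission_factors(n_days, initial, lockdown_start, decline_days,
                         long_term=0.91):
    """Daily transmission factors for the three phases.

    Phase 1: constant `initial` factor (> 1) until lockdown_start.
    Phase 2: decline from `initial` to `long_term` over decline_days
             (linear decline is an assumption of this sketch).
    Phase 3: constant `long_term` factor (< 1) thereafter.
    """
    if decline_days <= 0:
        raise ValueError("decline_days must be positive")
    factors = []
    for day in range(n_days):
        if day < lockdown_start:
            factor = initial
        elif day < lockdown_start + decline_days:
            progress = (day - lockdown_start + 1) / decline_days
            factor = initial + (long_term - initial) * progress
        else:
            factor = long_term
        factors.append(factor)
    return factors


def project_daily_infections(seed, factors):
    """Geometric progression: each day's new infections are the previous
    day's new infections multiplied by that day's transmission factor."""
    series = [seed]
    for factor in factors:
        series.append(series[-1] * factor)
    return series


# Hypothetical inputs: 25% daily growth, lockdown on day 20, two-week decline.
factors = transmission_factors(60, initial=1.25, lockdown_start=20,
                               decline_days=14)
curve = project_daily_infections(100, factors)
```

The resulting curve rises geometrically, peaks once the factor crosses 1.0 during the decline phase, and then decays slowly at the long-term factor, without any differential equations having to be solved.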

PART II / Nov 2020

Introduction and outline

As the COVID-19 pandemic rages on without natural cessation, it is increasingly evident that we will not be able to overcome its spread solely by current means, and that it is even more crucial for a vaccine to be introduced soon to avert further mass spread and deaths from the disease.

Blunt social distancing and lockdown measures, while alarmingly Orwellian sounding when first mooted, have now become the staple to administer as and when cases rise sharply. Meanwhile, and in our opinion, there is still inadequate scientific discussion on the virus’ epidemiology. While the concept of transmission rate has at least been made mainstream, there is little to no acknowledgement of key factors such as median time for case detection, the efficacy of treatment regimes, transmission correlation to mobility, or some understanding of key and dominant transmission modes.

The contradiction is that we seem to be attempting to solve what is a highly scientific problem, that of the epidemiology of an aggressive virus, without the actual use of deep science. In this article we update our latest understanding of the virus’ spread through an update of our models to reflect present day needs, and discuss some suggestions as to what we can be doing better.

Updated discussion

Updates to our projection modelling generally arise from the following:

(1) Apple mobility was underestimating movement in early parts of lockdown

Our composite mobility computation by default includes Apple walking and driving mobility (in addition to Google's data). In most countries, Apple's mobilities have exceeded the 100% level vs. baseline, which in our view does not make sense.

(2) Different countries have different mobility-transmission characteristics

It is now more evident which mobilities may have a stronger bearing on changes to transmission of COVID-19, accounting for differences in societal interaction from country to country. Whereas during the first lockdown phase all mobilities showed a coinciding downward 'cliff' shape, since then changes in mobilities have been less synchronised with one another. By examining each mobility type separately, we can scrutinise more closely which mobility is most strongly correlated with transmission changes, which is observed to differ from one country to the next.
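
One simple way to compare mobility types, sketched here under assumptions, is to rank them by the strength of their correlation with the transmission factor over the same dates. The use of a Pearson correlation, the function names and the sample figures are all illustrative choices, not the method used in the reports.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def rank_mobilities(mobility_by_type, transmission):
    """Sort mobility types by absolute correlation with transmission,
    strongest first. Returns a list of (name, correlation) pairs."""
    scores = {name: pearson(series, transmission)
              for name, series in mobility_by_type.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical weekly values (% of baseline) and transmission factors.
mobility = {
    "retail_and_recreation": [50, 60, 70, 80],
    "parks": [90, 40, 95, 60],
}
ranking = rank_mobilities(mobility, [0.8, 0.9, 1.0, 1.1])
```

Running this per country makes the differences visible: the mobility type at the top of the ranking need not be the same from one country to the next.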

(3) An over-estimate of the hysteresis % due to (1) and (2)

Due to the (ill-)effects of Apple mobility's contribution to the assumed composite mobility driving a change in transmission, it turns out that most of the hysteresis computations, in particular those of European countries, were vastly overestimated. For example, while our earlier computation using composite mobility (equal weightage across 5 mobility types) suggested that the UK exhibited a hysteresis of up to 30%, using Google's retail mobility alone this hysteresis is calculated to be <2%. Thus, as mobility has risen across the board, the country is now back in a state of elevated case growth - some 20,000 new cases per day. This has tremendous implications - effectively, the months of lockdowns have evidently not resulted in more conscientious and cautious conduct in the population as previously thought.

(4) Case detections and deaths further de-link from one another...

In our first phase of analysis we had held the ratio of deaths to cases constant, implying a constant mortality rate from COVID-19, a reasonable assumption at the time. However, today it is evident that the ratio of deaths to cases has dropped significantly with time for almost all countries under study, though interestingly the change is not equivalent from one country to the next. While it was earlier thought that this could have been caused by more comprehensive and faster testing yielding increased case detection (including more asymptomatic and pre-symptomatic cases), it is more likely that treatments rolled out since the start of the pandemic are beginning to show a significant effect in delaying and reducing deaths. Drugs such as Remdesivir, Favipiravir and Dexamethasone have shown reasonably good efficacy when administered to hospitalised patients, resulting in reduced recovery time and a reduced mortality rate when measured at aggregate level.

(5) ...leading to an overhaul in our projection methodology

While we previously used the regression between mobility and death statistics as the primary determinant of our projection modelling, the latest approach uses the regression between mobility changes and case statistics. This denotes a change in emphasis of the modelling away from computation of deaths and adequacy of healthcare capacity, towards modelling future infection and case growth as a means to aid future planning around COVID-19.

Not only has this been shown to yield reasonably good projection estimates for case growth across a wide range of countries, but it is likely to be more useful for decision makers at this mid/late stage of the virus' evolution - providing a basis for future mitigation planning around the virus' spread.

*Additionally, from January 2021 we have adjusted all forward death projections based on actual case fatality rate exhibited by each country. This gives a reasonably good trajectory forecast of deaths, at an actual mortality rate (vs. previously estimated at ~1.2%).
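
The adjustment described in the note above can be sketched as follows. This is a simplified illustration, not the procedure used in the reports: the function names are hypothetical, and the optional lag between case detection and death is an assumption added here so that deaths are compared with the cases that plausibly produced them.

```python
def case_fatality_rate(cumulative_cases, cumulative_deaths, lag_days=0):
    """Observed case fatality rate: latest cumulative deaths divided by
    cumulative cases as they stood lag_days earlier."""
    index = len(cumulative_cases) - 1 - lag_days
    if index < 0 or cumulative_cases[index] <= 0:
        raise ValueError("not enough case data for the chosen lag")
    return cumulative_deaths[-1] / cumulative_cases[index]


def project_deaths(projected_daily_cases, cfr):
    """Scale projected daily cases by the observed CFR. Element i is the
    number of deaths expected to result from the cases of projection day i
    (they occur some days later, depending on the detection-to-death lag)."""
    return [cases * cfr for cases in projected_daily_cases]


# Hypothetical data: CFR measured against cases from one day earlier.
cfr = case_fatality_rate([100, 200, 400], [1, 2, 4], lag_days=1)
deaths = project_deaths([1000, 2000], cfr)
```

Using each country's own observed rate in place of a single assumed mortality lets the death projections follow improvements in treatment and detection as they appear in the data.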

Summary of findings

The following notes accompany the latest country projection reports, which we have been producing since October 2020 as a ‘Part II’ update to the COVID-19 country projections we have published since April 2020.

  1. Second and third infection waves are notably seen across the board - driven by recent changes in relevant mobility factors which have risen gradually as economies reopen after the initial lockdown phases.

  2. Hysteresis, denoting how much more cautious the population is in preventing spread of COVID-19, has not changed as dramatically as previously thought. In fact most countries hover around the +/-0% mark - i.e. only slightly less likely to cause transmission, or in fact slightly more likely to cause transmission at present day compared to during the early phase of the virus’ spread.

  3. Given these observations, it is obvious that many countries will continue to toggle into and out of infection waves as mobility levels rise and fall with distancing and lockdown activities.

  4. Remarkably, it is populous countries such as India, Indonesia, Philippines, Mexico, Ecuador and Brazil, where curve flattening of the first wave was much slower than elsewhere, that have emerged in an arguably better position, holding transmission below the 1.0 level with mobility at raised levels of above ~70%. By and large, these countries held mobility at low levels and only gradually increased it with time, e.g. Brazil took a total of 8 months (since March) to increase mobility from ~30% to ~90% today, while for Indonesia the increase was from ~50% to ~80%.

PART III / April 2021

Context & Introduction

It is evident from our analysis how well correlated mobility (particularly, as measured by Google’s Community Mobility reports*) is to the transmission of COVID-19. There is good and bad news. On the one hand we have ascertained an ‘adequately robust’ method to project forward cases and deaths under different population movement scenarios. On the other, it is now obvious that the balance between overcoming the virus and protecting livelihoods will be very challenging.

Several, perhaps surprising, corollaries also apply given the above.

  • It is questionable how much faster testing has contributed to more effective mitigation of virus spread. Germany, Japan and South Korea, while consistently able to detect cases faster, appear at a cursory glance to be as defenceless as Brazil and Indonesia. For the reader's reference, detection speed between these groups differs by a factor of 5-10.

  • Also, as described before, it is not clear how effective improved social distancing and mask wearing have been at curtailing spread. In theory they should help, but quantifying their effects is approximate at best. It is the congregation of people within a location that increases spread.

  • A more subtle point is the possible lack of strong geographic discrimination in how most nations have administered control orders. Our computation utilises aggregated country level data to simplify analysis, yet the correlation of transmission to mobility still holds true. In other words, mobility across large swathes of the population seems relatively well correlated, despite different extents of spread from one geographic region to the next.

  • A special mention needs to also be made about mutated variant strains of COVID-19 that are now prevalent in some countries. It is telling that these more aggressive variant strains have arisen from highly affected countries like the UK, Brazil, India and South Africa - spelling out a huge risk that the world collectively shares if the virus is not brought under control everywhere. Adding in the uncertainty in vaccine efficacies against these strains means a potential vicious cycle of unending virus propagation for a long time to come.

  • This analytical method of linking transmission to mobility also allows us to observe deviations from baseline transmission to detect and determine onset of new strain prevalences, without having to only depend on frequent serological testing.

*Note: Google community mobility measures approximately the rise and fall of weighted presence in a given location, defined by the activity at that location, for example retail and recreation. Contrast this to Apple’s method that measures the volume of journeys to a location, for example Apple map requests for driving between two points on a map.
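
The last corollary above, observing deviations of transmission from a mobility-based baseline, can be illustrated with a minimal sketch. It assumes a simple linear relationship between mobility and transmission fitted over an initial baseline period; the function names, the threshold and the sample data are hypothetical, and the reports may use a different regression.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("mobility must vary over the baseline period")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def flag_deviations(mobility, transmission, baseline_days, threshold=0.1):
    """Return the days after the baseline period on which transmission
    exceeds the level expected from mobility by more than `threshold`."""
    slope, intercept = fit_line(mobility[:baseline_days],
                                transmission[:baseline_days])
    flagged = []
    for day in range(baseline_days, len(mobility)):
        expected = slope * mobility[day] + intercept
        if transmission[day] - expected > threshold:
            flagged.append(day)
    return flagged


# Hypothetical series: the final day shows transmission well above what
# its mobility level would predict from the first four days.
days = flag_deviations([40, 50, 60, 70, 60, 70],
                       [0.8, 0.9, 1.0, 1.1, 1.0, 1.4],
                       baseline_days=4)
```

A sustained run of flagged days, rather than a single outlier, would be the signal worth investigating as a possible change in the prevailing strain.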

Brief takeaways from updated analysis

(1) Testing delays are broadly increasing across the board

(2) Mean time to death is lengthening, most likely due to treatment effectiveness

(3) New variant strains are evident, resulting in deviation of transmission from baseline

(4) Vaccinations have yet to show impact in reducing mortality rates, in spite of high vaccination rates

(5) Unique country characteristics mean control measures should be tailored to situation(s)