C-19: A humble plea to the scientific community to keep the forecasting science relevant

Updated: Jun 10, 2020

There is now a critical need for a step-change improvement in data visualisation tools and modelling science around the COVID-19 pandemic. Earlier iterations were built for specific purposes: informing government policy decisions on healthcare capacity planning and the timing of lockdowns, and keeping the general public informed on the latest developments in the virus' spread. However, for the most part, these tools may have now passed their point of usefulness, and new tools are needed.

What we think will be crucially needed in this upcoming phase, where lockdowns are being eased and economies slowly rebuilt, is a close-to-real-time barometer of where each country stands and where it is headed vis-a-vis continued COVID-19 spread: tools that can help gauge the pros and cons of the different pathway options each country can take, and ultimately help governments and civil society navigate the trinity of social, economic and healthcare challenges that are bound to emerge.

With that, we strongly believe the tools from the previous phase need to be reimagined and revised to stay relevant to the job at hand.

Structure of this article:

We break this article into three sections. We start with a brief stocktake of the data visualisation and forecasting tools available to date (section A), then move on to a discussion of the key drivers for a robust modelling technique of the virus' spread (section B). Lastly, we round things off by outlining a holistic modelling method that could serve as a good starting point for models going forward (section C).

A. A quick history of data and forecasting tools since the start of the pandemic

A-1. Data visualisation tools / dashboards

Several frequently visited online dashboards and data visualisation tools come to mind. Websites such as those from Worldometers, the JHU COVID-19 tracker, Ourworldindata, Tableau's Global Data Tracker and nCoV2019.live are amongst the most popular. By and large, most offer the same information - a global summary of daily cases, recoveries and deaths. However, some offer additional visualisation and insight generation tools, and others complement the basic statistics with datasets from related areas such as available hospital capacity, COVID-19 testing statistics, demographic statistics and the like:

  • Up-to-date data with wide-ranging reach across countries and dimensions: Worldometers and JHU - the two most up-to-date datasets of COVID-19 cases and deaths, covering more than 185 countries

  • Easy to use data manipulation tools: Ourworldindata - data manipulation and benchmarking across a user-selectable subset of countries. Offers interesting dimensional filters that help cut through noise when comparing multiple countries at once.

  • Additional datasets that complement basic statistics: Oxford's Government Stringency Index tracker - defines a scoring rubric for stringency of government actions to combat COVID-19 spread, and scores and monitors continued progress.

Why these exist: they provide up-to-date information on the virus' progress across a wide range of countries. This helps inform the public of the latest developments, complementing reported news of the spread.

A-2. Forecast projections for deaths and cases

This linked page hosted by the CDC provides a good repository of US forecast resources (link), while the following page from the CoronavirusTechHandbook collates forecast resources from all over the world (link). By and large, the two most well-known forecasts are by the US-based Institute for Health Metrics and Evaluation (IHME), whose projections were critical to kicking off the US' lockdown campaign, and by Imperial College London, whose projections were critical to kicking off the same for the UK.

Our (unrefined) opinion is that forecasting methods seem to vary only marginally from one to the next. More objectively, we make the following observations, and offer some constructive feedback that may assist the next phase of developments:

  • Primary methodology - i.e. the core and principal method of forecasting. By and large, most models have followed a curve-fitting approach. In crude terms: finding a mathematical representation of the trajectory of the infection (or death) curves and fitting it to real-world data. This assumes the momentum of the trajectory observed up to the last point of analysis carries unchanged into the future, regardless of what happens. In fact, the models by MIT, Northeastern, Georgia Tech, US Army, UT Austin and UCLA all assume existing interventions (at the late stage of lockdown) do not change at all into the future.

  • Secondary methodology - additional modifiers that reside atop this earlier core; these can be thought of as the how, as opposed to the what. While many have attempted fitting curves with analytical methods via continuous functions (e.g. solving the SEIR ODEs, or utilising known formulaic distributions such as Gompertz or Weibull), others have attempted numerical methods that further utilise machine learning algorithms and stochastic/probabilistic methods to determine model fits to real-world data.

  • The critical issue posed by extending the usefulness of these models beyond their initial objective is that each assumes the virus behaves independently of extrinsic inputs, i.e. that its position at any particular time, in terms of infections or deaths, fully determines its evolutionary path moving forward. This is often qualified in supplementary notes as 'modelling representative of only a single wave of outbreak', or 'model assumes current interventions will not change during the forecast period'. In our opinion, neither assumption is likely to satisfy the needs of modelling going forward.

  • Secondary issues with this approach are: (1) an inability to model changes to the external environment, e.g. changes in lockdown or other intervention policies, without resetting the model fit to the last point of available data, (2) an inability to forecast into the future under different scenarios where interventions may change for better or worse, and (3) an inability to predict or model the resurgence of second or third waves of outbreak.

  • These are not unexpected downsides. The models sufficed for what they were intended for; to repeat: planning healthcare capacity in and around surging infection and death waves.
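To make the momentum-trajectory critique concrete, here is a minimal curve-fitting sketch in pure Python. The data and numbers are entirely hypothetical, and a simple exponential stands in for the richer forms (Gompertz, Weibull, SEIR solutions) used by the real models; the structural assumption is the same in each case: the fitted trajectory is projected forward unchanged, with no representation of interventions.

```python
import math

def fit_exponential(days, cases):
    """Least-squares fit of cases ~ A * exp(r * t) via log-linear regression.
    Returns (A, r): the initial level and the daily growth rate."""
    logs = [math.log(c) for c in cases]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs)) \
            / sum((t - mean_t) ** 2 for t in days)
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), slope

# Hypothetical early-outbreak data: cases doubling roughly every 5 days.
days = list(range(10))
cases = [100 * 2 ** (t / 5) for t in days]

A, r = fit_exponential(days, cases)

# The 'momentum' extrapolation: project the fitted trajectory forward,
# implicitly assuming nothing about the environment changes after day 9.
day_30_forecast = A * math.exp(r * 30)
```

Whatever happens on day 10 - a lockdown, a mask mandate, a superspreading event - the day-30 forecast is already fixed by the fit, which is exactly the limitation described above.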

Why these exist: to aid planning for adequate healthcare capacity such as hospital beds, critical care units and ventilators, often across a wide geography, i.e. nationwide. Also to aid government policy development around lockdowns, driven by that primary objective.

B. Crucially, the objective of the models has now changed. We explore what the 'correct' model mechanics for a next phase of forecasting could be.

To recap and further define the new objective: we need forecasting capability that can robustly model real-world inputs and intervention options, in order to contrast their merits and inform alternative pathways towards a credible recovery. Ultimately, saving lives through crude measures like lockdowns needs to be balanced against protecting and reviving the economic livelihoods of the many people who constitute the economy.

Having observed the steady evolution of data visualisation tools and modelling science around the COVID-19 pandemic, we can't help but feel slightly disheartened by how underwhelming the rigour of some of the analysis has been. Not only have forecasts been inaccurate time and time again, which caused quite a stir in the US (see these articles - link, link, link); the continued insistence on curve-fitting and 'momentum trajectory' methods for ascertaining outlook pathways is also disappointing to see, especially when such methods are continued beyond the first 1-2 months of the outbreak. They seem far too simplistic a representation of what is actually going on.

There is also an abundance of model options to choose from, adding more entropy to an already confusing topic (given its novelty) and risking obfuscation of what matters and what may not. This, we fear, risks putting decision makers in a state of paralysis, not knowing which forecast to trust and which not to. To date, there are already some 21 known forecast simulations for the US alone collated by the CDC, each from a reputable research or academic institution. And no, they do not all agree on an outlook for just the one country.

By now, the underlying drivers of viral transmission, such as human-to-human interaction, physical mobility, and some measure of a given population's practices and conduct, should have been mapped and studied in depth to determine their correlative link to the virus' spread and how that link evolves over time. Without a stronger understanding of how such inputs translate to measurable impact, we fear there is little to no hope of mapping out and robustly predicting future instances of new outbreaks and second or even third waves.

As a starting point, we suggest a few features that a robust revised model could consist of:

  • (1) Help provide a better understanding of the nuances around the temporal nature of COVID-19 data. It is easy to get quite lost when comparing COVID-19 cases and deaths between countries. Inconsistencies arise in ratios such as case fatality rates, as well as in curve trajectories. What many may not realise is that there is a wide-ranging time lag between real infections in the population and reported cases, and also a (more predictable) time lag between infections and deaths. These two periods may be inconsistent between countries, due to differences in testing regimes, norms and administrative reporting procedures. This makes it tricky to compare case reports between countries without an understanding of the duration of this lag.

  • To break things down: there are three distinct timestamps to consider: when a real infection takes place, when it is measured and noted as a case through testing, and lastly when it resolves in recovery or death. As a reference, we have measured the median time from infection to death at 14 days, and from infection to case detection at 2-20 days depending on the country. The figure below may help visualise these different time periods.

  • There are many more issues stemming from this, but the critical one is perhaps this: cases reported on a given day are often delayed by some 10-14 days due to factors such as late diagnosis, testing queues and so on. Policy decisions stemming from these reports therefore risk being based on a lagging view of reality.

  • (2) More detailed insights around testing progress in each country. While some dashboards have started reporting several metrics around testing, such as volume of testing conducted, positive test rates, and volume of tests per capita, there has been no reporting and critical assessment of testing speed - what we term the median case detection delay, i.e. the time period between infection or contraction of the disease and its detection via testing. In theory this is a crucial metric to track as it indicates how much time passes during which the infected person could have spread the virus to other people.

  • Our analysis estimates that testing delay can range widely, from 1-2 days (Austria and Germany) to up to 22 days (Bangladesh). In addition, this duration can vary over time as test regimes and procedures improve or deteriorate - a crucial factor to understand well if a country is to harbour any hope of overcoming the speed of infection spread instead of constantly trying to catch up.

  • (3) A clearer logic of cause and effect in mapping out measurable drivers for transmission of the virus. For mapping out the drivers, one can follow a simple logic: transmission of the virus happens when an infected and infectious person interacts with a healthy and susceptible one. A distinction must be made between infected and infectious, due to the nature of the incubation period and the subtleties of viral shedding patterns over the course of the disease - in other words, a person could be infected but not yet, or no longer, infectious.

  • Person-to-person interaction has to be proxied somehow by a credible, objective and timely data source. Enter telco data (e.g. Singapore and the US have explored this), or, to further simplify the workings and computation, mobility data, which is now publicly available from both Google and Apple. Each provides this data in a relatively timely manner, for a range of mobility types, e.g. retail, grocery, workplace, home, walking, transit and driving activities.

  • However, not all mobility types may contribute equally to transmission; each has to be adequately correlated against outcomes to determine its sensitivity to transmission, as well as its potential interrelation (cross-correlation) with the others.

  • (4) An understanding of how these transmission drivers, and the correlative linkages between one another and the measured outputs, change with time. Some factors and drivers may be static, while others may change with time. It is conceivable that reducing transmission through a reduction in interaction between people is a linear and unchanging relationship. However, the reverse may not be true - increasing interactions after an initial period of outbreak followed by lockdown may carry a reduced likelihood of transmission, due to a change in people's conduct and practices. Let's say that 10 infected people in a given day would be expected to infect 50 others. After a phase of lockdowns, followed by a shift in conduct towards mask wearing and social distancing, the same 10 people might only infect an additional 10.

  • Many other drivers could change too. Such as the cross-correlation between different mobilities with one another. Or median reporting times for infection to detection, or infection to death. It is not trivial to map all these out, but neither is it impossible to do.
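The cause-and-effect logic above can be sketched as a toy discrete-day compartmental model. Everything here is illustrative - the function, the parameter values and the mobility/behaviour series are hypothetical assumptions, not calibrated estimates - but it shows the two mechanics argued for: mobility and conduct modulating the transmission rate over time, and a detection delay shifting what is reported relative to what is real.

```python
def simulate(days, pop, beta0, gamma, mobility, behaviour,
             detect_lag, seed_infected):
    """SIR-style sketch: the transmission rate each day is beta0 scaled by
    a mobility index (0-1) and a behaviour factor (0-1, e.g. mask wearing),
    and reported cases lag real infections by the median detection delay."""
    S, I = pop - seed_infected, float(seed_infected)
    new_infections = []
    for d in range(days):
        beta = beta0 * mobility[d] * behaviour[d]  # effective rate today
        inf = beta * S * I / pop                   # new infections today
        S -= inf
        I += inf - gamma * I                       # recoveries leave the pool
        new_infections.append(inf)
    # Reported cases are real infections shifted by the detection delay.
    reported = [0.0] * detect_lag + new_infections[:days - detect_lag]
    return new_infections, reported

# Hypothetical scenario: full mobility for 30 days, lockdown (30% mobility)
# for 30 days, then reopening at 80% mobility with a lasting behaviour
# change (masks, distancing) that halves per-interaction transmission.
days = 120
mobility = [1.0] * 30 + [0.3] * 30 + [0.8] * 60
behaviour = [1.0] * 60 + [0.5] * 60

real, reported = simulate(days=days, pop=1_000_000, beta0=0.3, gamma=0.1,
                          mobility=mobility, behaviour=behaviour,
                          detect_lag=10, seed_infected=100)
```

Because the inputs are explicit time series, a scenario change (a stricter lockdown, a faster reopening, a slipping testing regime) is just a different `mobility`, `behaviour` or `detect_lag` input, rather than a full re-fit of the curve - which is precisely the flexibility the curve-fitting models lack.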

C. We have developed a starting point for a more robust modelling technique to forecast COVID-19 spread.

The analysis embedded in our daily projection reports encapsulates our attempts at developing a more robust modelling technique, incorporating the inputs outlined in this article. While far from perfect, it has already been able to forecast the timing and magnitude of upcoming outbreak waves for a number of countries. It has also been helpful in highlighting where potential outbreak concerns may be looming for countries that have started reopening their economies.

We do hope this work helps kickstart an initiative to reimagine and retool current models to be ready for this phase of COVID-19 developments. Much more can and should be done to improve on this starting point, and we humbly request that the scientific community rise to the challenge and help equip decision makers with the tooling necessary to navigate countries through the journey.

In closing, we will not pretend to have the final and definitive word on the matter of forecast modelling for COVID-19. We merely felt it important to voice our thoughts and help establish a baseline of what robust modelling in this second phase of COVID-19 spread could look like. Our plea goes out to any person or institution researching, or with a keen interest in, this topic - ourselves included. We welcome interested parties to join a discussion with us or other stakeholders to debate this important topic for the benefit of everyone. It feels desperately needed.
