Those following our research thus far will be familiar with our fondness for using mobility statistics as an input to modelling the spread of COVID-19. So far the approach has helped reliably predict the trajectory of deaths from the virus, and of case detections too. However, recently reported case data from several countries, namely the US, Brazil, Sweden, the Philippines, and Indonesia, shows the trajectory of cases outstripping the baseline trajectory indicated by deaths.
There are several plausible theories as to why: (i) a real surge in infections (with under-registered deaths), (ii) a significant increase or decrease in the speed and volume of testing, (iii) the trajectory baseline computed from deaths no longer holds, or (iv) changes in the actual mortality rate through the period, e.g. due to improved outcomes from medical treatment.
For the US, this is becoming an extremely costly question. The two-party political system has the country divided, with Republicans (and the President) favouring the narrative that real infections are already in decline, as reflected by death statistics, and that the economy should therefore be reopened fully to allow for rapid recovery, while the Democrats are practically championing the opposite cause. And it does not look like there is an alignment on the horizon.
We study the data to shed some light.
Figures: new daily case detections and deaths for the US showing a divergence in trend lines since mid May. Source: Worldometers
(A) (Re-)introducing the cross-correlation computation method to assess median times
We had previously used this method to estimate median times from infection to death, and from infection to case detection - see here. Briefly, this involves taking the cross correlation (dot product) of mobility changes with the case detection or death data. One dataset is shifted against the other over a range of time delays, and the dot product is computed at each delay. The correlative 'fit' is the delay at which this dot product is maximised across the range, and that delay is taken as the median delay between the two datasets.
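The delay scan described above can be sketched as follows. This is a minimal illustration rather than our production code: the function names are hypothetical, and the score is mean-centred and normalised to the range -1 to 1 (a Pearson-style score), which is an assumption on top of the plain dot product.

```python
import numpy as np

def lagged_scores(mobility, response, max_lag):
    """Normalised dot product of mobility with the response shifted by each lag."""
    mobility = np.asarray(mobility, dtype=float)
    response = np.asarray(response, dtype=float)
    n = min(len(mobility), len(response))
    if n <= max_lag + 1:
        raise ValueError("series too short for the requested max_lag")
    scores = np.zeros(max_lag + 1)
    for lag in range(max_lag + 1):
        # Pair mobility on day t with the response on day t + lag.
        m = mobility[: n - lag]
        r = response[lag:n]
        m = m - m.mean()
        r = r - r.mean()
        denom = np.linalg.norm(m) * np.linalg.norm(r)
        scores[lag] = m @ r / denom if denom > 0 else 0.0
    return scores

def best_lag(mobility, response, max_lag=30):
    """Return the lag with the highest score, plus the full score profile."""
    scores = lagged_scores(mobility, response, max_lag)
    return int(np.argmax(scores)), scores

# Synthetic check (made-up data): the response is the mobility signal delayed by 14 days.
rng = np.random.default_rng(0)
mobility = rng.normal(size=100)
response = np.concatenate([np.zeros(14), mobility[:-14]])
lag, scores = best_lag(mobility, response)
print(lag)  # 14 by construction
```

With real mobility data the score profile shows several peaks spaced 7 days apart, so the global maximum is not always the delay to trust.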
For the latest analysis we make the following changes to the computation vs. previous versions of the same:
A fresher window of data starting from late March to present day (2nd week of June)
Primary focus on five 'sets' of mobilities - Google's retail, transit and workplace mobilities, Apple transit mobility, and a composited mobility across 5 different mobility types (mixture of Google and Apple)
For mobility, the correlation is computed utilising a discrete high-pass filtered dataset of the original mobility data (as opposed to a day on day % change)
For case or death statistics, the correlation is computed utilising day on day % changes of the original dataset or base-10 logarithm of the same (i.e. log(x) instead of x)
The above two choices are made because they give a closer to 1:1 correlative match between the representative datasets (i.e. linearity between mobility, transmission factor, and the gradient of the signal log(x))
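The two pre-processing steps in the list above might look like the sketch below. The exact high-pass filter is not specified here, so subtracting a centred 7-day moving average is an assumption, as are the function names; counts are assumed to be strictly positive so that the ratio and logarithm are defined.

```python
import numpy as np

def high_pass(series, window=7):
    """Remove the slow trend by subtracting a centred moving average.

    Assumption: a 7-day moving average stands in for the low-pass component.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the average is centred")
    x = np.asarray(series, dtype=float)
    pad = window // 2
    # Repeat the edge values so the output keeps the input length.
    padded = np.pad(x, pad, mode="edge")
    trend = np.convolve(padded, np.ones(window) / window, mode="valid")
    return x - trend

def pct_change(series):
    """Day-on-day fractional change of the original series."""
    x = np.asarray(series, dtype=float)
    return (x[1:] - x[:-1]) / x[:-1]

def log10_change(series):
    """Day-on-day change of log10(x), i.e. the gradient of the log signal."""
    x = np.asarray(series, dtype=float)
    return np.diff(np.log10(x))
```

The filtered mobility and the day-on-day change in the log of cases or deaths are then the two inputs to the cross correlation.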
Figure: sample cross-correlation on death statistics for Mexico. A median time between infection and death of ~14 days is estimated.
Accompanying notes to analysis:
Note 1: the 7 day 'heartbeat' is a form of aliasing noise due to the rhythmic nature of mobility changes occurring with weekly cycles. This is unavoidable and therefore care is needed when reading the results to determine which is the correct fit delay period
Note 2: in some analyses, we use composite mobility, which represents a combination of 3 Google mobilities and 2 Apple mobilities in a ratio of 80/20. The same composite computation is currently used to generate our daily projection reports
Top left: source datasets - (i) mobility (4 types of mobility covered) and (ii) reported death or case statistics
Top middle: crude cross correlation computation across the entire period (e.g. here it is 30 April to 19 June)
Top right: wave form comparison of the two datasets, and cross correlation score, with different time delays applied [ordered from highest score to 4th highest]. This span of data is used to compute the crude cross correlation
Bottom (main heatmap): time windowed cross correlation computation starting at the date noted in the top row [with an 11 day window utilised at present]. Each row represents the different time delay assumed between mobility changes and case/death stats
Bottom right: max and average computations (of cross correlation scores) for two periods - first half/period of data set, and second half/period of data set. The highest average score (avg) is where the best 'fit' can be found i.e. where corresponding delay maximises the computation
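The heatmap and the half-period summaries described in the panel notes above can be sketched roughly as below. The 11 day window follows the text; the function names, the normalised score and the equal split into two halves are assumptions made for illustration.

```python
import numpy as np

def windowed_scores(mobility, response, lags, window=11):
    """Rows are assumed delays, columns are window start days."""
    mobility = np.asarray(mobility, dtype=float)
    response = np.asarray(response, dtype=float)
    lags = list(lags)
    n = min(len(mobility), len(response))
    n_starts = n - window - max(lags) + 1
    if n_starts <= 0:
        raise ValueError("series too short for this window and lag range")
    heat = np.full((len(lags), n_starts), np.nan)
    for i, lag in enumerate(lags):
        for start in range(n_starts):
            m = mobility[start : start + window]
            r = response[start + lag : start + lag + window]
            m = m - m.mean()
            r = r - r.mean()
            denom = np.linalg.norm(m) * np.linalg.norm(r)
            if denom > 0:
                heat[i, start] = m @ r / denom
    return heat

def half_period_summary(heat):
    """Max and average score per delay for the first and second half of the windows."""
    mid = heat.shape[1] // 2
    halves = {"first": heat[:, :mid], "second": heat[:, mid:]}
    return {
        name: {"max": np.nanmax(h, axis=1), "avg": np.nanmean(h, axis=1)}
        for name, h in halves.items()
    }

# Synthetic check (made-up data): response is mobility delayed by 14 days (circularly).
rng = np.random.default_rng(1)
mobility = rng.normal(size=80)
response = np.roll(mobility, 14)
lags = list(range(10, 20))
heat = windowed_scores(mobility, response, lags)
summary = half_period_summary(heat)
best_first = lags[int(np.argmax(summary["first"]["avg"]))]
best_second = lags[int(np.argmax(summary["second"]["avg"]))]
```

Comparing the best delay in the first and second halves is what lets a drift in the median time show up.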
(B) Exploring the data
(B-1) Death statistics cross correlation
While there is generally good agreement with our initial rough estimate of ~14 days for the median time from infection onset to death, this is not 100% consistent from country to country.
Based on a moving window computation, it is also clear that this figure is susceptible to changes with time. The US for example, shows a constant 14 days when using retail mobility alone, but progression towards 17 days in recent weeks when using composite mobility.
Ascertaining this time period is critical because it is a key input variable in computing the trajectory of continued infections and deaths from COVID-19. An elongated time span from infection to death would mean that reported data could under-represent the true spread in the population, and vice versa. For our general daily projections it is assumed that both these median times are unchanged through the entire projection period.
Figure: sample cross-correlation on death statistics for Brazil. A median time between infection and death of ~14 days is estimated.
Figure: cross-correlation on death statistics for Sweden. A median time between infection and death of ~13-14 days is estimated.
Figure: cross-correlation on death statistics for the US using Google retail mobility data. A median time between infection and death of ~14 days
Figure: cross-correlation on death statistics for the US using composite mobility data. A median time between infection and death of ~14 days previously is now ~17 days
Figure: cross-correlation on death statistics for the US using workplace mobility data. A median time between infection and death of ~17 days is estimated
An earlier period analysis (8 Mar-12 Apr) suggests the median time between infection and death was initially ~15 days (based on crude cross correlation score)
Figure: cross-correlation on death statistics for the Philippines. A median time between infection and death of 16 days up to mid May, but it now seems to have crept up to ~17 days.
Figure: cross-correlation on death statistics for Egypt. A median time between infection and death of 14 days up to mid May, now seems ~13-14 days
Figure: cross-correlation on death statistics for the UK. A median time between infection and death of ~15 days is estimated.
Figure: cross-correlation on death statistics for Germany. A median time between infection and death of ~14 days is estimated.
(B-2) Case detection statistics cross correlation
US data: 8-9 days detection delay
UK data: 16 days detection delay; but more recently ~15 days
Brazil data: 14-15 days detection delay
Germany data: 0-1 days detection delay; but more recently ~2-6 days
Discussion, for section B-1 and B-2
It is noticeable from the data that Google's retail and workplace mobilities generally yield higher cross-correlation scores (against case and death stats) than the others - there is a more aligned and consistent overlap between the peaks and troughs of the mobility data and the rate of change in deaths and cases. It would seem that these mobility types may have a stronger bearing than others on transmission of the virus, although this may vary from country to country.
See figure below for a comparison of different mobility types for the UK - higher cross correlation score is found when using retail mobility, and better alignment in peaks and troughs of the waveform, compared to workplace and transit mobilities. RED-day on day % change in daily deaths; BLUE-mobility data (mobilities used: Google retail, transit and workplace; Apple transit; Composite across Google and Apple)
It is interesting how the cross correlation unlocks the time axis as an additional dimension to assess the correlative fit between different mobility types and transmission. Previous methods using linear fit correlation between mobility and transmission over time failed to objectively disassociate the effects of one mobility from another, whereas this method suggests an ability to better discern individual contributions.
Based on the plot lines in the figures below, there is however some inconsistency in the estimate of the delay 'fit' from one mobility to the next - see how different mobilities suggest different delay periods, i.e. between 14 and 17 days. More work is needed to investigate this further, and to fully determine the proportional contributions and correlative factors that different mobility types may have on transmission.
Mobility-transmission cross-correlations for the UK:
As expected, cross-correlation scores from case detection stats yield poorer quality results compared to death statistics. This underscores the larger uncertainty in the timeliness of case detection data. While deaths occur predictably after a finite period of time, case detection depends on how promptly testing and diagnosis are done, which introduces more variability. If testing is sped up, the median time shortens, and vice versa.
Generally, there is less variance than expected in the median time computations during the period from March to June. For most countries the variability is 0 to 2 days at most, i.e. if the median time between infection and death is 14 days in March, then it is likely to remain at about 14 days through the period. We would have expected larger variances in case detection given how long testing efforts have been going on; however, while test volumes have increased dramatically, test speeds seem not to have (at least not by much). Exceptions are Germany and the UK, which show noticeable variation over the period.
(B-3) Exploring the effect of different mobilities on virus transmission: Mexico and US case study and discussion
Each of the crude cross-correlation plots below utilises death statistics between 15 April and 22 June 2020, for the case of Mexico - see Figure (A). A range of mobilities has been selected for comparison; these are noted within the charts. For each, mobility has been held constant and the transmission rate computed from death stats (in red) is brought forward in time to represent the time delay between infection and death. Some accompanying notes:
As the lockdown onset started before deaths were high (~mid March period), it is very difficult to find a good transmission fit for this period, as the quality of data is low to aid fitting. As such the data is omitted and transmission data only begins from 15 April.
No one mobility type fully stands out as the most 'optimal' fit - all correlations are below 0.5. At a high level, we see the best fits (for the case of Mexico) from workplace and composite mobilities - these yield the highest cross correlation scores of ~0.44 and ~0.45 respectively. Regardless, the fits from transit mobility and retail mobility are also relatively good at ~0.28-0.38.
There are two uncertainties to deal with here: (i) what is the initial estimate for the median time from infection to death? As there is a distinct 7-day rhythm to the fitting data, we need an approach to prioritise one peak over another; and (ii) which mobility provides the best estimate of this median time, given that the results vary from one to the next?
The first is easier to handle. We know from previous studies through the lockdown period that the median time is around 14 days. It is therefore reasonable to accept an estimate at or around 14 days, or up to 3-5 days later to account for delays in formal reporting after death. We should thus rule out any fit that falls below ~13 days. The data then points to fits as follows: Transit - 17 days median time from infection to death at score 0.27, Composite - 17 days at score 0.33, Retail - 15 days at 0.21, and workplace - 17 days at 0.41.
The second factor is less clear-cut. Judging by the results of this analysis, workplace mobility seems to fit best, with a higher relative score and an estimated median time from infection to death of 17 days. In contrast, retail mobility gives 15 days. However, the peaks and troughs of the two series do not all coincide. Notably (based on the workplace mobility chart below) the transmission peaks in late March, early April and early June do not correspond to the mobility plot line.
To further study this, see Figure (B) below - where we relook at the US. Once again, workplace mobility scores highest at score 0.51 and estimated median time from infection to death of 17 days. Meanwhile, retail mobility scores 0.31 at 14 days.
It is not clear why the median time estimate varies by 2 days in the case of Mexico, and 3 days in the case of the US. We can only assume that the mobility data supplied is accurate and contains no time lag in reporting, in which case the variations are a true representation rather than an artefact of the data.
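The rule used above for choosing among peaks that repeat every 7 days (discard any fit below ~13 days, and allow up to about 5 days beyond the 14 day prior) can be expressed as a small filter. The function name, the exact bounds of 13 and 19 days, and the scores in the example are all hypothetical and chosen only to illustrate the selection.

```python
def pick_delay(scores_by_lag, min_lag=13, max_lag=19):
    """Return the best-scoring delay inside the plausible range, with its score.

    Assumption: the plausible range is the ~14 day prior plus up to 5 days
    of reporting delay, with anything below 13 days ruled out.
    """
    candidates = {
        lag: score
        for lag, score in scores_by_lag.items()
        if min_lag <= lag <= max_lag
    }
    if not candidates:
        raise ValueError("no delay falls inside the plausible range")
    best = max(candidates, key=candidates.get)
    return best, candidates[best]

# Made-up score profile with peaks 7 days apart (not taken from our data).
example_scores = {3: 0.50, 10: 0.48, 15: 0.20, 17: 0.40, 24: 0.35}
print(pick_delay(example_scores))  # (17, 0.4)
```

Note that the unconstrained maximum in this made-up profile sits at 3 days, which is exactly the kind of aliased peak the prior is meant to exclude.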
Figures (A): Mexico data (death transmission vs mobility)
Figures (B): US (death transmission vs mobility) - highest correlation scores only. Death data window 15 March to 23 June.
(C) A deeper look at the case of the United States
Let's refer back to the four hypotheses we laid out in the introductory section, and study each one in more detail.
The potential causes of divergence between the growth of cases and deaths could be due to: (i) a real surge in infections (with under registered deaths), (ii-a) a significant increase or decrease in speed and (ii-b) volume of testing, (iii) the trajectory baseline computed through death statistics is possibly in error, e.g. due to changes in median time for infection to death, or (iv) changes in the mortality rate throughout the period of study, e.g. due to improved outcomes from medical treatment.
(i) - not a significant factor. This is related to the issue of excess deaths, which represents deaths not formally detected as caused by COVID-19. While we cannot definitively say this is an irrelevant factor, our view is that its contribution should be systematic and therefore occur consistently throughout the entire period. It is not logical to have excess deaths suddenly account for a much larger proportion of deaths in recent weeks compared to the March through April period. See chart below: excess death (minus) reported COVID-19 deaths for US states where difference is >100. Note the top two states are New York and New Jersey - where the virus hit hard. There is a distinct plateauing of the curves with time, which indicates that unaccounted deaths persisting (and rising) through the May-June period are unlikely to be the cause of the divergence between case growth and deaths.
**Note: we have now included excess death statistics for several countries in our daily projection report (29 June) - these too indicate that excess death errors gradually reduce to ~zero with time.
(ii-a) - not a factor. As we have seen the change in speed of testing throughout the period of study is relatively small if at all present.
(ii-b) - cannot be ruled out definitively through this analysis. Rather concerning is that there is a high percentage of asymptomatic / undetected infected persons in each country (e.g. at a true mortality rate of 1.2% we estimate that the actual total number of infected persons is 13x the number of total cases detected in the UK, and 5.5x for the US and for Italy). Increased testing volumes per day will inevitably lead to detecting these asymptomatic carriers, which we suspect has been the case. While not a bad thing, as it is beneficial to have those persons isolated to reduce further transmission, it does skew the threshold of what constitutes a 'detectable case' compared to previously deployed test regimes.
(iii) - not a significant factor. As we have seen, timing of deaths (from the initial point of infection) is a very stable period when observed throughout the study period. The volume of deaths, outside of excess deaths, is a formal record and therefore unlikely to contain errors beyond day to day variations i.e. tolerable delays in a proportion of the daily reported figures.
(iv) - uncertain contribution, and cannot be ruled out. Biologically speaking, it is almost certain that the virus has not reduced its potency with time, given minimal to no detected mutation during the course of its existence. However, different countries have administered different treatments and applied different care procedures to COVID-19 patients, which may have had an impact on mortality rates - although definitive literature on the matter is scarce at this point.
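For the multiples quoted under (ii-b), one plausible reading of the arithmetic is to divide cumulative deaths by the assumed true mortality rate and compare the result with cumulative detected cases. The sketch below works on that assumption; the function name and the input figures are hypothetical, and the delay between infection and death is ignored for simplicity.

```python
def infection_multiple(total_deaths, detected_cases, true_mortality_rate=0.012):
    """Estimate total infections and their ratio to detected cases.

    Assumption: estimated infections = deaths / true mortality rate.
    """
    if true_mortality_rate <= 0 or detected_cases <= 0:
        raise ValueError("mortality rate and detected cases must be positive")
    estimated_infected = total_deaths / true_mortality_rate
    return estimated_infected, estimated_infected / detected_cases

# Made-up figures for illustration only (not any country's real data).
infected, multiple = infection_multiple(total_deaths=1_200, detected_cases=20_000)
print(round(infected), round(multiple, 1))  # 100000 5.0
```

A higher multiple means a larger pool of undetected infections for expanded testing to pick up.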
So this leaves us with two possible major driving factors: increased testing volumes (and the incidental detection of asymptomatics), and changed mortality rate due to improved efficacy of medical treatment.
Both these factors do raise a further, and rather critical question: Is President Trump right in pushing for a more aggressive reopening of the US economy? As some may say, even a broken clock shows the correct time twice a day! Perhaps in this particular case, the president could indeed be more right than wrong.
Either of these factors would seem to suggest that the worst is indeed over for the country and that, while the population needs to stay observant of measures such as social distancing, mask wearing and personal hygiene, the economy could indeed be reopened at full throttle without massive risk of further outbreaks. Reported death data from the US over the past week suggest a transmission rate within the range of 0.96-0.98; while not great, this is within non-epidemic levels. It is perhaps this figure that decision makers should be focused on.
Alas, the two sides of the Republican-Democrat divide may continue to agree to disagree on this point, and continue to debate this matter without end. We duly await the verdict.