Category: Data Analysis

Covid19 Epidemic. Doubling times responding to lock-downs

3/25/2020

In a recent blog post about the Covid-19 epidemic I tried to tackle the problem of variability in the reporting of SARS-CoV-2 confirmed cases by estimating case doubling times for various countries. A few things of note from that analysis:

a. Very different doubling times between countries that suggested that some have effectively managed to slow down the spread of the virus. China in that respect had already succeeded in almost putting the epidemic to a halt and so was not included in the analysis.
b. Even though the estimated doubling times were rather stable, they showed interesting dynamics that called for a more careful analysis. This is true for almost everything that has to do with this epidemic, which is the first of this scale to be monitored in real time.
c. There was no correlation between number of cases and doubling times (Italy being the most problematic case, with a high number of infected people and a low doubling time). There was also no correlation between the doubling times and the days since the first case was reported in each country. This means that even in countries where the "spill" occurred at more or less the same time, the situation has not been developing in the same way.

All of the above implies that a more careful examination, factoring in other aspects, is required. In this post I have tried to dissect the different modes of the developing situation in countries with more than 300 cases (as of March 24th), excluding China and South Korea, which appear to have contained the situation (and hopefully will remain so).

In the plot below I have created the same graph as in the previous post, with the data updated four days later. A number of things regarding the dynamics can be seen by comparing this plot to the one for March 21st. Italy is showing signs of slowing down the spread, moving up the y-axis (Doubling Time, DT). France and Germany have done this more quickly, but Spain has not done much (with a constant DT of <4 days). The US is the most worrying case (as many have already noted). With a big population and a doubling time of <3 days, it is a question of days before it becomes the center of the outbreak.

One main difference between this plot and the previous one is the colouring of the countries, which now represents the time since the first reported case (in weeks). All countries with more than 10k cases belong to the category of "early outbreaks" and are thus coloured light blue. But not all "early outbreak" countries have many cases, and this is largely due to their high doubling times (Japan and Singapore being the prime examples). In the plot below, I have tried to group countries based on the combination of doubling times and weeks since first case, by drawing regression lines that represent a (sort of) Slow-Down Rate (SDR). The higher the slope, the more effective the slowing down of the spread is. You may think of the SDR as the fight to increase the doubling time while dealing with an increasing number of cases. For most epidemics the natural course of things is that doubling times increase (and the spread slows down) with the accumulation of cases, but unfortunately this only happens after a large percentage of the population has been infected. As we try to flatten the curve we cannot afford this, and so we hope that this slowing-down will happen with case isolation through quarantine. It remains to be seen whether points on this plot show a trend of moving fast towards the bottom right (bad scenario) or slowly towards the top right (good scenario).

Through four SDR regression lines, I have split the plot into four areas, each of which contains at least one country where the outbreak was detected early. This means that different SDRs cannot be attributed to insufficient time for observations. In this way, we can then correlate the development of the situation in each country group with other characteristics, one of which is the mitigation measures that have been imposed, as we will see shortly.

Looking at the plot below (and note that this is a rough, eye-balled split) we can see four different groups. From best to worst, Japan and Singapore have SDR > 12, as do Qatar, Slovenia and other countries which were, nonetheless, late in reporting a case. Their goal is to stay in that area of the plot. A large group with SDR between 4.5 and 6.5 includes the Scandinavian countries (except Denmark, which was late in reporting a case and is doing rather well) and a number of other "late outbreak" countries. Some struggle to stay in this area (Belgium, Israel, the Netherlands and Australia) while others (like Greece and Iceland) may be more optimistic. Unfortunately the countries that represent more than 90% of the active cases worldwide lie in SDR areas that are below 4.5. Italy is fighting to get into the "OK" zone of SDR=4.5-6.5 but Germany, France and Spain (in particular) are far from that. The US is even further down. The spread shows no sign of slowing down. Quite the opposite.

The question then is: What is making Japan and Singapore so successful, and what will help Belgium and the Netherlands avoid the fate of Italy? One answer that has been suggested by the short-term history of this developing situation is: extreme self-isolation measures with effective lock-downs. And this is in fact the approach that most countries have been adopting. The best example of its efficiency as a means of slowing down the spread (even to a halt) has come from China, which has now effectively contained the virus. Below we see the steady increase of doubling time in China from January 24th (the first date for which we have data) to February 8th. There is a marked increase in doubling time that more or less starts before the decision for a full lock-down in Hubei province (the epicenter of the outbreak). While the slowing down was already under way before the measures were put into place, there is a notable upturn in the curve from February 3rd, which comes a week after the lock-down.

To say we have learned from China is easy, but to actually do as they did is more complicated. A number of special characteristics made it much easier to impose and maintain the lock-down in Hubei. The lock-down did not affect the whole country, and the relationship between the state and the people (to put it gently) made things easier.

What about the rest of the world then? It was not very easy to collect data on when exactly measures were taken in every country. There is variability in this respect too, as some countries have not gone into full lock-down or did so only after milder measures were put into place. To keep things as homogeneous as possible, I obtained the dates when school closure was decided in 13 countries from Wikipedia articles on the Covid-19 epidemic. I then went back to the confirmed case data and calculated 5-day rolling doubling times for the last three weeks. This means that, starting from March 1st, I calculated average doubling time estimates over five consecutive days. (Note: this was done to get smoother estimates. It was also the approach I used for the data from China in the plot above.)
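For readers who want to try the rolling estimate themselves, here is a minimal sketch of one way to read it: slide a 5-day window over the day-to-day growth factors, average them, and turn each average into a doubling time. The function name and the case counts are made up for illustration, and averaging the growth factors (rather than the doubling times themselves) is an assumption on my part.

```python
import math

def rolling_doubling_times(cumulative, window=5):
    """Doubling time (days) for every `window`-day stretch of a cumulative case series."""
    # Day-to-day growth factors: today's total divided by yesterday's.
    factors = [b / a for a, b in zip(cumulative, cumulative[1:])]
    result = []
    for end in range(window, len(factors) + 1):
        mean_r = sum(factors[end - window:end]) / window
        # No growth means cases never double.
        result.append(1 / math.log2(mean_r) if mean_r > 1 else math.inf)
    return result

# Hypothetical series: 30% daily growth for six days, then 10% daily growth.
cases = [100.0]
for day in range(12):
    cases.append(cases[-1] * (1.3 if day < 6 else 1.1))

dts = rolling_doubling_times(cases)
print([round(dt, 1) for dt in dts])
```

In this toy series the doubling time climbs as the slower growth enters the window, which is the left-to-right darkening one hopes to see in a heatmap of this kind.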

The results are shown below for 11 European countries (including my native Greece), the US and Israel. I hope to have time to include more in the future (provided I find time to read Wikipedia articles, or get a reliable comprehensive data source). The heatmap shows doubling times from March 1st to March 22nd (because of the 5-day averaging, the two final dates refer to a central date -2). Doubling times lie in a range of 1 to 12 days, with Iran being in the best situation. You basically read the plot from left to right and hope to be moving steadily (and as fast as possible) to darker colours.

What does it tell us?

1. First of all, things are encouraging for Italy. Its dynamics in increasing doubling time put it in the same cluster as Sweden, Greece and the obvious outperformer of the plot, Iran.
2. A second cluster containing France, Belgium, the Netherlands and Spain is seriously lagging behind, but things are positive for France and the Netherlands, which show a slowing down (darker colours as we move from left to right). It is still too early to tell about Spain and Belgium.
3. The last cluster is quite inhomogeneous. Germany looks more like France than the UK or Portugal, but it is probably the large number of cases that obscures its performance. The US is an obvious outlier. Things don't seem to be getting any better there.

But remember we did all this to see if the measures on social isolation can actually work. Can we reproduce the effect they had in China? The answer is yes, but not entirely. There appears to be a mild positive association between how early the measures were taken and the development of the situation. Greece, Sweden, Iran and Italy all closed their schools on March 11th. This didn't happen in the UK before March 19th, more than a week later. On the other hand, the lag in school closure between Spain and Italy appears too small to explain the difference in doubling-time dynamics between them. Italy is getting better, Spain is not. But it could well be that two days, or even one, can really make a difference.

There are also a number of factors that we cannot easily account for. The red lines below show the date schools were shut down, but that is not the only measure. Most countries have gone into a complete lock-down, shutting down bars and restaurants and even imposing curfews. The time when this happened was not the same. In some, like the US, even the time schools closed was not the same across the country. As data accumulate and the situation develops we will be in a better position to discuss the efficiency of lock-downs and (perhaps more importantly) the time it would be OK to lift them.

Other kinds of data need to be factored in as well. Population densities and demographics may make lock-downs more or less effective, and some countries may need to consider additional (or other) means of mitigation. It sounds frustratingly repetitive, but we can only wait and see.


Covid-19 Epidemic. Confirmed case doubling times among countries

3/21/2020


The Covid-19 pandemic is affecting our lives in every possible way. This is a first in many respects, one of which is that this is the first epidemic for which data regarding cases, deaths and recoveries are being made available in almost real time.
A number of data scientists have been trying to make sense of the available data from many possible angles. Nevertheless, little more than the obvious (and expected) exponential increases has come out of most of the analyses. We now know that the virus is highly contagious and that exponential growth of cases should be the norm. The goal is to tame this growth as much as possible, and in this sense social distancing and case isolation are likely to be the only way to delay the peak of the epidemic (that is, the time when the number of sick people reaches its maximum) or, as you should know by now, to "flatten the curve".

One main problem with most data analytical approaches is the variability of reported data in terms of cases. While the number of deaths cannot be questioned, the way each country reports confirmed cases is very different. Some countries (like South Korea) have opted for extensive testing in the general population, while others (like for instance Greece) have explicitly targeted serious/critical cases, recommending that people with mild symptoms avoid over-crowding hospitals and diagnostic labs. I shall refrain from discussing arguments that exist both for and against these two extreme approaches and focus on the main problem that this variability poses in the analysis of the epidemic dynamics.
If some countries test a lot, the number of cases will be high but the case fatality rate (i.e. the number of deaths per confirmed case) will be smaller. Countries that test only people with severe symptoms will report a low number of cases but higher fatality rates. In any case, it is difficult to tell how the spread differs from one country to the other.

How then can we really know what is going on?
One solution is to focus on increase rates instead of number of cases. Assuming that the way countries report cases doesn't change over time, we can try to estimate the rate of increase in confirmed cases from one day to the other. Regardless of whether one tests a lot or a little, the number of new cases against the previous sum is representative of the spread. This is not something new and people have tried to figure out this rate from the slopes of log-linear fits, but the problem is that these slopes are very prone to random fluctuations, especially when case numbers are small.

In the following I will present a simple approach to address the problem and, more importantly, to gauge differences in the approaches that different countries employ to tackle the spread of the epidemic.

Data
I used data from Johns Hopkins University, Center for Systems Science and Engineering (CSSE), which are updated daily for all countries that have reported at least one confirmed case and which are freely accessible here: https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data.
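As a rough idea of how such a file can be turned into one series per country, here is a minimal sketch. It assumes the wide layout of the time series files (four leading columns for Province/State, Country/Region, Lat and Long, followed by one column per date); the function name is my own and the numbers in the sample are made up, not real counts.

```python
import csv
import io
from collections import defaultdict

def cases_by_country(csv_text):
    """Sum cumulative confirmed cases per country from a wide, JHU-style CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dates = header[4:]  # assumed: date columns start after Lat/Long
    totals = defaultdict(lambda: [0] * len(dates))
    for row in reader:
        if not row:
            continue
        country = row[1]
        # Provinces of the same country are added together, day by day.
        for i, value in enumerate(row[4:]):
            totals[country][i] += int(float(value or 0))
    return dates, dict(totals)

# Made-up sample in the assumed layout.
sample = """Province/State,Country/Region,Lat,Long,3/19/20,3/20/20,3/21/20
,Italy,43.0,12.0,100,120,150
Hubei,China,30.9,112.2,500,500,501
Beijing,China,40.1,116.4,40,41,41
"""

dates, totals = cases_by_country(sample)
print(dates)
print(totals["China"])
```

Summing provinces matters for countries that are reported region by region; otherwise each region would look like a separate small outbreak.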

Case Doubling Time
Instead of looking into slopes of increase curves, I tried to estimate Case Doubling Times. This is the time (in days) it takes for the number of cases to double, given the rate of increase over a certain period of time. This means that if we have N cases on day x, we can estimate the number of days it will take to reach 2N cases, assuming a constant increase rate r.
This is easily calculated from the following equation: N*r^(dt) = 2N, where r is the daily increase rate (the factor by which cases grow from one day to the next) and dt is the doubling time in days. Solving the equation for dt gives dt = 1/log2(r), which means that a good estimate of r translates directly into a reliable estimate of the doubling time.
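The formula is short enough to check by hand, but a small sketch may still help. Here r is taken to be the day-to-day growth factor (for example 1.2 for a 20% daily increase), which is the reading that makes the equation above work; the function name is just for illustration.

```python
import math

def doubling_time(r):
    """Days needed for cases to double at a constant daily growth factor r."""
    if r <= 1:
        # With no growth the number of cases never doubles.
        return math.inf
    return 1 / math.log2(r)

print(doubling_time(2.0))             # cases double every day
print(round(doubling_time(1.2), 2))   # 20% more cases each day
```

Note how sensitive the result is when r is close to 1: small errors in the estimated rate become large swings in the doubling time, which is why a stable estimate of r matters.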

What I did
1. I took the timeseries data (that is, cases per day for many days in a row) from the link above and estimated a rolling increase rate for every country that has reported at least 300 cases. Rolling, in this sense, means that starting from the day that each country reported the first case, I calculated the mean rate of increase between two consecutive days until today (March 21st).
2. As this gives a series of increase rates that is equal to the number of days of reporting (minus 1) it is, as expected, very noisy, especially while the number of cases is still low. You expect it to converge once sufficient cases have been reported (and then you hope it slowly drops to zero). This is why I used only the mean rolling rate of the last 5 days for each country. I then used this mean rate to estimate the Doubling Time explained above.
3. Even so, the rate values may vary significantly in countries where cases have been reported for a short period, and this is why I combined the mean rate with its standard deviation (how much it varied over the last 5 days) and the days since the first case was reported.
4. I then plotted the estimated Doubling Times against the total number of cases, taking into account the variability of increase rates. Each country is represented by a dot. Doubling time is on the y-axis and you want this to be as high as possible. This means you have efficiently "flattened the curve". Total cases is on the x-axis and this means, well the obvious, that a lot of people are sick (but caution: not all countries report in the same way). How reliable are the data? The darker the color of the label, the smaller the standard deviation of the increase rate, thus the more confident we can be of the doubling time we estimate.
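The steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the rate is taken as the day-to-day growth factor (today's total over yesterday's), the helper names are hypothetical, and the case counts are made up.

```python
import math
import statistics

def daily_factors(cumulative):
    """Day-to-day growth factors, skipping days that follow a zero total."""
    return [today / prev
            for prev, today in zip(cumulative, cumulative[1:])
            if prev > 0]

def summarise(cumulative, window=5):
    """Mean and standard deviation of the last `window` factors, plus the doubling time."""
    recent = daily_factors(cumulative)[-window:]
    mean_r = statistics.mean(recent)
    sd_r = statistics.stdev(recent) if len(recent) > 1 else 0.0
    dt = 1 / math.log2(mean_r) if mean_r > 1 else math.inf
    return mean_r, sd_r, dt

# Made-up cumulative counts growing by exactly 20% a day.
cases = [100 * 1.2 ** d for d in range(10)]
mean_r, sd_r, dt = summarise(cases)
print(round(mean_r, 3), round(sd_r, 3), round(dt, 2))
```

The standard deviation is what drives the label colour in the plot: a small value means the last five daily rates agree with each other, so the doubling time built on their mean can be trusted more.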

Conclusions?
So, what can we make of this figure?

Japan is the way to go
First of all, one should focus on the dark blue labels. These represent countries with low standard deviation of increase rates and thus reliable estimates of doubling times. That said, you want to be as high on the y-axis as possible. We knew already that Japan is doing a great job. Even though they were among the first countries to report a case, they have kept cases low (~1000) and -more importantly- with a doubling time of 25 days. This means that, provided things don't change, Japan is not expected to have more than 3000 cases by the end of April.
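The projection for Japan follows directly from the definition of doubling time: after t days, N cases become N * 2^(t/dt). A minimal sketch, using the rounded figures quoted above (about 1000 cases, dt of 25 days) and a hypothetical function name:

```python
from datetime import date

def projected_cases(current_cases, doubling_time_days, days_ahead):
    """Cases expected after `days_ahead` days if the doubling time stays constant."""
    return current_cases * 2 ** (days_ahead / doubling_time_days)

# From the date of this analysis to the end of April.
days_ahead = (date(2020, 4, 30) - date(2020, 3, 21)).days
japan = projected_cases(1000, 25, days_ahead)
print(days_ahead, round(japan))
```

With 40 days to go and a doubling time of 25 days, cases grow by a factor of about three, which is where the figure of roughly 3000 comes from. The assumption of a constant doubling time is of course the weak point of any such projection.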

Good signs from Italy
Italy has, on the other hand, been described as the horror story so far: an exploding number of cases coupled with a high fatality rate, both of which are probably due to strong exponential increase rates in the early days after the outbreak. Nevertheless, Italy's doubling time is now 5.4 days, almost double that of Germany (2.8 days) and more than double that of the US (a little over 2 days). Whether this is due to the heavy restrictions on movement and social distancing that were imposed a bit more than a week ago remains to be seen. Even though dt=5.4 is not perfect (if stable, it means a roughly 50-fold increase within one month), Italy's doubling time was <3 days a few days ago and was less than 2 days in the early days of the epidemic, which means that significant progress is being made in stopping the spread. One possibility is that Italy has simply reached a saturation point in terms of testing capacity and can now only perform a certain number of tests every day, most of which come out positive. This may also be the case for Iran, one of the countries that suffered most but which now reports a low number of cases and a doubling time of more than 10 days. This will be made clearer in the days to come.

Central Europe should look towards Scandinavia
What about the rest? Bad news for the US, Germany, the UK and Austria, for all of which doubling times are estimated to be low. The Netherlands, Belgium and Switzerland are doing a bit better. The Scandinavian countries are performing rather well, and in spite of great numbers of cases (taking their population into account) they seem to have slowed down the spread. Not much can be said for countries like Greece, Iceland or Slovenia, where the spread looks to be halted but the number of cases is still too low to allow for accurate estimates. Countries in the bottom left part of the plot have very low numbers of cases and large standard deviations for increase rates. It is just too early to tell.

What more can we look into?
These are highly volatile data, and so one needs to keep looking for more as the timeseries become longer and the rolling estimates of increase rates converge. It would be very interesting to look into how doubling times changed for each country, taking into account the sort of counter-measures imposed and, perhaps more importantly, how early after the first case they were put into place. This will probably give us a better idea of what the best approach is. Everybody agrees we are in uncharted waters and need to approach every analysis and its interpretation with a lot of caution.
