Predictive Hacks

Covid-19: Correlation Between Confirmed Cases and Deaths


What is the lag between daily confirmed Covid-19 cases and daily deaths? In other words, for the people who have passed away, how many days earlier, on average, were they reported (i.e. “Confirmed”) as new Covid-19 cases?

To answer this question, we can compute the correlation between daily confirmed cases and daily deaths for different lag values of the confirmed cases. The assumption is that it takes some days for someone to pass away after being diagnosed with Covid-19.

The problem with the data is that they are affected by the number of tests performed, and on some days, such as weekends, not all cases are reported. This means that our analysis is not rigorous, but we will try it and see what we get. We will start with Italy.
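To make the idea concrete before turning to real data, here is a minimal sketch in base R on synthetic numbers (not Covid-19 data). It assumes a bell-shaped case curve and deaths that are exactly 10% of the cases reported 6 days earlier; the helper name `shift_by` and all the numbers are made up for illustration. Under those assumptions, scanning the lags should recover the built-in delay.

```r
# Synthetic example only: these are not real Covid-19 figures.
days <- 1:80
confirmed <- 1000 * dnorm(days, mean = 35, sd = 10)  # assumed bell-shaped case curve

# Hypothetical helper: shift a series forward by k days, padding the start with NA
shift_by <- function(x, k) c(rep(NA, k), x[seq_len(length(x) - k)])

true_lag <- 6
death <- 0.1 * shift_by(confirmed, true_lag)  # assumed: 10% of cases, 6 days later

# Correlate deaths with the confirmed series shifted by each candidate lag
lags <- 0:20
correlations <- sapply(lags, function(k) {
  cor(death, shift_by(confirmed, k), use = "complete.obs")
})

best_lag <- lags[which.max(correlations)]
print(best_lag)  # 6, the lag built into the synthetic data
```

Real series are noisy and the delay varies from person to person, so the peak is much flatter in practice than in this idealized case.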

Italy: Correlation Between Confirmed Cases and Deaths

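# https://github.com/RamiKrispin/coronavirus
# If needed, install the helper packages first:
# install.packages(c("devtools", "tidyverse", "lubridate"))
devtools::install_github("RamiKrispin/coronavirus")
# Check whether there is a data update on the GitHub version
coronavirus::update_datasets(silence = TRUE)

library(coronavirus)
library(tidyverse)  # provides %>%, filter(), pivot_wider() and ggplot()
library(lubridate)

data("coronavirus")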


# Daily cases for Italy, reshaped to one column per case type
df <- coronavirus %>%
  filter(country == 'Italy', date >= '2020-02-15') %>%
  select(date, country, type, cases) %>%
  group_by(date, country, type) %>%
  pivot_wider(names_from = type, values_from = cases) %>%
  ungroup()

correlations <- c()
lags <- c(0:20)

# Correlate daily deaths with confirmed cases reported k days earlier
for (k in lags) {
  tmp <- df %>%
    mutate(lagk = lag(confirmed, k)) %>%
    select(death, lagk) %>%
    na.omit()

  correlations <- c(correlations, cor(tmp$death, tmp$lagk))
}

# Print the table, then plot correlation against lag
data.frame(lags, correlations)

data.frame(lags, correlations) %>%
  ggplot(aes(x = lags, y = correlations)) +
  geom_point()
 
 

lags  correlations
   0  92.64%
   1  93.78%
   2  94.44%
   3  94.79%
   4  94.92%
   5  95.16%
   6  95.53%
   7  94.35%
   8  92.58%
   9  91.00%
  10  89.00%
  11  86.64%
  12  85.07%
  13  83.09%
  14  79.59%
  15  76.00%
  16  73.26%
  17  69.52%
  18  66.85%
  19  64.60%
  20  61.73%

As we see, the correlation is highest at k=6, which suggests (if the data were accurate) that most of the people who passed away had been diagnosed with Covid-19 about 6 days earlier.
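As a quick check on how sharp that peak is, the following sketch takes the Italy correlations from the table above and picks out the best lag with `which.max()`. It also lists every lag within one percentage point of the maximum; that one-point threshold is an arbitrary assumption chosen only for illustration.

```r
# Italy correlations (in %) copied from the table above, for lags 0 to 20
lags <- 0:20
correlations <- c(92.64, 93.78, 94.44, 94.79, 94.92, 95.16, 95.53, 94.35,
                  92.58, 91.00, 89.00, 86.64, 85.07, 83.09, 79.59, 76.00,
                  73.26, 69.52, 66.85, 64.60, 61.73)

# Lag with the highest correlation
best_lag <- lags[which.max(correlations)]
print(best_lag)   # 6

# Lags within 1 percentage point of the maximum (arbitrary threshold)
near_peak <- lags[correlations >= max(correlations) - 1]
print(near_peak)  # 3 4 5 6
```

Several neighbouring lags are almost as good as k=6, so the estimate is better read as a range of a few days than as a precise number.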


Italy: Correlation Between Confirmed Cases and Deaths SMA 5

Let’s do the same analysis, but this time by taking into consideration the Simple Moving Average of 5 days.

# Same data as before, smoothed with a trailing 5-day simple moving average.
# as.numeric() turns the time series returned by stats::filter() back into a plain vector;
# na.omit() drops the first 4 days, which have no complete 5-day window.
df <- coronavirus %>%
  filter(country == 'Italy', date >= '2020-02-15') %>%
  select(date, country, type, cases) %>%
  group_by(date, country, type) %>%
  pivot_wider(names_from = type, values_from = cases) %>%
  ungroup() %>%
  mutate(confirmed = as.numeric(stats::filter(confirmed, rep(1 / 5, 5), sides = 1)),
         death = as.numeric(stats::filter(death, rep(1 / 5, 5), sides = 1))) %>%
  na.omit()

correlations <- c()
lags <- c(0:20)

for (k in lags) {
  tmp <- df %>%
    mutate(lagk = lag(confirmed, k)) %>%
    select(death, lagk) %>%
    na.omit()

  correlations <- c(correlations, cor(tmp$death, tmp$lagk))
}

data.frame(lags, correlations)
data.frame(lags, correlations) %>%
  ggplot(aes(x = lags, y = correlations)) +
  geom_point()
 
lags  correlations
   0  95.00%
   1  96.36%
   2  97.32%
   3  97.90%
   4  98.13%
   5  98.04%
   6  97.62%
   7  96.77%
   8  95.50%
   9  93.91%
  10  92.00%
  11  89.80%
  12  87.41%
  13  84.77%
  14  81.85%
  15  78.77%
  16  75.63%
  17  72.39%
  18  69.15%
  19  65.94%
  20  62.65%

When we consider the 5-day SMA, the maximum correlation is at a lag of 4 days.
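For readers unfamiliar with the smoothing call used above, here is a small sketch of what `stats::filter(x, rep(1 / 5, 5), sides = 1)` does, using a toy vector rather than the Covid-19 series. With `sides = 1` each value is the average of the current day and the 4 days before it, so the first 4 positions have no complete window and come back as `NA`.

```r
# Toy input (not Covid-19 data): the numbers 1 to 10
x <- 1:10

# Trailing 5-day simple moving average; as.numeric() drops the time-series class
sma5 <- as.numeric(stats::filter(x, rep(1 / 5, 5), sides = 1))
print(sma5)  # NA NA NA NA 3 4 5 6 7 8

# The fifth value is simply the mean of the first five inputs
print(mean(x[1:5]))  # 3
```

Those leading `NA` values are why the smoothed analysis calls `na.omit()` right after the moving average is computed.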


Belgium: Correlation Between Confirmed Cases and Deaths

Let’s do the same analysis for Belgium.

# Daily cases for Belgium, reshaped to one column per case type
df <- coronavirus %>%
  filter(country == 'Belgium', date >= '2020-02-15') %>%
  select(date, country, type, cases) %>%
  group_by(date, country, type) %>%
  pivot_wider(names_from = type, values_from = cases) %>%
  ungroup()

correlations <- c()
lags <- c(0:20)

# Correlate daily deaths with confirmed cases reported k days earlier
for (k in lags) {
  tmp <- df %>%
    mutate(lagk = lag(confirmed, k)) %>%
    select(death, lagk) %>%
    na.omit()

  correlations <- c(correlations, cor(tmp$death, tmp$lagk))
}

data.frame(lags, correlations)

data.frame(lags, correlations) %>%
  ggplot(aes(x = lags, y = correlations)) +
  geom_point()
 
lags  correlations
   0  0.703768
   1  0.738962
   2  0.722847
   3  0.752669
   4  0.749367
   5  0.75888
   6  0.775802
   7  0.766534
   8  0.741903
   9  0.745851
  10  0.739051
  11  0.711148
  12  0.745839
  13  0.714
  14  0.677464
  15  0.629853
  16  0.606283
  17  0.549728
  18  0.538276
  19  0.522196
  20  0.47582

Again, in Belgium, the highest correlation between confirmed cases and deaths occurs 6 days after people were reported as new cases.

Finally, let’s run the same analysis by taking into consideration the SMA 5.


# Belgium again, smoothed with a trailing 5-day simple moving average.
# as.numeric() turns the time series returned by stats::filter() back into a plain vector.
df <- coronavirus %>%
  filter(country == 'Belgium', date >= '2020-02-15') %>%
  select(date, country, type, cases) %>%
  group_by(date, country, type) %>%
  pivot_wider(names_from = type, values_from = cases) %>%
  ungroup() %>%
  mutate(confirmed = as.numeric(stats::filter(confirmed, rep(1 / 5, 5), sides = 1)),
         death = as.numeric(stats::filter(death, rep(1 / 5, 5), sides = 1))) %>%
  na.omit()

correlations <- c()
lags <- c(0:20)

for (k in lags) {
  tmp <- df %>%
    mutate(lagk = lag(confirmed, k)) %>%
    select(death, lagk) %>%
    na.omit()

  correlations <- c(correlations, cor(tmp$death, tmp$lagk))
}

data.frame(lags, correlations)
data.frame(lags, correlations) %>%
  ggplot(aes(x = lags, y = correlations)) +
  geom_point()
 

lags  correlations
   0  81.53%
   1  83.34%
   2  84.61%
   3  85.66%
   4  86.43%
   5  86.96%
   6  87.18%
   7  86.98%
   8  86.45%
   9  85.77%
  10  84.80%
  11  83.42%
  12  81.88%
  13  79.58%
  14  76.65%
  15  73.26%
  16  69.58%
  17  65.72%
  18  61.88%
  19  57.94%
  20  53.72%

Again, the maximum correlation is observed at a lag of 6 days.

Discussion

I would like to stress that this analysis is not rigorous, because we lack much of the information about how the data are collected and reported. However, it is clear that there is a lag between confirmed cases and deaths, even though we cannot specify its length accurately.


7 thoughts on “Covid-19: Correlation Between Confirmed Cases and Deaths”

  1. I tried your model for other countries, Greece for example, and received the following message on version 4.0.2:

    Error in coronavirus %>% filter(country == “Italy”, date >= “2020-02-15”) %>% :
    could not find function “%>%”

    Can you please help?

  2. I couldn’t load the tidyverse package. All I received was:

    Error in library(tidyverse) : there is no package called ‘tidyverse’

  3. I think I loaded all 3 packages now, but I am getting this message:

    Error in coronavirus %>% filter(country == “Italy”, date >= “2020-02-15”) %>% :
    could not find function “%>%”

