Predictive Hacks

The fastest way to Read and Write files in R


Compare File Read and Write Times

When we are dealing with large datasets and need to write many csv files, or when the csv file that we have to read is huge, the speed of the read and write commands matters. We will compare the time required to write and read files in the following cases: base R (write.csv, read.csv), data.table (fwrite, fread), and readr (write_csv, read_csv).

Compare the Write Times

We will work with a csv file of 1M rows and 10 columns, which is approximately 180MB. Let’s create the sample data frame and write it to the hard disk. We will generate 10M observations (1M rows x 10 columns) from the Normal Distribution.

library(data.table)
library(readr)
library(microbenchmark)
library(ggplot2)

# create a 1M x 10 data frame of standard normal draws
my_df <- data.frame(matrix(rnorm(1000000 * 10), 1000000, 10))

# base
system.time({ write.csv(my_df, "base.csv", row.names=FALSE) })

# data.table
system.time({ fwrite(my_df, "datatable.csv") })

# readr
system.time({ write_csv(my_df, "readr.csv") })
 




As we can see from the elapsed time, fwrite from data.table is ~70 times faster than write.csv from the base package and ~7 times faster than write_csv from readr.
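A single system.time call can be noisy, so one option is to repeat the write comparison with microbenchmark, the same package this post uses for the read times. The following is a minimal sketch, not the benchmark from this post: it assumes a smaller 100k x 10 data frame and temporary file paths (small_df, f_base, f_dt and f_readr are illustrative names) so that repeated runs finish quickly. Timings will differ from machine to machine.

```r
library(data.table)
library(readr)
library(microbenchmark)

# Smaller illustrative data frame (100k x 10) so repeated runs finish quickly
set.seed(1)
small_df <- data.frame(matrix(rnorm(100000 * 10), 100000, 10))

# Temporary output paths (illustrative, not the files used in the post)
f_base  <- tempfile(fileext = ".csv")
f_dt    <- tempfile(fileext = ".csv")
f_readr <- tempfile(fileext = ".csv")

# Run each writer 5 times and collect the timings
tm_write <- microbenchmark(
  base       = write.csv(small_df, f_base, row.names = FALSE),
  data.table = fwrite(small_df, f_dt),
  readr      = write_csv(small_df, f_readr),
  times = 5L
)

print(tm_write)
```

Printing tm_write shows the min, median and max time per writer, which gives a better sense of run-to-run variation than one elapsed value.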


Compare the Read Times

Let’s also compare the read times, using the microbenchmark package to run each reader 10 times on the same file.

tm <- microbenchmark(read.csv("datatable.csv"),
                     fread("datatable.csv"),
                     read_csv("datatable.csv"),
                     times = 10L
)

tm
autoplot(tm)
 
 


As we can see, fread from the data.table package is again around 40 times faster than read.csv from the base package and 8.5 times faster than read_csv from the readr package.
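Keep in mind that the three readers return different classes: read.csv returns a data.frame, fread a data.table, and read_csv a tibble. The sketch below uses a small temporary file rather than the 1M-row file from the benchmark (tmp_csv and the other variable names are illustrative). It checks that the parsed values agree and shows how to get a plain data.frame from fread through its data.table = FALSE argument.

```r
library(data.table)
library(readr)

# Small illustrative file (1,000 x 10), written to a temporary path
set.seed(42)
tmp_csv <- tempfile(fileext = ".csv")
fwrite(data.frame(matrix(rnorm(1000 * 10), 1000, 10)), tmp_csv)

base_df  <- read.csv(tmp_csv)                     # data.frame
dt_df    <- fread(tmp_csv)                        # data.table
readr_df <- suppressMessages(read_csv(tmp_csv))   # tibble

# Each reader returns its own class
print(class(base_df))
print(class(dt_df))
print(class(readr_df))

# Compare the numeric content, ignoring class and dimnames
same_dt    <- all.equal(unname(as.matrix(base_df)), unname(as.matrix(dt_df)))
same_readr <- all.equal(unname(as.matrix(base_df)), unname(as.matrix(readr_df)))

# fread can also return a plain data.frame directly
plain_df <- fread(tmp_csv, data.table = FALSE)
```

If downstream code expects a data.frame, passing data.table = FALSE (or wrapping the result in as.data.frame) keeps the speed of fread without changing the rest of the script.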

Conclusion

If you want to read and write files quickly, then you should choose the data.table package.


2 thoughts on “The fastest way to Read and Write files in R”

  1. Have you tried this with smaller data frames (say 10000 entries)? Is it still worth moving from read.csv to fread in this case?

    I know, potential gain is small, but sometimes I have to load dozens of such small tables, so every (micro)second matters…
