When we are dealing with large datasets and need to calculate row- or column-wise statistics such as the minimum, maximum, rank, or mean, we should avoid the apply function because it is comparatively slow on large matrices. Instead, we can use the matrixStats package and its dedicated functions. Let’s run some comparisons.
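As a quick illustration of that correspondence, the sketch below pairs several common apply() calls with their matrixStats counterparts and uses all.equal() to check that each pair returns the same values. The matrix size and seed are arbitrary choices for this example. Note that base R's own rowMeans() and colMeans() are also fast; rowMeans2() and colMeans2() are the matrixStats versions.

```r
# Sketch: common apply() calls and their matrixStats counterparts.
# Matrix size and seed are arbitrary, chosen only for a quick check.
library(matrixStats)

set.seed(1)
x <- matrix(rnorm(200 * 50), ncol = 50)

# Row-wise statistics: apply(x, 1, FUN) vs. rowXxx(x)
stopifnot(
  all.equal(apply(x, 1, min),    rowMins(x)),
  all.equal(apply(x, 1, max),    rowMaxs(x)),
  all.equal(apply(x, 1, mean),   rowMeans2(x)),
  all.equal(apply(x, 1, median), rowMedians(x))
)

# Column-wise statistics: apply(x, 2, FUN) vs. colXxx(x)
stopifnot(
  all.equal(apply(x, 2, min),    colMins(x)),
  all.equal(apply(x, 2, max),    colMaxs(x)),
  all.equal(apply(x, 2, mean),   colMeans2(x)),
  all.equal(apply(x, 2, median), colMedians(x))
)
```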
Example: Minimum Value per Row
Assume that we want to get the minimum value of each row of a 5000 x 5000 matrix. Let’s compare the performance of the apply function from the base package with the rowMins function from the matrixStats package.
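library(matrixStats)
library(microbenchmark)
library(ggplot2)  # provides autoplot(), used for the plot below

# 5000 x 5000 matrix of standard normal values
x <- matrix(rnorm(5000 * 5000), ncol = 5000)

# Time each approach 100 times
tm <- microbenchmark(apply(x, 1, min), rowMins(x), times = 100L)
tm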
Unit: milliseconds
             expr      min         lq       mean    median        uq       max neval
 apply(x, 1, min) 981.6283 1034.98050 1078.04485 1065.4163 1107.9962 1327.9284   100
       rowMins(x)  42.1838   43.80065   46.55752   45.2255   47.6249   81.3097   100
As the output above shows, apply was about 23 times slower than rowMins, whether we compare the mean or the median times. Below is a violin plot of the timings:
autoplot(tm)
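The speedup factor quoted above can be computed from a microbenchmark object instead of being read off by eye. The sketch below is a smaller, quicker variant: a 500 x 500 matrix, 20 repetitions, and the names x_small, tm_small, and s are choices made for this example. It relies on summary() returning one row per expression in the order the expressions were passed, and the exact ratio will vary by machine and will not match the 5000 x 5000 run.

```r
# Sketch: derive the "times slower" factor from a microbenchmark result.
# Smaller matrix and fewer repetitions than the main example, so it runs quickly.
library(matrixStats)
library(microbenchmark)

set.seed(42)
x_small <- matrix(rnorm(500 * 500), ncol = 500)

tm_small <- microbenchmark(apply(x_small, 1, min), rowMins(x_small), times = 20L)

# One row per expression, in the order given: row 1 = apply, row 2 = rowMins.
# All timing columns share one unit, so the ratios are unit-free.
s <- summary(tm_small)
ratio_median <- s$median[1] / s$median[2]
ratio_mean   <- s$mean[1] / s$mean[2]

round(c(median = ratio_median, mean = ratio_mean), 1)
```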