Predictive Hacks

# Avoid apply() function in large datasets

When we are dealing with large datasets and need to calculate row or column statistics such as the min, max, rank or mean, we should avoid the `apply` function. It calls an R function once for every row or column, which becomes slow on large matrices. Instead, we can use the corresponding functions of the `matrixStats` package, which are implemented in C. Let's compare the two approaches.
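As a rough guide to which `matrixStats` function replaces which `apply` call, here is a minimal sketch on a small random matrix. The functions shown (`rowMaxs`, `colMins`, `rowRanks`, `colMedians`, `colSds`) are only a selection from the package, and the matrix size is an arbitrary choice for illustration. Note that `rowRanks` defaults to `ties.method = "max"`, so we set `"average"` to match base `rank`, and that `apply` over rows with a vector-valued function returns one column per row, hence the `t()`.

```r
library(matrixStats)

set.seed(1)
# A small random matrix; the same calls work unchanged on large matrices
x <- matrix(rnorm(20 * 5), ncol = 5)

# Row and column extremes
row_max <- rowMaxs(x)     # equivalent to apply(x, 1, max)
col_min <- colMins(x)     # equivalent to apply(x, 2, min)

# Row-wise ranks, using the same tie handling as base rank()
row_rank <- rowRanks(x, ties.method = "average")  # equivalent to t(apply(x, 1, rank))

# Column medians and standard deviations
col_med <- colMedians(x)  # equivalent to apply(x, 2, median)
col_sd  <- colSds(x)      # equivalent to apply(x, 2, sd)

# Confirm that every matrixStats result matches its apply() counterpart
stopifnot(
  isTRUE(all.equal(row_max, apply(x, 1, max))),
  isTRUE(all.equal(col_min, apply(x, 2, min))),
  isTRUE(all.equal(row_rank, t(apply(x, 1, rank)))),
  isTRUE(all.equal(col_med, apply(x, 2, median))),
  isTRUE(all.equal(col_sd, apply(x, 2, sd)))
)
```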

## Example of Minimum value per Row

Assume that we want to get the minimum value of each row of a `5000 x 5000` matrix. Let's compare the performance of the `apply` function from the `base` package with the `rowMins` function from the `matrixStats` package.

```r
library(matrixStats)
library(microbenchmark)
library(ggplot2)

# A 5000 x 5000 matrix of standard normal values
x <- matrix(rnorm(5000 * 5000), ncol = 5000)

# Time each approach 100 times
tm <- microbenchmark(
  apply(x, 1, min),
  rowMins(x),
  times = 100L
)

tm
```
```
Unit: milliseconds
             expr      min         lq       mean    median        uq       max neval
 apply(x, 1, min) 981.6283 1034.98050 1078.04485 1065.4163 1107.9962 1327.9284   100
       rowMins(x)  42.1838   43.80065   46.55752   45.2255   47.6249   81.3097   100
```

As we can see from the output above, `apply` was roughly 23 times slower than `rowMins` (a mean of 1078.0 ms versus 46.6 ms). The violin plot below shows the distribution of the timings:

```r
autoplot(tm)
```
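To read the speed-up directly instead of dividing the timings by hand, and to check that both approaches return the same answer, the sketch below uses the `unit = "relative"` option of microbenchmark's `summary`, which scales each statistic by the fastest expression's value. The smaller `500 x 500` matrix and 20 repetitions are our own choices so that it runs in seconds; the exact ratio will vary by machine and matrix size.

```r
library(matrixStats)
library(microbenchmark)

set.seed(42)
# A smaller matrix than in the benchmark above, so this runs quickly
x <- matrix(rnorm(500 * 500), ncol = 500)

# Both approaches must return the same row minima
stopifnot(isTRUE(all.equal(apply(x, 1, min), rowMins(x))))

tm <- microbenchmark(
  apply(x, 1, min),
  rowMins(x),
  times = 20L
)

# With unit = "relative", the fastest expression shows 1 in each column and
# the other row shows how many times slower it is
s <- summary(tm, unit = "relative")
print(s)
```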
