Predictive Hacks

# Automated Machine Learning Model Testing

We have all been in the situation where we didn’t know which model was optimal for our ML project, so we ended up training and evaluating many ML models just to see how they behave on our data. This is not a simple task, and it requires time and effort.

Fortunately, we can do this with only a few lines of code using LazyPredict. It runs more than 20 different ML models and returns their performance statistics.

## Installation

```
pip install lazypredict
```


## Example

Let’s see an example using the Titanic dataset from Kaggle.

```python
import pandas as pd
import numpy as np
from lazypredict.Supervised import LazyClassifier, LazyRegressor
from sklearn.model_selection import train_test_split

# read the Titanic training data (adjust the path to where you saved Kaggle's train.csv)
data = pd.read_csv('train.csv')
```

Here, we will try to predict whether a passenger survived the Titanic, so we have a classification problem.

LazyPredict can also do basic data preprocessing, such as filling NA values and creating dummy variables. That means we can test the models immediately after reading the data, without getting any errors. However, we can also pass our own preprocessed data, which makes the model comparison more accurate because it is closer to our final models.

For this example, we will not do any preprocessing and will let LazyPredict do all the work.
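To make that "work" concrete, here is a minimal scikit-learn sketch of the kind of pipeline LazyPredict builds internally: impute missing values, scale the numeric columns, and one-hot encode the categorical ones. The column names match the Titanic features we use below, but the toy values and the exact choice of steps are illustrative only (the actual fitted pipeline is printed later in this post).

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# numeric features: impute missing values with the mean, then scale
numeric = Pipeline(steps=[('imputer', SimpleImputer()),
                          ('scaler', StandardScaler())])

# categorical features: impute a constant, then one-hot encode
categorical = Pipeline(steps=[('imputer', SimpleImputer(fill_value='missing',
                                                        strategy='constant')),
                              ('encoding', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(transformers=[
    ('numeric', numeric, ['Age', 'Fare']),
    ('categorical', categorical, ['Sex', 'Embarked']),
])

# tiny toy frame with a missing Age value
df = pd.DataFrame({'Age': [22.0, None], 'Fare': [7.25, 71.28],
                   'Sex': ['male', 'female'], 'Embarked': ['S', 'C']})
print(preprocessor.fit_transform(df).shape)  # 2 rows, 2 numeric + 4 one-hot columns
```

With our own preprocessed data we would skip this step, but running it automatically is what lets LazyPredict consume the raw dataframe without errors.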

```python
# select the following columns as features for our models
X = data[['Pclass', 'Sex', 'Age', 'SibSp',
          'Parch', 'Fare', 'Embarked']]
y = data['Survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

# fit LazyClassifier
reg = LazyClassifier(ignore_warnings=True, random_state=7, verbose=False)

# pass both the train and the test set so it can evaluate the models
models, predictions = reg.fit(X_train, X_test, y_train, y_test)

models
```


As you can see, it returns a data frame containing the models and their statistics. We can see that tree-based models perform better than the others on this data, so we can focus on tree-based models in our approach.
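Because the result is a regular pandas data frame (indexed by model name, with metric columns whose exact names depend on the LazyPredict version, e.g. `Accuracy` and `F1 Score`), we can filter and sort it with ordinary pandas operations to build a shortlist. A small sketch using a mock frame shaped like LazyPredict's output (the numbers are made up):

```python
import pandas as pd

# mock of the frame returned by reg.fit(); the real output has more rows and columns
models = pd.DataFrame(
    {'Accuracy': [0.82, 0.79, 0.76], 'F1 Score': [0.80, 0.77, 0.73]},
    index=['LGBMClassifier', 'RandomForestClassifier', 'LogisticRegression'],
)

# keep only the models above an accuracy threshold, best first
shortlist = models[models['Accuracy'] >= 0.78].sort_values('Accuracy', ascending=False)
print(shortlist.index.tolist())  # ['LGBMClassifier', 'RandomForestClassifier']
```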

You can also get the complete pipeline and the model parameters that LazyPredict used, as follows.

```python
# get the pipeline of LGBMClassifier
reg.models['LGBMClassifier']
```

```
Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('numeric',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer()),
                                                                  ('scaler',
                                                                   StandardScaler())]),
                                                  Index(['Pclass', 'Age', 'SibSp', 'Parch', 'Fare'], dtype='object')),
                                                 ('categorical_low',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='constant')),
                                                                  ('encoding',
                                                                   OneHotEncoder(handle_unknown='ignore',
                                                                                 sparse=False))]),
                                                  Index(['Sex', 'Embarked'], dtype='object')),
                                                 ('categorical_high',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='constant')),
                                                                  ('encoding',
                                                                   OrdinalEncoder())]),
                                                  Index([], dtype='object'))])),
                ('classifier', LGBMClassifier(random_state=7))])
```

Also, you can use the complete model pipeline for prediction.

```python
reg.models['LGBMClassifier'].predict(X_test)
```

```
array([0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0,
       0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
       0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 1], dtype=int64)
```

In the same way as LazyClassifier, we can use LazyRegressor to test models for regression problems.
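Under the hood, this kind of comparison is essentially a loop that fits each candidate on the train set and scores it on the test set, which is what `LazyRegressor.fit(X_train, X_test, y_train, y_test)` automates across dozens of models. A minimal hand-rolled sketch of the same idea with plain scikit-learn, using the built-in diabetes dataset as a stand-in:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

# fit each candidate on the train set and score it on the held-out test set
results = {}
for model in (LinearRegression(), Ridge(), DecisionTreeRegressor(random_state=7)):
    model.fit(X_train, y_train)
    results[type(model).__name__] = r2_score(y_test, model.predict(X_test))

# print the candidates from best to worst R²
for name, r2 in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{name}: {r2:.3f}')
```

LazyRegressor does the same thing at a larger scale and returns the scores as a data frame, just like LazyClassifier.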

## Summing it up

LazyPredict can help us get a basic understanding of which models perform better on our data. It can run with almost no data preprocessing, so we can test models immediately after reading the data.

It is worth noting that there are other ways to do automated machine learning model testing, such as auto-sklearn, but it is quite complex to install, especially on Windows.
