Debug Models and Explain Predictions using Eli5

An important step when working with machine learning models is debugging. For example, when working with text we have to check whether there is noise in our features, such as unwanted symbols or numbers, that affects the predictions. We also need to know which features are responsible for a prediction and be able to explain the model's output. In the past, we talked about feature importances, which can also help us debug a machine learning model, but now there is an easier and more functional way to do this.

Eli5 is a library that can help us debug ML models and explain their predictions in an intuitive way. We will show you some examples: a simple classification problem, text classification, and an image classification problem using Keras.

Installation

pip install eli5

Eli5 explains Iris Predictions

For this simple model, we will use the Iris dataset to predict the type of irises (Setosa, Versicolour, and Virginica).

from sklearn import datasets
from sklearn.linear_model import LogisticRegression
import pandas as pd

import eli5

# load the Iris dataset and train a simple logistic regression
iris = datasets.load_iris()
features = pd.DataFrame(iris['data'])
target = iris['target']
model = LogisticRegression(max_iter=1000)
model.fit(features, target)

Now that we have a trained model, we can use Eli5 to get the feature importances and to explain a prediction by showing us which features are responsible for the model's output.

First, let's get the feature importances for every class: in other words, the weights of the model.

eli5.explain_weights(model)
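By default, the table uses generic names like x0 and y=0. If you want readable labels, eli5 accepts feature and target names; a minimal sketch using the names that ship with the Iris dataset:

# show readable feature/class names instead of x0..x3 and y=0..y=2
eli5.explain_weights(model,
                     feature_names=iris['feature_names'],
                     target_names=list(iris['target_names']))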

We can explain a single prediction just by passing the model and one test instance.

eli5.explain_prediction(model, features.iloc[0])

In this prediction, class 0 has the highest probability. We can also see the contribution of every feature and of the bias (intercept).
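If you need the explanation programmatically rather than as a rendered table, eli5 can also return it as a DataFrame; a minimal sketch assuming the explain_prediction_df helper available in recent eli5 versions:

# get the same explanation as a pandas DataFrame
df = eli5.explain_prediction_df(model, features.iloc[0])
print(df)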

Eli5 explains Text Classification

We will use some sample positive and negative tweets and train a logistic regression classifier to predict whether a tweet is positive or negative.

import numpy as np
import pandas as pd
import nltk                                # Python library for NLP
from nltk.corpus import twitter_samples    # sample Twitter dataset from NLTK
from collections import Counter
import eli5

nltk.download('twitter_samples')

# select the set of positive and negative tweets
all_positive_tweets = twitter_samples.strings('positive_tweets.json')
all_negative_tweets = twitter_samples.strings('negative_tweets.json')

# label positive tweets with 1 and negative tweets with 0
pos = pd.DataFrame({'tweet': all_positive_tweets, 'positive': [1] * len(all_positive_tweets)})
neg = pd.DataFrame({'tweet': all_negative_tweets, 'positive': [0] * len(all_negative_tweets)})

data = pd.concat([pos, neg])
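Before vectorizing, it is worth confirming that the two classes are balanced; a quick sanity check with the Counter we imported above (NLTK ships 5,000 tweets per class):

# sanity check: the classes should be balanced
print(Counter(data['positive']))
# Counter({1: 5000, 0: 5000})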

from sklearn.feature_extraction.text import TfidfVectorizer

# use tf-idf, keeping only tokens that appear in at least 5 documents
vect = TfidfVectorizer(min_df=5, ngram_range=(1, 3), stop_words='english')

# fit and transform
X = vect.fit_transform(data.tweet)

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X, data['positive'])

First, let's get the weights. In this case, we also need to pass the vectorizer we used.

eli5.show_weights(model, vec=vect)
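By default show_weights displays a limited number of features; you can control how many positive and negative weights appear with the top parameter:

# show the 10 strongest positive and 10 strongest negative features
eli5.show_weights(model, vec=vect, top=(10, 10))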

Then the fun part. Let’s get the contribution of every word in an input sentence.

test="I'm glad this is not a sad tweet"

eli5.explain_prediction(model, test, vec=vect)

Pretty helpful, right? It gave us the contribution of every feature by highlighting the words in the sentence.
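Because our classifier is linear, eli5 can read the coefficients directly. For black-box text models (boosted trees, neural networks, pipelines without exposed weights), eli5 ships a LIME implementation; a minimal sketch using eli5.lime.TextExplainer, which only needs a function that maps raw texts to class probabilities:

from eli5.lime import TextExplainer

# TextExplainer perturbs the input text and fits a local white-box model
te = TextExplainer(random_state=42)
te.fit(test, lambda docs: model.predict_proba(vect.transform(docs)))
te.show_prediction()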

Eli5 explains Keras image models

Eli5 is so powerful that it can also work with Keras image classification models. We will use a pre-trained model to get the labels of the image below, which is my desk.

[Image: my desk]

from keras.applications.xception import Xception
from keras.preprocessing import image
from keras.applications.xception import preprocess_input, decode_predictions
import numpy as np
import tensorflow as tf

# eli5's Keras explainer needs graph mode, so disable TF2 eager execution
tf.compat.v1.disable_eager_execution()

from PIL import Image
import requests
from io import BytesIO

# load the pre-trained model
model = Xception(weights='imagenet', include_top=True)
# choose the URL of the image that you want
URL = "https://instagram.fath3-3.fna.fbcdn.net/v/t51.2885-15/e35/p1080x1080/120296207_346512619886025_2547830156221124067_n.jpg?_nc_ht=instagram.fath3-3.fna.fbcdn.net&_nc_cat=109&_nc_ohc=eivBrVMAy4oAX8SvZlu&edm=AGenrX8BAAAA&ccb=7-4&oh=9408a18468253ee1cf96dd93e98f132b&oe=60F341EE&_nc_sid=5eceaa"
# get the image
response = requests.get(URL)
img = Image.open(BytesIO(response.content))
# resize the image to the input size the model expects (299x299 for Xception)
img = img.resize((299,299))
 
# convert to numpy array
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

Now that we have the processed image (x) and the Keras model, let's check the top 10 labels the model predicted.

features = model.predict(x)

# return the top 10 detected objects
label = decode_predictions(features, top=10)
label
[[('n03179701', 'desk', 0.296059),
  ('n03337140', 'file', 0.12352474),
  ('n04590129', 'window_shade', 0.078198865),
  ('n03180011', 'desktop_computer', 0.06828544),
  ('n04239074', 'sliding_door', 0.02761029),
  ('n03782006', 'monitor', 0.022889987),
  ('n02791124', 'barber_chair', 0.018023033),
  ('n02791270', 'barbershop', 0.013427197),
  ('n04344873', 'studio_couch', 0.011167441),
  ('n03201208', 'dining_table', 0.009128182)]]

Then, we can check which part of the image is responsible for each label. We will also need the class IDs of the labels, which can be obtained as follows.

np.argsort(features)[0, ::-1][:10]
array([526, 553, 905, 527, 799, 664, 423, 424, 831, 532])
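To avoid mixing up which class ID belongs to which label, we can pair the sorted IDs with the decoded labels; this works because decode_predictions returns the labels in the same descending-probability order:

# pair each ImageNet class id with its decoded label and score
top_ids = np.argsort(features)[0, ::-1][:10]
for class_id, (_, name, score) in zip(top_ids, label[0]):
    print(class_id, name, round(float(score), 4))
# 526 desk 0.2961, 553 file 0.1235, 905 window_shade 0.0782, ...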

Let’s check the label desk with class id 526.

eli5.show_prediction(model, x, targets=[526])

As you can see, the most important part of the image for the label desk is where the actual desk is.

This time, let’s check a strange label like barbershop.

eli5.show_prediction(model, x, targets=[424])

Aha! Now we know that my desk chair looks like a barber chair. That's why the model predicted the barbershop label.
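If you want to save the heatmap to a file instead of rendering it in the notebook, eli5 can format the explanation as a PIL image; a minimal sketch assuming eli5's format_as_image helper (it requires matplotlib):

# explain_prediction returns an Explanation object instead of rendering it
expl = eli5.explain_prediction(model, x, targets=[526])

# overlay the Grad-CAM heatmap on the original image and save it
heatmap = eli5.format_as_image(expl)
heatmap.save('desk_explanation.png')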

Summing it up

Eli5 is a very useful library that helps us debug classifiers and explain their predictions. It works with most Python ML libraries, and also with more complex models like Keras and with text data and vectorizers.
