Predictive Hacks

# How to run Recommender Systems in Python

## A Brief Introduction to Recommender Systems

Nowadays, many companies use Recommender Systems (RecSys), a subclass of information filtering systems that seek to predict the “rating” or “preference” a user would give to an item. They are primarily used in commercial applications. Some famous examples of recommender systems:

• Amazon: One of the first companies to apply Recommender Systems extensively, around 1998. Based on the user’s preferences, it suggested similar products. It started with books and now does this for all of its products.
• YouTube: Based on the videos that you have watched, it suggests other videos that you are likely to enjoy.
• Spotify: Its successful Recommender System made it famous, and many people let Spotify choose the music for them.
• Facebook: It shows at the top of the feed the posts that are most likely to interest you.
• Netflix: It recommends movies based on your past ratings. It is worth mentioning the Netflix Prize, an open competition for the best collaborative filtering algorithm to predict user ratings for films based only on previous ratings, without any other information about the users or films (users and films were identified only by numbers assigned for the contest). On September 21, 2009, Netflix awarded the $1M Grand Prize to the team “BellKor’s Pragmatic Chaos”. So, you can build your own improved Recommender System and maybe become rich one day 🙂

## Surprise for Recommender Systems

Recommender systems remain an area of strong interest and active research. Our goal here is to show how easily you can build your own recommender system without going into the underlying maths. We will work with the surprise package, an easy-to-use Python scikit for recommender systems.

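The available prediction algorithms are:

• NormalPredictor: predicts a random rating based on the distribution of the training set, which is assumed to be normal.
• BaselineOnly: predicts the baseline estimate for a given user and item.
• KNNBasic: a basic collaborative filtering algorithm.
• KNNWithMeans: a basic collaborative filtering algorithm, taking into account the mean ratings of each user.
• KNNWithZScore: a basic collaborative filtering algorithm, taking into account the z-score normalization of each user.
• KNNBaseline: a basic collaborative filtering algorithm, taking into account a baseline rating.
• SVD: the SVD algorithm, as popularized by Simon Funk during the Netflix Prize.
• SVDpp: SVD++, an extension of SVD that takes implicit ratings into account.
• NMF: a collaborative filtering algorithm based on Non-negative Matrix Factorization.
• SlopeOne: a simple yet accurate collaborative filtering algorithm.
• CoClustering: a collaborative filtering algorithm based on co-clustering.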

## Build your own Recommender System

We will walk through an example of building your own recommender. We will work with the MovieLens 100K dataset (ml-100k), collected by the GroupLens Research Project at the University of Minnesota.

Let’s get our hands dirty!

```python
import pandas as pd
import numpy as np

# Ratings: u.data is tab-separated (user id, item id, rating, timestamp)
columns = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv('ml-100k/u.data', sep='\t', names=columns)

# Movie metadata: u.item is pipe-separated and Latin-1 encoded
columns = ['item_id', 'movie title', 'release date', 'video release date', 'IMDb URL', 'unknown', 'Action', 'Adventure',
           'Animation', 'Childrens', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror',
           'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']

movies = pd.read_csv('ml-100k/u.item', sep='|', names=columns, encoding='latin-1')
movie_names = movies[['item_id', 'movie title']]

# Attach the movie titles to the ratings and keep only the columns we need
combined_movies_data = pd.merge(df, movie_names, on='item_id')
combined_movies_data = combined_movies_data[['user_id', 'movie title', 'rating']]
combined_movies_data.head()
```

I will also add my own ratings for some movies in this dataset, since my ultimate goal is to get recommendations for myself ;). My preferences are shown below. I will assign myself the user_id 1001.

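```python
# my user_id is 1001; my_ratings has the same columns as combined_movies_data
# (user_id, movie title, rating)
my_ratings
```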
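The actual ratings are not listed in the text. For `pd.concat` in the next step to line up, `my_ratings` needs the same three columns as `combined_movies_data`. A minimal sketch of building such a DataFrame, where the titles and ratings are hypothetical placeholders rather than the author’s preferences:

```python
import pandas as pd

# Hypothetical example: placeholder titles and ratings, not the author's.
# Titles must match the 'movie title' values in u.item exactly so they line up
# with the other users' ratings of the same movies.
titles = ['Toy Story (1995)', 'Star Wars (1977)', 'Fargo (1996)']
my_ratings = pd.DataFrame({
    'user_id': [1001] * len(titles),   # MovieLens 100K users are numbered 1 to 943
    'movie title': titles,
    'rating': [5, 4, 3],               # same 1-5 scale as MovieLens
})
print(my_ratings)
```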

The next step is to append my ratings to the rest of the ratings. We will also keep only the movies with more than 25 reviews.

```python
# Append my ratings (user 1001) to the MovieLens ratings
combined_movies_data = pd.concat([combined_movies_data, my_ratings], axis=0, ignore_index=True)

# Rename the columns to userID, itemID and rating (itemID now holds the movie title)
combined_movies_data.columns = ['userID', 'itemID', 'rating']

# Use transform to count the reviews of each item (grouped by itemID) on every row,
# then keep only the movies with more than 25 reviews
combined_movies_data['reviews'] = combined_movies_data.groupby(['itemID'])['rating'].transform('count')
combined_movies_data = combined_movies_data[combined_movies_data.reviews > 25][['userID', 'itemID', 'rating']]
```
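The `transform('count')` call is what makes this filter a one-liner: unlike `agg`, it returns one value per original row (the size of that row’s item group), so the result can be stored as a column and used in a boolean mask. A toy illustration with hypothetical data and a threshold of 1 instead of 25:

```python
import pandas as pd

# Toy ratings table (hypothetical data) with the same column names as above
ratings = pd.DataFrame({
    'userID': [1, 2, 3, 1, 2, 1],
    'itemID': ['A', 'A', 'A', 'B', 'B', 'C'],
    'rating': [4, 5, 3, 2, 4, 5],
})

# One count per row, aligned with the original index
ratings['reviews'] = ratings.groupby('itemID')['rating'].transform('count')
print(ratings['reviews'].tolist())  # [3, 3, 3, 2, 2, 1]

# Keep only items with more than min_reviews reviews (the article uses 25)
min_reviews = 1
popular = ratings[ratings['reviews'] > min_reviews][['userID', 'itemID', 'rating']]
print(popular['itemID'].unique().tolist())  # ['A', 'B']
```

The same rows could also be selected with `ratings.groupby('itemID').filter(lambda g: len(g) > min_reviews)`, without the helper column.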


Now our dataset is ready, and we can apply different recommender systems using the surprise package.

```python
from surprise import NMF, SVD, SVDpp, KNNBasic, KNNWithMeans, KNNWithZScore, CoClustering
from surprise.model_selection import cross_validate
from surprise import Reader, Dataset

# A Reader is still needed, but only the rating_scale parameter is required
# (MovieLens ratings range from 1 to 5)
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(combined_movies_data, reader)
```

Naturally, we do not want to recommend movies that I have already rated. Let’s remove them from the candidates:

```python
# get the list of the movie ids
unique_ids = combined_movies_data['itemID'].unique()

# get the list of the movie ids that user 1001 has rated
iids1001 = combined_movies_data.loc[combined_movies_data['userID'] == 1001, 'itemID']

# remove the rated movies from the candidates for recommendation
movies_to_predict = np.setdiff1d(unique_ids, iids1001)
```
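Note that `np.setdiff1d` returns the sorted, unique values of its first argument that do not appear in the second, so `movies_to_predict` ends up in alphabetical order of title. A toy illustration with hypothetical titles:

```python
import numpy as np

# Hypothetical candidates and the movies user 1001 has already rated
unique_ids = np.array(['Fargo (1996)', 'Heat (1995)', 'Alien (1979)', 'Babe (1995)'])
rated_by_me = np.array(['Heat (1995)', 'Babe (1995)'])

# Sorted, de-duplicated values of unique_ids that are not in rated_by_me
movies_to_predict = np.setdiff1d(unique_ids, rated_by_me)
print(movies_to_predict)  # ['Alien (1979)' 'Fargo (1996)']
```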

### Recommender Systems using NMF

```python
algo = NMF()
algo.fit(data.build_full_trainset())

my_recs = []
for iid in movies_to_predict:
    my_recs.append((iid, algo.predict(uid=1001, iid=iid).est))

pd.DataFrame(my_recs, columns=['iid', 'predictions']).sort_values('predictions', ascending=False).head(10)
```
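The same fit, predict and sort pattern is repeated for every algorithm below. One way to avoid the repetition is a small helper. `top_n_for_user` is a hypothetical name, and the sketch only assumes that `algo.predict(uid=..., iid=...)` returns an object with an `.est` attribute, as Surprise’s predictions do:

```python
import pandas as pd

def top_n_for_user(algo, uid, candidate_iids, n=10):
    """Return the n candidates with the highest estimated rating for `uid`.

    `algo` is assumed to be a fitted Surprise algorithm (or anything whose
    predict(uid=..., iid=...) returns an object with an `.est` attribute).
    """
    preds = [(iid, algo.predict(uid=uid, iid=iid).est) for iid in candidate_iids]
    return (pd.DataFrame(preds, columns=['iid', 'predictions'])
              .sort_values('predictions', ascending=False)
              .head(n))
```

After fitting any of the algorithms, `top_n_for_user(algo, 1001, movies_to_predict)` would produce the same top-10 table as the loop above.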

My recommendations according to NMF:

### Recommender Systems using SVD

```python
algo = SVD()
algo.fit(data.build_full_trainset())

my_recs = []
for iid in movies_to_predict:
    my_recs.append((iid, algo.predict(uid=1001, iid=iid).est))

pd.DataFrame(my_recs, columns=['iid', 'predictions']).sort_values('predictions', ascending=False).head(10)
```

Recommender Systems using SVD
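The surprise documentation describes the (biased) SVD prediction as $\hat{r}_{ui} = \mu + b_u + b_i + q_i^\top p_u$, with the global mean $\mu$, the biases $b_u, b_i$ and the factor vectors $p_u, q_i$ learned by stochastic gradient descent on the regularized squared error. To make that concrete, here is a simplified NumPy sketch of the same idea on toy data. It is not surprise’s implementation, and its defaults (2 factors, learning rate 0.02, 300 epochs) are arbitrary choices for the example:

```python
import numpy as np

def fit_biased_mf(ratings, n_users, n_items, n_factors=2, lr=0.02, reg=0.02,
                  n_epochs=300, seed=0):
    """Fit r_hat(u, i) = mu + b_u + b_i + q_i . p_u by SGD.

    `ratings` is a list of (user_index, item_index, rating) triples.
    Hyperparameter defaults are arbitrary choices for this toy sketch.
    """
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])          # global mean rating
    bu = np.zeros(n_users)                            # user biases
    bi = np.zeros(n_items)                            # item biases
    P = rng.normal(0, 0.1, (n_users, n_factors))      # user factors p_u
    Q = rng.normal(0, 0.1, (n_items, n_factors))      # item factors q_i

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + Q[i] @ P[u])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # Update both factor vectors from their values before this step
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return mu, bu, bi, P, Q

def predict_rating(model, u, i):
    """Estimated rating of item i by user u under the fitted model."""
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + Q[i] @ P[u]
```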

### Recommender Systems using SVD++

```python
algo = SVDpp()
algo.fit(data.build_full_trainset())

my_recs = []
for iid in movies_to_predict:
    my_recs.append((iid, algo.predict(uid=1001, iid=iid).est))

pd.DataFrame(my_recs, columns=['iid', 'predictions']).sort_values('predictions', ascending=False).head(10)
```

Recommender Systems using SVD++
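SVD++ differs from SVD by also using implicit feedback, i.e. which items a user has rated, regardless of the rating values. Following the formulation given in the surprise documentation, the prediction becomes:

$$
\hat{r}_{ui} = \mu + b_u + b_i + q_i^\top \left( p_u + |I_u|^{-\frac{1}{2}} \sum_{j \in I_u} y_j \right)
$$

where $I_u$ is the set of items rated by user $u$ and each $y_j$ is an additional factor vector for item $j$; the other symbols are as in SVD. The sum over $I_u$ enters every prediction and every training step, which is one reason SVD++ is typically much slower to fit than SVD.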

### Recommender Systems using KNN with Z-Score

```python
algo = KNNWithZScore()
algo.fit(data.build_full_trainset())

my_recs = []
for iid in movies_to_predict:
    my_recs.append((iid, algo.predict(uid=1001, iid=iid).est))

pd.DataFrame(my_recs, columns=['iid', 'predictions']).sort_values('predictions', ascending=False).head(10)
```

Recommender Systems using KNN with Z-Score
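KNNWithZScore normalizes each user’s ratings to z-scores before averaging over neighbours. In the user-based form given in the surprise documentation, the estimate is $\hat{r}_{ui} = \mu_u + \sigma_u \frac{\sum_{v \in N_i^k(u)} \text{sim}(u, v)\,(r_{vi} - \mu_v)/\sigma_v}{\sum_{v \in N_i^k(u)} \text{sim}(u, v)}$, where $N_i^k(u)$ are the $k$ users most similar to $u$ who rated item $i$. Below is a simplified NumPy sketch of that rule on a dense rating matrix. Two assumptions to flag: it uses cosine similarity on zero-filled z-scores (surprise’s default similarity is MSD), and it keeps only neighbours with positive similarity, falling back to the user’s mean otherwise:

```python
import numpy as np

def knn_zscore_predict(R, u, i, k=40):
    """Estimate R[u, i] with user-based k-NN on z-score-normalised ratings.

    R is a dense users x items array with np.nan for missing ratings.
    """
    rated = ~np.isnan(R)
    mu = np.nanmean(R, axis=1)                 # each user's mean rating
    sigma = np.nanstd(R, axis=1)               # each user's standard deviation
    sigma[sigma == 0] = 1.0                    # avoid dividing by zero for constant raters
    Z = np.where(rated, (R - mu[:, None]) / sigma[:, None], 0.0)

    # Assumption: cosine similarity between user u and every user, on zero-filled z-scores
    norms = np.linalg.norm(Z, axis=1)
    denom = norms * norms[u]
    sims = np.divide(Z @ Z[u], denom, out=np.zeros_like(denom), where=denom > 0)

    # Neighbours: other users who rated item i with positive similarity, top k
    neighbours = [v for v in range(R.shape[0]) if v != u and rated[v, i] and sims[v] > 0]
    neighbours = sorted(neighbours, key=lambda v: sims[v], reverse=True)[:k]
    if not neighbours:
        return mu[u]                           # fall back to the user's mean

    weights = sims[neighbours]
    z_neigh = Z[neighbours, i]
    return mu[u] + sigma[u] * np.dot(weights, z_neigh) / weights.sum()
```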

### Recommender Systems using Co-Clustering

```python
algo = CoClustering()
algo.fit(data.build_full_trainset())

my_recs = []
for iid in movies_to_predict:
    my_recs.append((iid, algo.predict(uid=1001, iid=iid).est))

pd.DataFrame(my_recs, columns=['iid', 'predictions']).sort_values('predictions', ascending=False).head(10)
```

Recommender Systems using Co-Clustering
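The surprise documentation describes co-clustering as assigning users to clusters $C_u$, items to clusters $C_i$, and each user-item pair to a co-cluster $C_{ui}$, then predicting:

$$
\hat{r}_{ui} = \overline{C_{ui}} + \left(\mu_u - \overline{C_u}\right) + \left(\mu_i - \overline{C_i}\right)
$$

where $\overline{C_{ui}}$, $\overline{C_u}$ and $\overline{C_i}$ are the average ratings of the co-cluster, the user’s cluster and the item’s cluster, and $\mu_u$, $\mu_i$ are the mean ratings of user $u$ and item $i$. In words: start from the co-cluster average and correct it by how much the user and the item deviate from their own clusters.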

## How to Evaluate the Recommender Systems

We saw earlier that each recommender algorithm suggested different movies. The question is which one performed best and how we can choose between different algorithms.

As in any machine learning problem, we can split our dataset into train and test sets and evaluate the performance on the test set. We will apply 3-fold cross-validation and report the average RMSE over the 3 folds.
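Conceptually, k-fold cross-validation shuffles the ratings, splits them into k folds, fits the model on k - 1 folds and computes the RMSE, $\sqrt{\frac{1}{n}\sum (r_{ui} - \hat{r}_{ui})^2}$, on the held-out fold, once per fold. The sketch below does this by hand with NumPy for the simplest possible “model”, a global-mean predictor (a hypothetical baseline, not one of the surprise algorithms), to show the mechanics that `cross_validate(..., cv=3)` automates:

```python
import numpy as np

def kfold_rmse_global_mean(ratings, k=3, seed=0):
    """Per-fold test RMSE of a global-mean predictor under k-fold cross-validation.

    `ratings` is a 1-D array of rating values; the "model" simply predicts
    the mean rating of the training folds for every held-out rating.
    """
    ratings = np.asarray(ratings, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(ratings)), k)   # shuffle, then split
    scores = []
    for f in range(k):
        test_idx = folds[f]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != f])
        prediction = ratings[train_idx].mean()                 # fit on training folds only
        errors = ratings[test_idx] - prediction
        scores.append(np.sqrt(np.mean(errors ** 2)))           # RMSE on the held-out fold
    return np.array(scores)

scores = kfold_rmse_global_mean([1, 2, 3, 4, 5] * 6, k=3)
print(scores, scores.mean())   # per-fold RMSE and their average
```

A baseline like this is a useful sanity check: an algorithm worth using should reach a lower RMSE on the same data.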

```python
cv = []
# Iterate over all recommender system algorithms
for recsys in [NMF(), SVD(), SVDpp(), KNNWithZScore(), CoClustering()]:
    # Perform 3-fold cross-validation and store the mean test RMSE
    tmp = cross_validate(recsys, data, measures=['RMSE'], cv=3, verbose=False)
    cv.append((type(recsys).__name__, tmp['test_rmse'].mean()))

pd.DataFrame(cv, columns=['RecSys', 'RMSE'])
```

Average RMSE on the Test Dataset

As we can see, SVD++ had the best performance (lowest RMSE).

## Discussion

We built several Recommender Systems, all with an RMSE below 1. Our models took into account only the userID and the itemID. A separate post briefly explains the logic of item-based and user-based collaborative filtering, and you can also find an example of item-based collaborative filtering.

We can also apply algorithms that take into account other attributes, such as the genre of the movie, the release date, the director, the actors, the budget, the duration and so on. In this case, we are referring to Content-based recommenders, which treat recommendation as a user-specific classification problem and learn a classifier for the user’s likes and dislikes based on an item’s features. In such a system, keywords are used to describe the items, and a user profile is built to indicate the type of item this user likes. Finally, we can even take into consideration the user’s attributes, like gender, age, location, language, etc.
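As a rough illustration of the content-based idea, the sketch below builds a user profile from the genre flags of the movies a user rated (weighted by how far each rating is from that user’s mean) and ranks unseen movies by cosine similarity to that profile. This is the simpler profile-and-similarity variant rather than the per-user classifier described above, and the genre matrix and ratings are hypothetical; with MovieLens, the genre columns of u.item could play this role:

```python
import numpy as np

# Hypothetical genre flags (columns: Action, Comedy, Drama, Sci-Fi), one row per movie
genres = np.array([
    [1, 0, 0, 1],   # movie 0
    [1, 0, 0, 0],   # movie 1
    [0, 1, 0, 0],   # movie 2
    [0, 1, 1, 0],   # movie 3
    [1, 0, 1, 1],   # movie 4
], dtype=float)
rated = {0: 5.0, 2: 1.0}   # hypothetical: movie index -> rating on a 1-5 scale

# User profile: genre vectors weighted by each rating's deviation from the user's mean
mean_rating = np.mean(list(rated.values()))
profile = sum((r - mean_rating) * genres[m] for m, r in rated.items())

# Score unseen movies by cosine similarity to the profile (0 if a vector is all zeros)
unseen = [m for m in range(len(genres)) if m not in rated]
norms = np.linalg.norm(genres[unseen], axis=1) * np.linalg.norm(profile)
scores = np.divide(genres[unseen] @ profile, norms,
                   out=np.zeros(len(unseen)), where=norms > 0)
ranking = [unseen[j] for j in np.argsort(-scores)]
print(ranking)  # [4, 1, 3]
```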
