Predictive Hacks

Pairwise Distance and Similarity

An efficient way to get the pairwise similarity of a NumPy array (or a pandas DataFrame) is to use the pdist and squareform functions from SciPy's scipy.spatial.distance module. Let's start with a practical example based on the Jaccard similarity:

import numpy as np
from scipy.spatial.distance import pdist, squareform

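# Toy binary data: 4 documents (rows) x 5 features (columns).
# The values in rows 2-4 are illustrative placeholders.
my_data = np.array([[1,1,1,0,1],
                    [0,1,1,1,0],
                    [1,0,0,1,1],
                    [0,0,1,1,1]])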


Now we are going to calculate the pairwise Jaccard distance:

# Calculate all pairwise distances as a condensed 1-D vector (one entry per pair of rows)
jaccard_distances = pdist(my_data, metric='jaccard')

# Convert the distances to a square matrix
jaccard_distances = squareform(jaccard_distances)
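It can help to see what pdist actually returns before squareform reshapes it. The sketch below uses a hypothetical 4-row boolean array and a hypothetical helper, condensed_index, to show that pdist stores only the n(n-1)/2 pairs with i < j, and that each off-diagonal cell of the square matrix maps to one position in that vector (the index formula follows the one given in the SciPy pdist documentation).

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical binary data: 4 documents x 5 features.
X = np.array([[1, 1, 1, 0, 1],
              [0, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [0, 0, 1, 1, 1]], dtype=bool)

n = X.shape[0]
condensed = pdist(X, metric='jaccard')   # 1-D vector, one entry per pair i < j
square = squareform(condensed)           # symmetric n x n matrix, zeros on the diagonal

print(condensed.shape)                   # (6,) because 4 * 3 / 2 = 6

def condensed_index(i, j, n):
    """Hypothetical helper: position of pair (i, j), i < j, in pdist's output."""
    return n * i + j - ((i + 2) * (i + 1)) // 2

# Every upper-triangle cell of the square matrix comes from the condensed vector.
for i in range(n):
    for j in range(i + 1, n):
        assert square[i, j] == condensed[condensed_index(i, j, n)]

# squareform also converts a square distance matrix back to condensed form.
assert np.array_equal(squareform(square), condensed)
```

Keeping the condensed form is roughly half the memory of the square matrix, which can matter when the number of rows is large.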

Finally, the Jaccard similarity is 1 minus the Jaccard distance:

jaccard_similarity = 1-jaccard_distances
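As a sanity check, the Jaccard similarity of two binary rows can be computed straight from its definition, |A ∩ B| / |A ∪ B|, and compared with 1 minus the SciPy distance. The data and the function name jaccard_similarity_matrix below are hypothetical, and treating two all-zero rows as identical (similarity 1) is a convention chosen for this sketch.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical binary data: 4 documents x 5 features (no all-zero rows).
X = np.array([[1, 1, 1, 0, 1],
              [0, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [0, 0, 1, 1, 1]], dtype=bool)

def jaccard_similarity_matrix(X):
    """|A intersect B| / |A union B| for every pair of boolean rows."""
    n = X.shape[0]
    sim = np.ones((n, n))
    for i in range(n):
        for j in range(n):
            union = np.logical_or(X[i], X[j]).sum()
            inter = np.logical_and(X[i], X[j]).sum()
            # Convention for this sketch: two empty rows count as identical.
            sim[i, j] = inter / union if union > 0 else 1.0
    return sim

manual = jaccard_similarity_matrix(X)
via_scipy = 1 - squareform(pdist(X, metric='jaccard'))
print(np.allclose(manual, via_scipy))  # True
```

For example, rows 0 and 1 share 2 features out of 5 present in either row, so their similarity is 2 / 5 = 0.4.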

The result is a 4×4 array: my_data contains 4 documents (rows), so the similarity matrix has one row and one column per document. The matrix is symmetric, and its diagonal is all ones because every document is identical to itself.
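When the data live in a pandas DataFrame, a convenient pattern is to compute the matrix on the underlying array and then re-attach the row labels, so each similarity can be looked up by document name. The DataFrame contents and the labels doc_a to doc_d are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical document-feature data with illustrative row labels.
df = pd.DataFrame(
    [[1, 1, 1, 0, 1],
     [0, 1, 1, 1, 0],
     [1, 0, 0, 1, 1],
     [0, 0, 1, 1, 1]],
    index=['doc_a', 'doc_b', 'doc_c', 'doc_d'],
)

# Compute the Jaccard similarity on a boolean NumPy view of the data.
sim = 1 - squareform(pdist(df.to_numpy(dtype=bool), metric='jaccard'))

# Re-attach the document labels to both axes.
sim_df = pd.DataFrame(sim, index=df.index, columns=df.index)

print(sim_df.shape)                              # (4, 4)
print(round(sim_df.loc['doc_a', 'doc_b'], 3))    # 0.4
```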

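The scipy.spatial.distance module supports many other metrics, including the following (the exact set depends on your SciPy version; some, such as ‘kulsinski’, have been deprecated or removed in newer releases):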

‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulsinski’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.
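Switching metrics only changes the metric argument. As an illustration, the sketch below computes cosine similarity (1 minus SciPy's cosine distance) on hypothetical real-valued data and checks it against the definition u·v / (‖u‖ ‖v‖).

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical real-valued data, e.g. term counts for 3 documents.
X = np.array([[3.0, 0.0, 1.0],
              [1.0, 2.0, 0.0],
              [0.0, 1.0, 4.0]])

# Same pdist/squareform pattern, different metric.
cosine_sim = 1 - squareform(pdist(X, metric='cosine'))

# Cross-check against the definition u.v / (||u|| ||v||).
norms = np.linalg.norm(X, axis=1)
expected = (X @ X.T) / np.outer(norms, norms)
print(np.allclose(cosine_sim, expected))  # True
```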

Pairwise Distance with Scikit-Learn

Alternatively, scikit-learn's pairwise_distances returns the square distance matrix directly, so no squareform step is needed:

import numpy as np
from sklearn.metrics import pairwise_distances

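# Get the pairwise Jaccard similarity. For 'jaccard', scikit-learn converts
# non-boolean input to boolean and emits a warning, so cast explicitly.
jaccard_similarity = 1 - pairwise_distances(my_data.astype(bool), metric='jaccard')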
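To confirm that the two approaches agree, the sketch below computes the Jaccard similarity with both libraries on the same hypothetical boolean data and compares the matrices. It assumes scikit-learn is installed.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics import pairwise_distances

# Hypothetical binary data: 4 documents x 5 features.
X = np.array([[1, 1, 1, 0, 1],
              [0, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [0, 0, 1, 1, 1]], dtype=bool)

# scikit-learn: square matrix in one call.
sk_sim = 1 - pairwise_distances(X, metric='jaccard')

# SciPy: condensed vector, then squareform.
sp_sim = 1 - squareform(pdist(X, metric='jaccard'))

print(np.allclose(sk_sim, sp_sim))  # True
```

pairwise_distances also accepts a second array Y, which gives distances between two different sets of rows instead of all pairs within one set.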
