Predictive Hacks

# Pairwise Distance and Similarity

An efficient way to compute the pairwise similarity between the rows of a NumPy array (or a pandas DataFrame) is to use the `pdist` and `squareform` functions from `scipy.spatial.distance`. Let's work through a practical example using the Jaccard similarity:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Each row is one observation; each column is a binary (0/1) feature
my_data = np.array([[1, 1, 1, 0, 1],
                    [1, 1, 0, 0, 1],
                    [0, 0, 1, 1, 1],
                    [1, 0, 1, 1, 0]])

my_data
```
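For reference, the Jaccard similarity of two binary rows is the number of positions where both are 1 divided by the number of positions where at least one is 1. The following is a minimal by-hand sketch for the first two rows of the example data; the variable names `a`, `b`, `intersection` and `union` are only illustrative.

```python
import numpy as np

# Rows 0 and 1 of the example data, cast to boolean presence/absence vectors
a = np.array([1, 1, 1, 0, 1], dtype=bool)
b = np.array([1, 1, 0, 0, 1], dtype=bool)

intersection = np.logical_and(a, b).sum()  # positions where both rows are 1
union = np.logical_or(a, b).sum()          # positions where at least one row is 1

manual_similarity = intersection / union
manual_distance = 1 - manual_similarity

print(manual_similarity)  # 0.75
print(manual_distance)    # 0.25
```

The two rows share 3 of the 4 positions in which either of them is 1, which gives a similarity of 0.75 and a distance of 0.25.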

Now we are going to calculate the pairwise Jaccard distance:

```python
# Calculate all pairwise distances (returned as a condensed 1-D array)
jaccard_distances = pdist(my_data, metric='jaccard')

# Convert the distances to a square matrix
jaccard_distances = squareform(jaccard_distances)
```
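It may help to see why both functions are needed. `pdist` returns a condensed 1-D array with one entry per unordered pair of rows, and `squareform` expands it into the full symmetric matrix. The sketch below repeats the example data so that it runs on its own; the names `condensed` and `square` are illustrative.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist, squareform

my_data = np.array([[1, 1, 1, 0, 1],
                    [1, 1, 0, 0, 1],
                    [0, 0, 1, 1, 1],
                    [1, 0, 1, 1, 0]])

condensed = pdist(my_data, metric='jaccard')  # n * (n - 1) / 2 values
square = squareform(condensed)                # n x n symmetric matrix

print(condensed.shape)  # (6,)
print(square.shape)     # (4, 4)

# Condensed entries follow the row-pair order (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
for k, (i, j) in enumerate(combinations(range(len(my_data)), 2)):
    assert square[i, j] == square[j, i] == condensed[k]
```

For 4 rows there are 4 × 3 / 2 = 6 distinct pairs, so the condensed form stores 6 numbers instead of 16.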

Finally, the Jaccard similarity is 1 minus the Jaccard distance:

```python
jaccard_similarity = 1 - jaccard_distances
jaccard_similarity
```

As we can see, the result is a 4×4 array: the input has 4 rows (for example, 4 documents), so we get one similarity score for every pair of rows. The matrix is symmetric, and its diagonal is 1 because every row is identical to itself.
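If the data lives in a pandas DataFrame, the same steps apply, and the result can be wrapped back into a labelled DataFrame so that each score is easy to trace to a pair of rows. This is only a sketch: the row labels `doc_0` … `doc_3` and column labels `w0` … `w4` are hypothetical and not part of the original example.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical row and column labels, used only for illustration
df = pd.DataFrame(
    [[1, 1, 1, 0, 1],
     [1, 1, 0, 0, 1],
     [0, 0, 1, 1, 1],
     [1, 0, 1, 1, 0]],
    index=['doc_0', 'doc_1', 'doc_2', 'doc_3'],
    columns=['w0', 'w1', 'w2', 'w3', 'w4'],
)

# Same pipeline as above, applied to the underlying NumPy array
distances = squareform(pdist(df.to_numpy(), metric='jaccard'))

# Label both axes of the similarity matrix with the row names
similarity_df = pd.DataFrame(1 - distances, index=df.index, columns=df.index)

print(similarity_df.round(2))
```

A single pair can then be looked up by name, for example `similarity_df.loc['doc_0', 'doc_1']`.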

Note that `scipy.spatial.distance` supports many distance metrics (the exact set depends on your SciPy version), such as:

`'braycurtis'`, `'canberra'`, `'chebyshev'`, `'cityblock'`, `'correlation'`, `'cosine'`, `'dice'`, `'euclidean'`, `'hamming'`, `'jaccard'`, `'jensenshannon'`, `'kulsinski'`, `'kulczynski1'`, `'mahalanobis'`, `'matching'`, `'minkowski'`, `'rogerstanimoto'`, `'russellrao'`, `'seuclidean'`, `'sokalmichener'`, `'sokalsneath'`, `'sqeuclidean'`, `'yule'`.
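Switching metrics only requires changing the `metric` argument. The sketch below runs the same data through three of the listed metrics; the `results` dictionary is just an illustrative container. Keep in mind that the `1 - distance` conversion used for Jaccard does not carry over automatically: Euclidean distance, for instance, is not bounded by 1.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

my_data = np.array([[1, 1, 1, 0, 1],
                    [1, 1, 0, 0, 1],
                    [0, 0, 1, 1, 1],
                    [1, 0, 1, 1, 0]])

# Compute one square distance matrix per metric
results = {
    metric: squareform(pdist(my_data, metric=metric))
    for metric in ('hamming', 'cosine', 'euclidean')
}

for metric, distances in results.items():
    print(metric)
    print(distances.round(3))
```

For rows 0 and 1, which differ in exactly one of five positions, the Hamming distance is 1/5 = 0.2 and the Euclidean distance is 1.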

## Pairwise Distance with Scikit-Learn

Alternatively, you can work with Scikit-learn as follows:

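```python
from sklearn.metrics import pairwise_distances

# Get the pairwise Jaccard similarity (1 - Jaccard distance).
# Casting to bool avoids the DataConversionWarning that scikit-learn
# emits when a boolean metric such as 'jaccard' receives non-boolean input.
1 - pairwise_distances(my_data.astype(bool), metric='jaccard')
```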
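As a sanity check, the SciPy and Scikit-learn routes should produce the same similarity matrix. The sketch below repeats the example data (stored as booleans here) and compares the two results; the names `scipy_similarity` and `sklearn_similarity` are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics import pairwise_distances

my_data = np.array([[1, 1, 1, 0, 1],
                    [1, 1, 0, 0, 1],
                    [0, 0, 1, 1, 1],
                    [1, 0, 1, 1, 0]], dtype=bool)

# Route 1: SciPy condensed distances expanded to a square matrix
scipy_similarity = 1 - squareform(pdist(my_data, metric='jaccard'))

# Route 2: Scikit-learn returns the square matrix directly
sklearn_similarity = 1 - pairwise_distances(my_data, metric='jaccard')

print(np.allclose(scipy_similarity, sklearn_similarity))  # True
```

The Scikit-learn call saves the explicit `squareform` step, which can be convenient when the full matrix is what you need anyway.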
