Predictive Hacks

Hierarchical Clustering of Countries based on Eurovision Votes


Description

This dataset contains the votes cast by each voting country ("From country") for each receiving country ("To country") in Eurovision 2016. It includes both the Jury votes and the Televote. Since we would like to see how the public voted in Eurovision 2016, we will consider only the Televote. Our goal is to create a dendrogram that shows the relationships between countries, using Hierarchical Clustering.



Data Processing

We will load the data and keep only three columns: From country, To country and Televote Rank. Then we will reshape the data so that the rows are the From country, the columns are the To country and the values are the Televote Rank. Notice that a country cannot vote for itself, so the diagonal of the reshaped table contains NA values. We will impute these NAs with Televote Rank = 1, assuming that each country would have given itself the highest score if that were allowed. Bear in mind that we want to cluster the countries based on their voting preferences.

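from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.cluster.vq import whiten
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Jupyter only; remove this line when running as a plain script
%matplotlib inline

eurovision = pd.read_csv("eurovision-2016.csv")
# keep only the three columns we need
eurovision = eurovision[['From country', 'To country', 'Televote Rank']]
# rows = voting country, columns = receiving country, values = Televote Rank
televote_Rank = eurovision.pivot(index='From country', columns='To country', values='Televote Rank')
# a country cannot vote for itself: fill the resulting NAs with the best rank, 1
televote_Rank = televote_Rank.fillna(1)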
 
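To see what the reshape does, here is a minimal sketch on a made-up table of three hypothetical countries ("A", "B", "C"), reusing the same column names as above. The ranks are invented for illustration and are not Eurovision data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three countries, each ranking the other two
votes = pd.DataFrame({
    "From country": ["A", "A", "B", "B", "C", "C"],
    "To country":   ["B", "C", "A", "C", "A", "B"],
    "Televote Rank": [1, 2, 2, 1, 1, 2],
})

# Rows = voting country, columns = receiving country, values = rank given
ranks = votes.pivot(index="From country", columns="To country", values="Televote Rank")

# No country votes for itself, so every diagonal cell is NaN after pivoting
print(np.diag(ranks.to_numpy()))  # [nan nan nan]

# Impute the self-votes with the best possible rank (1)
ranks = ranks.fillna(1)
print(ranks)
```

After the fill, each row is a complete vector of rank preferences, which is what the clustering step compares.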

Hierarchical Clustering

Since the data are now in the right format, we can whiten them (rescale each column to unit variance), although this is not strictly necessary because all features are Televote ranks on the same scale. We are then ready to run the Hierarchical Clustering and plot the dendrogram. Notice that the row labels come from the From country column.

# scale each column (receiving country) to unit variance
df_scaled = whiten(televote_Rank.to_numpy())
# calculate the linkage matrix with Ward's method: mergings
mergings = linkage(df_scaled, method='ward')

plt.figure(figsize=(20, 12))

# plot the dendrogram, labelling each leaf with its voting country
dn = dendrogram(
    mergings,
    labels=np.array(televote_Rank.index),
    leaf_rotation=90,
    leaf_font_size=14
)
plt.show()
 
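As a quick sanity check of the two SciPy calls, the sketch below uses a small made-up rank matrix (not the Eurovision data). It shows that `whiten` simply divides each column by its population standard deviation, and what the linkage matrix returned by `linkage` looks like.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.cluster.vq import whiten

# Hypothetical 4 x 3 rank matrix (rows = voting countries), for illustration only
X = np.array([[1., 2., 3.],
              [2., 1., 3.],
              [3., 3., 1.],
              [3., 2., 1.]])

# whiten() divides each column by its standard deviation (ddof=0)
W = whiten(X)
assert np.allclose(W, X / X.std(axis=0))

# Ward linkage returns an (n - 1) x 4 matrix: each row holds the two clusters
# merged, the distance at which they merge, and the size of the new cluster
Z = linkage(W, method="ward")
print(Z.shape)  # (3, 4)
```

The dendrogram is just a drawing of this matrix: the third column gives the height of each merge.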

Focusing on the Dendrogram

Let’s have a closer look at the dendrogram. You will notice that the following countries appear to be close:

  • Bosnia & Herzegovina, Croatia, Montenegro, Serbia, F.Y.R. Macedonia, Slovenia

You will also notice that the Baltic countries (Latvia, Lithuania and Estonia) are close. Some other countries which are close:

  • Germany and Austria, and to a lesser extent Switzerland, can form one group
  • Ireland is close to the United Kingdom
  • Finland, Sweden, Iceland, Denmark, Norway can be another group
  • Belgium is close to the Netherlands
  • Greece is close to Italy, Cyprus and Bulgaria
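The groups above were read off the dendrogram by eye. SciPy's `fcluster` can instead cut the tree into a fixed number of flat clusters. The sketch below uses a made-up matrix for four hypothetical countries ("P", "Q", "R", "S") forming two voting blocs; on the real data one would pass `mergings` and `televote_Rank.index`, and the number of clusters `t` would be a judgment call based on the dendrogram.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical rank matrix with two blocs of similar voters (not Eurovision data)
countries = ["P", "Q", "R", "S"]
X = np.array([[1., 2., 9., 8.],
              [2., 1., 8., 9.],
              [9., 8., 1., 2.],
              [8., 9., 2., 1.]])

Z = linkage(X, method="ward")

# Cut the tree into at most two flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")

# Group country names by cluster label
groups = {}
for country, label in zip(countries, labels):
    groups.setdefault(label, []).append(country)
print(sorted(groups.values()))  # [['P', 'Q'], ['R', 'S']]
```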

Apart from showing how close some countries are, the dendrogram also shows how “far” apart they are in terms of voting. For example, Switzerland is far away from Albania.
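How “far” two countries are in the dendrogram can be quantified by the cophenetic distance: the height at which the two leaves first end up in the same cluster. A minimal sketch with SciPy's `cophenet`, again on a made-up matrix for four hypothetical countries rather than the Eurovision data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

# Hypothetical rank matrix (not Eurovision data)
countries = ["P", "Q", "R", "S"]
X = np.array([[1., 2., 9., 8.],
              [2., 1., 8., 9.],
              [9., 8., 1., 2.],
              [8., 9., 2., 1.]])

Z = linkage(X, method="ward")

# cophenet(Z) returns condensed cophenetic distances; squareform makes them a square matrix
coph = squareform(cophenet(Z))

p, q, r = countries.index("P"), countries.index("Q"), countries.index("R")
# P joins Q much lower in the tree than it joins R
print(coph[p, q] < coph[p, r])  # True
```

On the real data, the same call on `mergings` would give a “voting distance” for every pair of countries.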

Conclusion

We took into consideration only the results of Eurovision 2016, so we cannot draw safe conclusions. However, it is clear that many factors affect how people vote in Eurovision. Generally, people tend to vote for countries which are close geographically or culturally.
