Predictive Hacks

How To Report The Distribution Of Attributes Per Cluster

This report is a useful add-on for a clustering project: it summarises the distribution of the attributes per cluster in a single pandas DataFrame.

Generate Data

Let’s assume that we came up with 4 clusters, labelled “0”, “1”, “2” and “3”, and that we have 2 attributes:

Age: [<30], [30-65], [65+]
Gender: f, m

import pandas as pd
import numpy as np
df=pd.DataFrame(
{
'Clusters':np.random.choice(["0","1",'2','3'],200,p=[0.3,0.2,0.2,0.3]),
'Gender':np.random.choice(["m","f"],200,p=[0.6,0.4]),
'Age':np.random.choice(["[<30]","[30-65]", "[65+]"],200,p=[0.3,0.6,0.1]),
"Response":np.random.binomial(1,size=200,p=0.2)
    }
)
df=df.reset_index().rename(columns={'index':'id'})
df[['id','Clusters','Gender','Age']].head()  # preview without the Response column
   id Clusters Gender      Age
0   0        0      f  [30-65]
1   1        3      m  [30-65]
2   2        3      f    [65+]
3   3        3      m    [<30]
4   4        0      m  [30-65]
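
Because no seed is set above, the sample changes on every run and the printed rows will not match exactly. A minimal sketch of a reproducible variant follows, assuming NumPy's Generator API; the seed value 42 and the names rng, df_seeded and shares are illustrative choices, not part of the original post.

```python
import numpy as np
import pandas as pd

# The seed is an arbitrary assumption; any fixed integer makes the draw repeatable.
rng = np.random.default_rng(42)
n = 200

df_seeded = pd.DataFrame(
    {
        "Clusters": rng.choice(["0", "1", "2", "3"], n, p=[0.3, 0.2, 0.2, 0.3]),
        "Gender": rng.choice(["m", "f"], n, p=[0.6, 0.4]),
        "Age": rng.choice(["[<30]", "[30-65]", "[65+]"], n, p=[0.3, 0.6, 0.1]),
        "Response": rng.binomial(1, 0.2, size=n),
    }
)
# Turn the default integer index into an explicit id column, as in the post.
df_seeded = df_seeded.reset_index().rename(columns={"index": "id"})

# Observed cluster shares; they should sit near the requested 0.3/0.2/0.2/0.3.
shares = df_seeded["Clusters"].value_counts(normalize=True).sort_index()
print(shares)
```

Checking the observed shares against the requested probabilities is a quick way to confirm the sample looks as intended before building the report.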

Report the Distribution of Attributes

For each attribute, the loop below counts the unique ids per cluster and attribute value, moves the clusters into columns, and divides each row by its total. Every row of the report therefore sums to 1 and shows how one attribute value is split across the clusters; in the sample output, about 39% of the f records fall in cluster 3.

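features=['Gender','Age']
parts=[]
for i in features:
    print(i)
    # count the unique ids in every (cluster, attribute value) pair
    x=df.groupby(['Clusters',i])['id'].nunique().reset_index()
    # attribute values as rows, clusters as columns
    x=x.pivot_table(columns='Clusters',index=i,values='id')
    # divide each row by its total, so every row sums to 1
    x=x.apply(lambda row:row/row.sum(),axis=1)
    x['feature']=i
    x=x.reset_index().rename(columns={i:'value'})[['feature','value','0','1','2','3']]
    parts.append(x)

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the per-feature tables instead
dist=pd.concat(parts)

dist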
Clusters feature    value         0         1         2         3
0         Gender        f  0.246753  0.220779  0.142857  0.389610
1         Gender        m  0.292683  0.170732  0.227642  0.308943
0            Age  [30-65]  0.225225  0.198198  0.225225  0.351351
1            Age    [65+]  0.416667  0.250000  0.125000  0.208333
2            Age    [<30]  0.307692  0.153846  0.169231  0.369231
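
The same kind of report can also be sketched with pd.crosstab, which exposes the direction of the normalisation as a parameter. This is an alternative under stated assumptions, not the method used above: attribute_report, sample, by_value and by_cluster are hypothetical names, and crosstab counts rows rather than unique ids, which is equivalent here only because every row has its own id.

```python
import numpy as np
import pandas as pd


def attribute_report(df, features, cluster_col="Clusters", normalize="index"):
    """Stack one normalised cross-tabulation per feature into a single frame.

    normalize="index":   each row sums to 1 (one attribute value split across clusters).
    normalize="columns": each cluster column sums to 1 within a feature
                         (the attribute mix inside each cluster).
    """
    parts = []
    for feature in features:
        table = pd.crosstab(df[feature], df[cluster_col], normalize=normalize)
        table.columns.name = None  # drop the "Clusters" label on the column axis
        table = table.rename_axis("value").reset_index()
        table.insert(0, "feature", feature)
        parts.append(table)
    return pd.concat(parts, ignore_index=True)


# Hypothetical sample data with the same layout as the post (the seed is arbitrary).
rng = np.random.default_rng(0)
sample = pd.DataFrame(
    {
        "Clusters": rng.choice(["0", "1", "2", "3"], 200, p=[0.3, 0.2, 0.2, 0.3]),
        "Gender": rng.choice(["m", "f"], 200, p=[0.6, 0.4]),
        "Age": rng.choice(["[<30]", "[30-65]", "[65+]"], 200, p=[0.3, 0.6, 0.1]),
    }
)

by_value = attribute_report(sample, ["Gender", "Age"], normalize="index")
by_cluster = attribute_report(sample, ["Gender", "Age"], normalize="columns")
print(by_value)
print(by_cluster)
```

With normalize="index" the rows sum to 1, matching the report above. With normalize="columns" each cluster column sums to 1 within a feature, which answers the related question of what the gender or age mix looks like inside each cluster.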
