Predictive Hacks

How to get the Cardinality of Columns in Pandas

A quick and efficient way to get the cardinality of each column, i.e. the number of unique values it contains, is to call the nunique() method on the data frame.

Let’s create a data frame:

import pandas as pd
import numpy as np
 
df = pd.DataFrame({'id':list(range(10)),
                   'A':[10,9,8,7,np.nan,np.nan,20,15,12,np.nan], 
                   'B':["A","B","A","A",np.nan,"B","A","B",np.nan,"A"],
                   'C':[np.nan,"BB","CC","BB","BB","CC","AA","BB",np.nan,"AA"],
                   'D':[np.nan,20,18,22,18,17,19,np.nan,17,23]})
df
 

Let’s get the number of unique values by column.

# if you want to ignore NAs
df.nunique(dropna=True)

id    10
A      7
B      2
C      3
D      6
dtype: int64

# if you want to count NA as its own unique value
df.nunique(dropna=False)
 
id    10
A      8
B      3
C      4
D      7
dtype: int64
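
As a possible extension of the two calls above, here is a minimal sketch that puts both counts side by side, together with the number of missing values and the share of rows that are distinct. It uses a shortened version of the example data frame, and the column names n_unique, n_unique_with_na, n_missing and unique_ratio are illustrative choices, not part of pandas.

```python
import numpy as np
import pandas as pd

# Shortened version of the example data frame from this post
df = pd.DataFrame({
    "id": list(range(10)),
    "A": [10, 9, 8, 7, np.nan, np.nan, 20, 15, 12, np.nan],
    "B": ["A", "B", "A", "A", np.nan, "B", "A", "B", np.nan, "A"],
})

# One row per column of df; the summary column names are illustrative
summary = pd.DataFrame({
    "n_unique": df.nunique(dropna=True),           # NAs ignored
    "n_unique_with_na": df.nunique(dropna=False),  # NA counted as one extra value
    "n_missing": df.isna().sum(),                  # number of NA cells per column
})

# Share of rows that hold a distinct non-NA value (1.0 means every row is unique)
summary["unique_ratio"] = summary["n_unique"] / len(df)

print(summary.sort_values("n_unique", ascending=False))
```

A ratio close to 1.0, as for id, usually points to an identifier-like column, while a low ratio, as for B, suggests a categorical column.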
