A quick and efficient way to get the cardinality of each column, i.e. the number of unique values, is pandas' DataFrame.nunique() method, which takes a single line of code.
Let’s create a data frame:
import pandas as pd
import numpy as np

df = pd.DataFrame({'id': list(range(10)),
                   'A': [10, 9, 8, 7, np.nan, np.nan, 20, 15, 12, np.nan],
                   'B': ["A", "B", "A", "A", np.nan, "B", "A", "B", np.nan, "A"],
                   'C': [np.nan, "BB", "CC", "BB", "BB", "CC", "AA", "BB", np.nan, "AA"],
                   'D': [np.nan, 20, 18, 22, 18, 17, 19, np.nan, 17, 23]})
df
Let’s get the number of unique values by column.
# if you want to ignore NAs
df.nunique(dropna=True)
id 10
A 7
B 2
C 3
D 6
dtype: int64
# if you want to count NA as a separate unique value
df.nunique(dropna=False)
id 10
A 8
B 3
C 4
D 7
dtype: int64
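The same counts can be cross-checked on individual columns. The sketch below is only an illustration: it rebuilds columns B and D from the data frame above, and the variable names (b_without_na, b_with_na, d_with_na, cardinality_ratio) are made up for this example. It relies on Series.nunique() accepting the same dropna flag, and on Series.unique() keeping NaN as one value.

```python
import numpy as np
import pandas as pd

# Columns B and D copied from the data frame above.
df = pd.DataFrame({
    'B': ["A", "B", "A", "A", np.nan, "B", "A", "B", np.nan, "A"],
    'D': [np.nan, 20, 18, 22, 18, 17, 19, np.nan, 17, 23],
})

# Series.nunique() works on a single column and takes the same dropna flag.
b_without_na = df['B'].nunique()            # NaN ignored
b_with_na = df['B'].nunique(dropna=False)   # NaN counted once

# Series.unique() keeps NaN as one value, so its length matches dropna=False.
d_with_na = len(df['D'].unique())

# Distinct non-missing values divided by the number of rows:
# a rough measure of how "high-cardinality" each column is.
cardinality_ratio = df.nunique() / len(df)
print(cardinality_ratio)
```

Dividing by len(df) is just one possible normalisation; dividing by df.count() instead would measure cardinality relative to the non-missing rows only.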