Sometimes there is a need to do a rolling count of the distinct values of a list/vector. In other words we want to add up only any **new **element that appears in our list/vector. Below is an example of how we can easily do it in R and Python.

## R

# assume that this is our vector x=c("e", "a","a","b","a","b","c", "d", "e") # we apply the "cumsum(!duplicated(x))" command data.frame(Vector=x, CumDistinct=cumsum(!duplicated(x)))

```
Vector CumDistinct
1 e 1
2 a 2
3 a 2
4 b 3
5 a 3
6 b 3
7 c 4
8 d 5
9 e 5
```

## Python

import pandas as pd df = pd.DataFrame({'mylist':["e", "a","a","b","a","b","c", "d", "e"]}) df['CumDistinct'] = (~df.mylist.duplicated()).cumsum() df # or by using apply # df['CumDistinct'] = df.mylist.apply(lambda x: (~pd.Series(x).duplicated()).cumsum())

```
mylist CumDistinct
0 e 1
1 a 2
2 a 2
3 b 3
4 a 3
5 b 3
6 c 4
7 d 5
8 e 5
```

Alternatively, we can use list comprehension as follows:

df = pd.DataFrame({'mylist':["e", "a","a","b","a","b","c", "d", "e"]}) df['CumDistinct']=[len(set(df['mylist'][:i])) for i,j in enumerate(df['mylist'], 1)] df

```
mylist CumDistinct
0 e 1
1 a 2
2 a 2
3 b 3
4 a 3
5 b 3
6 c 4
7 d 5
8 e 5
```

## 1 thought on “Cumulative Count Distinct Values”

How do we apply the above with a groupby?