Predictive Hacks

# Cumulative Count Distinct Values

Sometimes we need a running count of the distinct values in a list or vector: at each position, the number of unique values seen so far. In other words, the count goes up only when an element appears for the first time. Below are examples of how to do this in R and Python.

## R

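```
# assume that this is our vector
x <- c("e", "a", "a", "b", "a", "b", "c", "d", "e")

# duplicated(x) flags repeats of earlier values, so !duplicated(x)
# is TRUE only at first occurrences; cumsum() keeps a running total of them
data.frame(Vector = x,
           CumDistinct = cumsum(!duplicated(x)))
```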
```
  Vector CumDistinct
1      e           1
2      a           2
3      a           2
4      b           3
5      a           3
6      b           3
7      c           4
8      d           5
9      e           5
```

## Python

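```
import pandas as pd

df = pd.DataFrame({'mylist': ["e", "a", "a", "b", "a", "b", "c", "d", "e"]})

# duplicated() marks repeats of earlier values; ~ flips the mask so that
# first occurrences are True, and cumsum() keeps a running total of them
df['CumDistinct'] = (~df['mylist'].duplicated()).cumsum()
df

# note: Series.apply works one element at a time and cannot see earlier
# rows, so it is not a substitute for the vectorized expression above
```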
```
  mylist  CumDistinct
0      e            1
1      a            2
2      a            2
3      b            3
4      a            3
5      b            3
6      c            4
7      d            5
8      e            5
```

Alternatively, we can use a list comprehension as follows. Note that it builds a new set for every prefix of the list, so it gets slow on long inputs:

```
import pandas as pd

df = pd.DataFrame({'mylist': ["e", "a", "a", "b", "a", "b", "c", "d", "e"]})

# for each prefix of length i, count the distinct values it contains
df['CumDistinct'] = [len(set(df['mylist'].iloc[:i])) for i in range(1, len(df) + 1)]

df
```
```
  mylist  CumDistinct
0      e            1
1      a            2
2      a            2
3      b            3
4      a            3
5      b            3
6      c            4
7      d            5
8      e            5
```
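
If pandas is not available, or the data is a plain Python list, the same running count can be kept in a single pass with a set. The following is only a minimal sketch: the helper name `cumulative_distinct` is made up for illustration, and it assumes the values are hashable.

```python
def cumulative_distinct(values):
    """Return, for each position, the number of distinct values seen so far."""
    seen = set()      # values encountered up to the current position
    counts = []
    for value in values:
        seen.add(value)            # adding an existing value leaves the set unchanged
        counts.append(len(seen))   # size of the set = distinct values so far
    return counts


mylist = ["e", "a", "a", "b", "a", "b", "c", "d", "e"]
result = cumulative_distinct(mylist)
print(result)
```

Because the set is updated in place rather than rebuilt for every prefix, this version goes through the list only once.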

### 1 thought on “Cumulative Count Distinct Values”

1. How do we apply the above with a groupby?
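
One possible way to approach the groupby question is to run the same `duplicated`/`cumsum` logic inside each group with `transform`. This is a sketch under assumptions: the `group` column and the sample data below are hypothetical and are not taken from the post.

```python
import pandas as pd

# hypothetical data: a grouping column next to the values to count
df = pd.DataFrame({
    "group":  ["x", "x", "y", "x", "y", "y"],
    "mylist": ["a", "b", "a", "a", "a", "c"],
})

# within each group, flag first occurrences and keep a running total;
# transform returns a result aligned with the original row order
df["CumDistinct"] = (
    df.groupby("group")["mylist"]
      .transform(lambda s: (~s.duplicated()).cumsum())
)

print(df)
```

Each group gets its own counter, so a value already seen in group `x` still counts as new the first time it appears in group `y`.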
