Predictive Hacks

# How to create Bins in Python using Pandas

We will show how you can create bins in Pandas efficiently. Let’s assume that we have a numeric variable and we want to convert it to categorical by creating bins.

We will draw 10,000 observations of a random variable from the Poisson distribution with parameter λ=20.

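```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter/IPython magic; remove this line when running as a plain script
%matplotlib inline

# 10,000 draws from a Poisson distribution with λ=20
s = np.random.poisson(20, 10000)

df = pd.DataFrame({'MyContinuous':s})

df
```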

Let’s get the histogram as well.

```python
df.hist('MyContinuous', bins=10, figsize=(12,8))
```

## Create Specific Bins

Let’s say that you want to create the following bins:

• Bin 1: (-inf, 15]
• Bin 2: (15, 25]
• Bin 3: (25, inf)

We can easily do that using `pandas`. Let’s start:

```python
bins = [-np.inf, 15, 25, np.inf]
df['MySpecificBins'] = pd.cut(df['MyContinuous'], bins)
df
```

Let’s have a look at the counts of each bin.

```python
df['MySpecificBins'].value_counts()
```

```
(15.0, 25.0]    7341
(-inf, 15.0]    1552
(25.0, inf]     1107
Name: MySpecificBins, dtype: int64
```
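
It is worth knowing which bin the edge values 15 and 25 fall into. By default `pd.cut` uses `right=True`, so each interval includes its right edge. The minimal sketch below uses a small made-up Series (the `values` and the column names are ours, chosen only for illustration) to compare the default with `right=False`:

```python
import numpy as np
import pandas as pd

# Small hand-picked sample that includes the two bin edges (15 and 25).
values = pd.Series([14, 15, 16, 25, 26])
bins = [-np.inf, 15, 25, np.inf]

# Default (right=True): intervals are closed on the right, i.e. (a, b].
right_closed = pd.cut(values, bins)

# right=False: intervals are closed on the left instead, i.e. [a, b).
left_closed = pd.cut(values, bins, right=False)

comparison = pd.DataFrame({
    "value": values,
    "right_closed": right_closed.astype(str),
    "left_closed": left_closed.astype(str),
})
print(comparison)
```

With the default, 15 lands in the first bin and 25 in the second. With `right=False`, each of them moves up one bin.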

Note that you can also define your own labels through the `labels` argument of the `cut` function.
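
As a minimal sketch, assuming we want to call the three bins `Low`, `Medium` and `High` (hypothetical names, as is the `MyLabeledBins` column), we can pass one label per bin. We rebuild the data here, with a seed chosen only for reproducibility, so that the snippet runs on its own:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed is arbitrary, only for reproducibility
df = pd.DataFrame({"MyContinuous": rng.poisson(20, 10000)})

bins = [-np.inf, 15, 25, np.inf]
labels = ["Low", "Medium", "High"]  # hypothetical names, one per bin

# The number of labels must equal the number of bins (len(bins) - 1).
df["MyLabeledBins"] = pd.cut(df["MyContinuous"], bins, labels=labels)

print(df["MyLabeledBins"].value_counts())
```

The resulting column is still an ordered categorical, so sorting and comparisons follow the bin order rather than the alphabetical order of the labels.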

## Create Bins based on Quantiles

Let’s say that you want each bin to have the same number of observations, for example 4 bins that each hold 25% of the observations. We can easily do it with `qcut` as follows:

```python
df['MyQuantileBins'] = pd.qcut(df['MyContinuous'], 4)
```

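We can check whether the `MyQuantileBins` bins contain the same number of observations, and also look at their ranges. Because the Poisson variable is discrete, many observations share the same value, so the counts are only approximately equal: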

```python
df['MyQuantileBins'].value_counts()
```

```
(4.999, 17.0]    2996
(17.0, 20.0]     2628
(20.0, 23.0]     2239
(23.0, 39.0]     2137
Name: MyQuantileBins, dtype: int64
```
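
To see how repeated values affect `qcut`, the sketch below bins a discrete sample and a continuous one side by side. The normal sample, its parameters and the variable names are our own assumptions for illustration. Passing `retbins=True` also returns the bin edges that `qcut` computed:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed is arbitrary, only for reproducibility

# Discrete data: many repeated values, so quartile edges fall on ties.
discrete = pd.Series(rng.poisson(20, 10000))
# Continuous data (assumed for comparison): ties are practically impossible.
continuous = pd.Series(rng.normal(20, 4.5, 10000))

# retbins=True returns the binned values together with the bin edges.
discrete_bins, discrete_edges = pd.qcut(discrete, 4, retbins=True)
continuous_bins, continuous_edges = pd.qcut(continuous, 4, retbins=True)

print(discrete_bins.value_counts().sort_index())
print(continuous_bins.value_counts().sort_index())
print(discrete_edges)
```

For the continuous sample the four counts come out equal, whereas for the discrete sample all observations that share an edge value have to go into the same bin.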

## Want to Build Bins in R?

Do you want to create bins in R? You can have a look at our post.
