We will show how you can create bins in Pandas efficiently. Let’s assume that we have a numeric variable and we want to convert it to categorical by creating bins.

We will draw 10,000 samples of a random variable from the Poisson distribution with parameter **λ=20**.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline

    s = np.random.poisson(20, 10000)
    df = pd.DataFrame({'MyContinuous': s})
    df

Let’s get the histogram as well.

    df.hist('MyContinuous', bins=10, figsize=(12,8))

## Create Specific Bins

Let’s say that you want to create the following bins:

- **Bin 1: (-inf, 15]**
- **Bin 2: (15, 25]**
- **Bin 3: (25, inf)**

We can easily do that using `pandas`. Let’s start:

    bins = [-np.inf, 15, 25, np.inf]
    df['MySpecificBins'] = pd.cut(df['MyContinuous'], bins)
    df

Let’s have a look at the counts of each bin.

    df['MySpecificBins'].value_counts()

Output:

    (15.0, 25.0]    7341
    (-inf, 15.0]    1552
    (25.0, inf]     1107
    Name: MySpecificBins, dtype: int64
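`value_counts()` orders its result by frequency, which is why the bins above are not listed from lowest to highest. Here is a minimal sketch of listing the bins in their natural order and as proportions. It assumes a small hand-made series (hypothetical toy data, not the Poisson sample) so that the counts are easy to check by hand:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, chosen so the counts can be verified by hand
values = pd.Series([3, 10, 15, 16, 20, 25, 26, 40])
bins = [-np.inf, 15, 25, np.inf]
binned = pd.cut(values, bins)

# sort_index() lists the bins from lowest to highest instead of by count
counts_in_bin_order = binned.value_counts().sort_index()

# normalize=True returns the share of observations in each bin
shares = binned.value_counts(normalize=True).sort_index()

print(counts_in_bin_order)
print(shares)
```

Because the intervals are closed on the right, the value 15 falls in the first bin and 25 in the second.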

**Notice** that you can also define your own labels within the `cut` function, using its `labels` argument.
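As a rough sketch of how custom labels might look, the example below passes one label per bin to `cut`. The label names and the toy values are assumptions made for illustration only:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data and label names, for illustration only
values = pd.Series([3, 15, 16, 25, 26, 40])
bins = [-np.inf, 15, 25, np.inf]
labels = ['Low', 'Medium', 'High']  # one label per bin

labeled = pd.cut(values, bins, labels=labels)
print(labeled.tolist())
# ['Low', 'Low', 'Medium', 'Medium', 'High', 'High']
```

The number of labels must be one less than the number of bin edges, and the resulting column is an ordered categorical.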

## Create Bins based on Quantiles

Let’s say that you want each bin to have the same number of observations, for example **4 bins** with 25% of the observations each. We can easily do that with `qcut`:

    df['MyQuantileBins'] = pd.qcut(df['MyContinuous'], 4)
    df[['MyContinuous', 'MyQuantileBins']].head()

We can check whether the `MyQuantileBins` contain the same number of observations, and also look at their ranges:

    df['MyQuantileBins'].value_counts()

Output:

    (4.999, 17.0]    2996
    (17.0, 20.0]     2628
    (20.0, 23.0]     2239
    (23.0, 39.0]     2137
    Name: MyQuantileBins, dtype: int64
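The counts above are not exactly 2,500 each. A likely reason is that Poisson draws are integers with many repeated values, and `qcut` cannot split identical values across two bins, so the bins are only approximately equal in size. The sketch below uses `retbins=True` to inspect the edges that `qcut` chose. It assumes a seeded generator, which is not part of the original code, so that the run is reproducible:

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption, for reproducibility only)
rng = np.random.default_rng(42)
values = pd.Series(rng.poisson(20, 10_000))

# retbins=True also returns the bin edges that qcut computed
quantile_bins, edges = pd.qcut(values, 4, retbins=True)

# Count how many observations sit exactly on one of the edges;
# these tied values all have to go into a single bin
ties_at_edges = values.isin(edges).sum()

print(edges)
print(quantile_bins.value_counts().sort_index())
print(ties_at_edges)
```

When many observations share a value that lies on a quantile edge, the bin sizes drift away from an exact 25% split.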
