A very common task in data processing is transforming numeric variables (continuous, discrete, etc.) into categorical ones by creating bins. For example, it is quite common to convert **age** to an `age group`. **Let’s see how we can easily do that in R.**

We will consider a random variable from the Poisson distribution with parameter **λ=20**:

    library(dplyr)

    # Generate 1000 observations from the Poisson distribution
    # with lambda equal to 20
    df <- data.frame(MyContinuous = rpois(1000, 20))

    # Get the histogram
    hist(df$MyContinuous)

## Create specific Bins

Let’s say that you want to create the following bins:

**Bin 1: (-inf, 15]**

**Bin 2: (15, 25]**

**Bin 3: (25, inf)**

We can easily do that using the `cut` function. Let’s start:

    df <- df %>%
      mutate(MySpecificBins = cut(MyContinuous, breaks = c(-Inf, 15, 25, Inf)))

    head(df, 10)

Let’s have a look at the counts of each bin.

    df %>%
      group_by(MySpecificBins) %>%
      count()

**Notice** that you can also define your own labels within the `cut` function.
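As a minimal sketch of custom labels, the snippet below passes a `labels` vector to `cut`, with one entry per bin. The label names (`"Low"`, `"Medium"`, `"High"`), the column name `MyLabeledBins`, and the `set.seed` call are assumptions added for illustration and are not part of the original example.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
df <- data.frame(MyContinuous = rpois(1000, 20))

# Hypothetical label names; the vector needs one label per bin
# (three bins here, because there are four break points)
df <- df %>%
  mutate(MyLabeledBins = cut(MyContinuous,
                             breaks = c(-Inf, 15, 25, Inf),
                             labels = c("Low", "Medium", "High")))

# The factor levels are now the custom labels instead of the ranges
levels(df$MyLabeledBins)
```

The labels replace the range notation in the factor levels, so it is probably worth keeping the break points documented somewhere else.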

## Create Bins based on Quantiles

Let’s say that you want each bin to have the same number of observations, for example 4 bins with 25% of the observations each. We can easily do it as follows:

    numbers_of_bins = 4

    df <- df %>%
      mutate(MyQuantileBins = cut(MyContinuous,
                                  breaks = unique(quantile(MyContinuous, probs = seq.int(0, 1, by = 1 / numbers_of_bins))),
                                  include.lowest = TRUE))

    head(df, 10)

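We can check whether the `MyQuantileBins` contain the same number of observations, and also look at their ranges. Because the Poisson values are discrete and contain ties, the counts will only be approximately equal: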

    df %>%
      group_by(MyQuantileBins) %>%
      count()
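The quantile code above wraps the break points in `unique()`. A small sketch of why that may be needed: when the data have many tied values, several quantiles can coincide, and `cut` refuses duplicated breaks. The vector `x` below is hypothetical data chosen only to trigger that situation.

```r
# Hypothetical, heavily tied data: eight zeros, then 1 and 2
x <- c(rep(0, 8), 1, 2)

# Several of the quartile break points coincide at 0
q <- quantile(x, probs = seq(0, 1, by = 1 / 4))

# Without unique(), cut() stops with an error; capture its message
res <- tryCatch(cut(x, breaks = q, include.lowest = TRUE),
                error = function(e) conditionMessage(e))
res

# With unique(), cut() works, but returns fewer bins than requested
bins <- cut(x, breaks = unique(q), include.lowest = TRUE)
nlevels(bins)
```

So `unique()` keeps the code from failing, at the cost of possibly ending up with fewer than the requested number of bins.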

Notice that if you want to split your continuous variable into bins of equal size, you can also use the `ntile` function of the `dplyr` package, but it does not create bin labels based on the ranges.
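A minimal sketch of the `ntile` approach, assuming the same simulated data as before. The column name `MyNtileBins` and the `set.seed` call are assumptions added for illustration. Since `ntile` only returns a bucket number, the range of each bucket is recovered here with a separate summary.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
df <- data.frame(MyContinuous = rpois(1000, 20))

# ntile() returns an integer bucket (1 to 4) per row, not a range label
df <- df %>%
  mutate(MyNtileBins = ntile(MyContinuous, 4))

# Count per bucket, plus the min and max value that each bucket covers
df %>%
  group_by(MyNtileBins) %>%
  summarise(n = n(), min = min(MyContinuous), max = max(MyContinuous))
```

Note that `ntile` assigns buckets by rank, so rows with the same value can land in neighbouring buckets and the min/max ranges may overlap at the boundaries.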


## 6 thoughts on “How to Convert Continuous variables into Categorical by Creating Bins”

Hi

Thanks for the post

I am getting an error when I tried to use this command

    Error: Problem with `mutate()` input `gest_bins`.
    x missing value where TRUE/FALSE needed
    ℹ Input `gest_bins` is `cut(gest = cut(-Inf, 28, 35, Inf))`.
    Run `rlang::last_error()` to see where the error occurred.

Is this due to missing values and, if so, how can I solve it?

Please provide me with a reproducible example.

The issue is how best to decide the number of bins and their width. Check out how cartographers have handled the same issue when choropleth mapping in a brilliant paper:

Ian S. Evans (1977), “The Selection of Class Intervals”, Transactions of the Institute of British Geographers, New Series, Vol. 2, No. 1, Contemporary Cartography, http://www.jstor.org/stable/622195

This is a nice little post. It is similar to one I did in SAS (not R) a while ago.