Predictive Hacks

# How to Convert Continuous variables into Categorical by Creating Bins

A very common task in data processing is transforming numeric variables (continuous, discrete, etc.) into categorical ones by creating bins. For example, it is quite common to convert `age` into an `age group`. Let’s see how we can easily do that in R.

We will consider a random variable from the Poisson distribution with parameter λ=20.

```r
library(dplyr)

# Generate 1000 observations from the Poisson distribution
# with lambda equal to 20
df <- data.frame(MyContinuous = rpois(1000, 20))

# Plot the histogram
hist(df$MyContinuous)
```

## Create specific Bins

Let’s say that you want to create the following bins:

• Bin 1: (-inf, 15]
• Bin 2: (15,25]
• Bin 3: (25, inf)

We can easily do that using the `cut` function. By default its intervals are closed on the right (`right = TRUE`), which matches the bins above. Let’s start:

```r
df <- df %>%
  mutate(MySpecificBins = cut(MyContinuous, breaks = c(-Inf, 15, 25, Inf)))
```

Let’s have a look at the counts of each bin.

```r
df %>% group_by(MySpecificBins) %>% count()
```

Note that you can also define your own labels within the `cut` function through its `labels` argument.
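As a minimal sketch of custom labels, the example below passes a `labels` vector to `cut`. The label names ("Low", "Medium", "High"), the column name `MyLabeledBins`, and the seed are assumptions made only for illustration; the one firm requirement is that there is one label per interval, i.e. one fewer than the number of breaks.

```r
library(dplyr)

# Hypothetical data, seed and label names, chosen only for illustration
set.seed(1)
df <- data.frame(MyContinuous = rpois(1000, 20))

# Three intervals need exactly three labels
df <- df %>%
  mutate(MyLabeledBins = cut(MyContinuous,
                             breaks = c(-Inf, 15, 25, Inf),
                             labels = c("Low", "Medium", "High")))

# Counts per labeled bin
df %>% count(MyLabeledBins)
```

The resulting column is a factor whose levels follow the order of the labels, so the bins sort naturally in tables and plots.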

## Create Bins based on Quantiles

Suppose you want each bin to contain the same number of observations, for example 4 bins with 25% of the observations each. We can easily do it as follows:

```r
numbers_of_bins <- 4

df <- df %>%
  mutate(MyQuantileBins = cut(
    MyContinuous,
    breaks = unique(quantile(MyContinuous,
                             probs = seq.int(0, 1, by = 1 / numbers_of_bins))),
    include.lowest = TRUE
  ))
```
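The call to `unique()` in the code above matters when several quantiles coincide, because `cut` refuses duplicated breaks. The sketch below illustrates this with a hypothetical, heavily tied vector `x` (not part of the original example); the variable names are assumptions chosen for illustration.

```r
# Hypothetical skewed vector: 80 zeros followed by the values 1 to 20
x <- c(rep(0, 80), 1:20)

# The 0%, 25%, 50% and 75% quantiles are all 0, so the breaks repeat
q <- quantile(x, probs = seq(0, 1, by = 0.25))
print(q)

# Without unique(), cut() stops with an error about non-unique breaks;
# tryCatch() captures the message instead of halting the script
res <- tryCatch(cut(x, breaks = q, include.lowest = TRUE),
                error = function(e) conditionMessage(e))
print(res)

# With unique(), the duplicates collapse and fewer bins are returned
bins <- cut(x, breaks = unique(q), include.lowest = TRUE)
table(bins)
```

The trade-off is that, with tied quantiles, you end up with fewer bins than requested, so it is worth checking the number of levels afterwards.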

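We can check whether the `MyQuantileBins` contain the same number of observations and also look at their ranges. Because the Poisson variable is discrete and has many tied values, the counts will only be approximately equal: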

```r
df %>% group_by(MyQuantileBins) %>% count()
```

Note that if you want to split your continuous variable into bins with an equal number of observations, you can also use the `ntile` function of the `dplyr` package. However, it returns only the bin index and does not create range-based labels for the bins.
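A minimal sketch of the `ntile` approach is shown below. The column name `MyNtileBins` and the seed are assumptions for illustration; since `ntile` gives only an integer group, the ranges are recovered with a separate summary.

```r
library(dplyr)

# Hypothetical data and seed, chosen only for illustration
set.seed(1)
df <- data.frame(MyContinuous = rpois(1000, 20))

# ntile() assigns each row an integer group from 1 to 4
df <- df %>% mutate(MyNtileBins = ntile(MyContinuous, 4))

# Summarise each group to recover the range that ntile() does not label
df %>%
  group_by(MyNtileBins) %>%
  summarise(n = n(), min = min(MyContinuous), max = max(MyContinuous))
```

Keep in mind that `ntile` splits by rank, so rows with the same value can land in different groups and the ranges of neighbouring groups may overlap at the boundaries.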

## Want to Build Bins in Python?

Do you want to create bins in Python? You can have a look at our post!

### 6 thoughts on “How to Convert Continuous variables into Categorical by Creating Bins”

1. Hi
Thanks for the post
I am getting an error when I tried to use this command

Error: Problem with `mutate()` input `gest_bins`.
x missing value where TRUE/FALSE needed
ℹ Input `gest_bins` is `cut(gest = cut(-Inf, 28, 35, Inf))`.
Run `rlang::last_error()` to see where the error occurred.

Is this due to missing values and in that case, how to solve that?

• Please provide me a reproducible example

2. The issue is how best to decide the number of bins and their width. Check out how cartographers have handled the same issue when choropleth mapping in a brilliant paper:
The Selection of Class Intervals
Author(s): Ian S. Evans
Transactions of the Institute of British Geographers, New Series, Vol. 2, No. 1,Contemporary Cartography (1977), http://www.jstor.org/stable/622195 .

3. This is a nice little post. It is similar to the one I did in SAS (not R) a while ago.
