Predictive Hacks

How to Convert Continuous variables into Categorical by Creating Bins


A very common task in data processing is transforming numeric variables (continuous, discrete, etc.) into categorical ones by creating bins. For example, it is quite common to convert age into an age group. Let’s see how we can easily do that in R.

We will consider a random variable from the Poisson distribution with parameter λ=20:

library(dplyr)
# Generate 1000 observations from the Poisson distribution 
# with lambda equal to 20
df<-data.frame(MyContinuous = rpois(1000,20))

# get the histogram
hist(df$MyContinuous)
  

Create specific Bins

Let’s say that you want to create the following bins:

  • Bin 1: (-inf, 15]
  • Bin 2: (15,25]
  • Bin 3: (25, inf)

We can easily do that using the cut function. Let’s start:

df<-df%>%mutate(MySpecificBins = cut(MyContinuous, breaks = c(-Inf,15,25,Inf)))
head(df,10)
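
It may help to see what happens to values that sit exactly on a break point. The following is a minimal sketch, using a small hypothetical vector x rather than the simulated data, that compares the default behaviour of cut (intervals closed on the right) with right = FALSE (intervals closed on the left):

```r
# Hypothetical values chosen to sit on or next to the break points
x <- c(14, 15, 16, 25, 26)

# Default (right = TRUE): intervals are closed on the right, e.g. (15,25]
right_closed <- cut(x, breaks = c(-Inf, 15, 25, Inf))

# right = FALSE: intervals are closed on the left instead, e.g. [15,25)
left_closed <- cut(x, breaks = c(-Inf, 15, 25, Inf), right = FALSE)

# Compare the bin assigned to each value side by side
data.frame(x, right_closed, left_closed)
```

With the default setting, 15 belongs to the first bin and 25 to the second, which matches the bin definitions listed above.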
 

Let’s have a look at the counts of each bin.

df%>%group_by(MySpecificBins)%>%count()
 

Notice that you can also define your own labels within the cut function, via its labels argument.
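
As a minimal sketch of custom labels, the example below uses the same break points on a small hypothetical data frame called demo. The label names "Low", "Medium" and "High" are assumptions chosen for illustration; there must be one label per bin.

```r
library(dplyr)

# Small hypothetical data frame standing in for df
demo <- data.frame(MyContinuous = c(10, 15, 20, 25, 30))

# labels must contain exactly one entry per interval (here 3)
demo <- demo %>%
  mutate(MyLabelledBins = cut(MyContinuous,
                              breaks = c(-Inf, 15, 25, Inf),
                              labels = c("Low", "Medium", "High")))

demo
```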


Create Bins based on Quantiles

Let’s say that you want each bin to have the same number of observations, for example 4 bins with 25% of the observations each. We can easily do it as follows:

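numbers_of_bins = 4

# Break points at the 0%, 25%, 50%, 75% and 100% quantiles.
# unique() drops duplicated break points, which cut() would reject;
# include.lowest=TRUE keeps the minimum value inside the first bin.
df<-df%>%mutate(MyQuantileBins = cut(MyContinuous, 
                                 breaks = unique(quantile(MyContinuous,probs=seq.int(0,1, by=1/numbers_of_bins))), 
                                 include.lowest=TRUE))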

head(df,10)
 

We can check whether the MyQuantileBins contain the same number of observations, and also look at their ranges. Because the Poisson values are integers with many ties, the counts are only approximately equal:

df%>%group_by(MyQuantileBins)%>%count()
 

Notice that if you want to split your continuous variable into bins of equal size, you can also use the ntile function of the dplyr package, but it does not label the bins based on their ranges.
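
A minimal sketch of the ntile approach is shown below. It uses a hypothetical data frame demo generated the same way as df, with a seed that is an assumption added only to make the sketch reproducible. Since ntile returns bucket numbers rather than range labels, the range of each bucket is recovered with a summary.

```r
library(dplyr)

set.seed(1)  # hypothetical seed, only for reproducibility of this sketch
demo <- data.frame(MyContinuous = rpois(1000, 20))

# ntile() assigns each row an integer bucket number from 1 to 4
demo <- demo %>% mutate(MyNtileBins = ntile(MyContinuous, 4))

# The buckets have equal size, and the min and max show each bucket's range
bucket_summary <- demo %>%
  group_by(MyNtileBins) %>%
  summarise(min = min(MyContinuous), max = max(MyContinuous), n = n())

bucket_summary
```

Note that ntile splits tied values across buckets to keep the sizes equal, so the same value can appear in two adjacent buckets. The cut approach never does this, at the cost of uneven counts.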

Want to Build Bins in Python?

Do you want to create bins in Python? You can have a look at our post!


6 thoughts on “How to Convert Continuous variables into Categorical by Creating Bins”

  1. Hi
    Thanks for the post
    I am getting an error when I tried to use this command

    Error: Problem with `mutate()` input `gest_bins`.
    x missing value where TRUE/FALSE needed
    ℹ Input `gest_bins` is `cut(gest = cut(-Inf, 28, 35, Inf))`.
    Run `rlang::last_error()` to see where the error occurred.

    Is this due to missing values and in that case, how to solve that?

  2. The issue is how best to decide the number of bins and their width. Check out how cartographers have handled the same issue when choropleth mapping in a brilliant paper:
    The Selection of Class Intervals
    Author(s): Ian S. Evans
    Transactions of the Institute of British Geographers, New Series, Vol. 2, No. 1, Contemporary Cartography (1977), http://www.jstor.org/stable/622195

