Predictive Hacks

How to Convert Continuous variables into Categorical by Creating Bins


A very common task in data processing is transforming numeric variables (continuous, discrete, etc.) into categorical ones by creating bins. For example, it is quite common to convert age into an age group. Let’s see how we can easily do that in R.

We will draw 1000 observations of a random variable from the Poisson distribution with parameter λ = 20:

library(dplyr)

# Generate 1000 observations from the Poisson distribution
# with lambda equal to 20
df <- data.frame(MyContinuous = rpois(1000, 20))

# Plot the histogram
hist(df$MyContinuous)
  

Create specific Bins

Let’s say that you want to create the following bins:

  • Bin 1: (-inf, 15]
  • Bin 2: (15,25]
  • Bin 3: (25, inf)

We can easily do that using the cut function. Let’s start:

df <- df %>% mutate(MySpecificBins = cut(MyContinuous, breaks = c(-Inf, 15, 25, Inf)))
head(df,10)
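
To make the bin boundaries concrete, here is a minimal sketch in base R. The vector x is made up for illustration and is not part of the post's data. It shows that, with the default right = TRUE, the values 15 and 25 fall into the lower of the two neighbouring bins, and that right = FALSE moves them into the upper one.

```r
# Hypothetical values sitting on and around the break points
x <- c(14, 15, 16, 25, 26)
breaks <- c(-Inf, 15, 25, Inf)

# Default: intervals are closed on the right, e.g. (15,25]
bins_right <- cut(x, breaks = breaks)

# right = FALSE: intervals are closed on the left, e.g. [15,25)
bins_left <- cut(x, breaks = breaks, right = FALSE)

# Compare the bin index assigned to each value under both conventions
print(data.frame(x = x,
                 right_closed = as.integer(bins_right),
                 left_closed = as.integer(bins_left)))
```

Which convention is appropriate depends on how the groups are defined in your problem, so it is worth checking the boundary values explicitly.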
 

Let’s have a look at the counts of each bin.

df %>% group_by(MySpecificBins) %>% count()
 

Note that you can also define your own labels through the labels argument of the cut function.
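
As a rough sketch of custom labels, the example below reuses the same breaks and passes one label per bin. The label names ("Low", "Medium", "High"), the column name MyLabelledBins and the fixed seed are assumptions made for illustration only.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the example is reproducible
df <- data.frame(MyContinuous = rpois(1000, 20))

# labels must have one entry per bin, i.e. length(breaks) - 1
df <- df %>%
  mutate(MyLabelledBins = cut(MyContinuous,
                              breaks = c(-Inf, 15, 25, Inf),
                              labels = c("Low", "Medium", "High")))

# Counts per labelled bin
print(df %>% count(MyLabelledBins))
```

The result is still a factor, with the levels in the order of the breaks rather than in alphabetical order.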


Create Bins based on Quantiles

Let’s say that you want each bin to contain the same number of observations, for example 4 bins holding 25% of the observations each. We can easily do it as follows:

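numbers_of_bins <- 4

# Breaks at the 0%, 25%, 50%, 75% and 100% quantiles;
# unique() drops duplicated breaks, which cut() would otherwise reject
df <- df %>%
  mutate(MyQuantileBins = cut(MyContinuous,
                              breaks = unique(quantile(MyContinuous,
                                                       probs = seq.int(0, 1, by = 1 / numbers_of_bins))),
                              include.lowest = TRUE))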

head(df,10)
 

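We can check whether the MyQuantileBins groups contain the same number of observations, and also look at their ranges. Because Poisson values are integers with many ties, the counts will only be approximately equal: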

df %>% group_by(MyQuantileBins) %>% count()
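
The unique() call around the quantiles matters when the data has many tied values. The sketch below uses a small made-up vector (not the post's data) in which several quantiles coincide. Without unique(), cut stops with an error; with it, the call succeeds but returns fewer bins than requested.

```r
# Hypothetical heavily tied data: eight zeros, then 1 and 2
x <- c(rep(0, 8), 1, 2)

# Quartile breaks: several of them collapse onto the same value
q <- quantile(x, probs = seq(0, 1, by = 0.25))
print(q)

# Passing duplicated breaks to cut() raises an error; capture it to inspect
res <- tryCatch(cut(x, breaks = q, include.lowest = TRUE),
                error = function(e) e)
print(inherits(res, "error"))
print(conditionMessage(res))

# With unique() the call works, at the cost of ending up with fewer bins
binned <- cut(x, breaks = unique(q), include.lowest = TRUE)
print(nlevels(binned))
```

So when quantile-based bins come back fewer in number or uneven in size, ties in the data are a likely cause.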
 

Note that if you want to split your continuous variable into bins with an equal number of observations, you can also use the ntile function of the dplyr package, but it returns bin numbers rather than labels based on the ranges.
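
A minimal sketch of the ntile approach is shown below. The column name MyNtileBins and the fixed seed are assumptions for illustration. Since ntile only returns the bin number, the range of each bin is recovered here with a separate summary.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the example is reproducible
df <- data.frame(MyContinuous = rpois(1000, 20))

# ntile() ranks the values and splits them into 4 groups of (nearly) equal size
df <- df %>% mutate(MyNtileBins = ntile(MyContinuous, 4))

# Count, minimum and maximum of each bin
bin_summary <- df %>%
  group_by(MyNtileBins) %>%
  summarise(n = n(),
            min_value = min(MyContinuous),
            max_value = max(MyContinuous))
print(bin_summary)
```

Unlike the cut-based approach, ntile breaks ties by position, so the same value can land in two neighbouring bins and the ranges of adjacent bins may overlap.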

Want to Build Bins in Python?

Do you want to create bins in Python? You can have a look at our post!


6 thoughts on “How to Convert Continuous variables into Categorical by Creating Bins”

  1. Hi
    Thanks for the post
    I am getting an error when I tried to use this command

    Error: Problem with `mutate()` input `gest_bins`.
    x missing value where TRUE/FALSE needed
    ℹ Input `gest_bins` is `cut(gest = cut(-Inf, 28, 35, Inf))`.
    Run `rlang::last_error()` to see where the error occurred.

    Is this due to missing values and in that case, how to solve that?

  2. The issue is how best to decide the number of bins and their width. Check out how cartographers have handled the same issue when choropleth mapping in a brilliant paper:
    The Selection of Class Intervals
    Author(s): Ian S. Evans
    Transactions of the Institute of British Geographers, New Series, Vol. 2, No. 1, Contemporary Cartography (1977), http://www.jstor.org/stable/622195

