Predictive Hacks

# How to Rename and Relevel Factors in R

Factors are a “special” data structure in R. We are going to provide some examples of how to rename and relevel them. For the next examples, we will work with the following data:

df <- data.frame(
  ID = 1:10,
  Gender = factor(c("M", "M", "M", "", "F", "F", "M", "", "F", "F")),
  AgeGroup = factor(c("[60+]", "[26-35]", "[NA]", "[36-45]", "[46-60]", "[26-35]", "[NA]", "[18-25]", "[26-35]", "[26-35]"))
)

> df
   ID Gender AgeGroup
1   1      M    [60+]
2   2      M  [26-35]
3   3      M     [NA]
4   4         [36-45]
5   5      F  [46-60]
6   6      F  [26-35]
7   7      M     [NA]
8   8         [18-25]
9   9      F  [26-35]
10 10      F  [26-35]
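
Before renaming anything, it helps to look at the levels a factor actually has and the order in which they are stored. The following is a minimal sketch that rebuilds the example data so it can run on its own; the variable names `gender_levels`, `age_levels`, `gender_codes` and `gender_counts` are only illustrative.

```r
# Recreate the example data so this snippet is self-contained.
df <- data.frame(
  ID = 1:10,
  Gender = factor(c("M", "M", "M", "", "F", "F", "M", "", "F", "F")),
  AgeGroup = factor(c("[60+]", "[26-35]", "[NA]", "[36-45]", "[46-60]",
                      "[26-35]", "[NA]", "[18-25]", "[26-35]", "[26-35]"))
)

# levels() returns the labels in their stored order.
gender_levels <- levels(df$Gender)
age_levels <- levels(df$AgeGroup)

# as.integer() exposes the underlying codes, which index into the levels.
gender_codes <- as.integer(df$Gender)

# table() counts the observations per level.
gender_counts <- table(df$Gender)

print(gender_levels)
print(age_levels)
print(gender_counts)
```

Note that the empty string is a level of its own in `Gender`, which is why it can be renamed like any other level.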

## Rename Factors

Let’s say that we want to convert the empty string in Gender to “U”, for Unknown:

levels(df$Gender)[levels(df$Gender) == ""] <- "U"
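
One caveat with this pattern is that it silently does nothing when the old label is not among the levels, for example because of a typo. A possible guard is sketched below; `rename_level` is a hypothetical helper written for this illustration, not a base R function.

```r
# Hypothetical helper (not part of base R): rename a single level and
# fail loudly if the old label is absent, instead of silently doing nothing.
rename_level <- function(f, old, new) {
  stopifnot(is.factor(f), old %in% levels(f))
  levels(f)[levels(f) == old] <- new
  f
}

gender <- factor(c("M", "M", "M", "", "F", "F", "M", "", "F", "F"))
gender <- rename_level(gender, "", "U")
print(levels(gender))
```

Calling `rename_level(gender, "X", "Y")` on this factor stops with an error, because "X" is not one of its levels.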


Let’s say that we want to merge the age groups. For instance, the new categories will be “[18-35]”, “[35+]” and “[NA]”:

levels(df$AgeGroup)[levels(df$AgeGroup) == "[18-25]"] <- "[18-35]"
levels(df$AgeGroup)[levels(df$AgeGroup) == "[26-35]"] <- "[18-35]"

levels(df$AgeGroup)[levels(df$AgeGroup) == "[36-45]"] <- "[35+]"
levels(df$AgeGroup)[levels(df$AgeGroup) == "[46-60]"] <- "[35+]"
levels(df$AgeGroup)[levels(df$AgeGroup) == "[60+]"] <- "[35+]"
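
The same merge can also be written by name in a single assignment, since `levels<-` accepts a named list in base R. This is a sketch of an alternative rather than the approach used in this post; it works on a standalone copy of the original age groups, named `age` here for illustration.

```r
# Standalone copy of the original AgeGroup values.
age <- factor(c("[60+]", "[26-35]", "[NA]", "[36-45]", "[46-60]",
                "[26-35]", "[NA]", "[18-25]", "[26-35]", "[26-35]"))

# Each list name is a new level; each value lists the old levels it absorbs.
# Old levels that are not mentioned anywhere in the list become NA,
# so "[NA]" is listed explicitly to keep it.
levels(age) <- list(
  "[18-35]" = c("[18-25]", "[26-35]"),
  "[35+]"   = c("[36-45]", "[46-60]", "[60+]"),
  "[NA]"    = "[NA]"
)

print(levels(age))
print(table(age))
```

Because the old levels are matched by label, this form does not depend on the order in which the levels happen to be stored.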



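Notice that, instead of the step-by-step renaming, we could have assigned all six original levels at once. This is risky, however: the new labels are matched to the existing levels by position, so they must follow the exact order returned by levels(df$AgeGroup), which can differ from what we expect.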

levels(df$AgeGroup)<-c("[18-35]","[18-35]","[35+]","[35+]","[35+]", "[NA]")

By applying the changes we mentioned before, we get the following data.

> df
   ID Gender AgeGroup
1   1      M    [35+]
2   2      M  [18-35]
3   3      M     [NA]
4   4      U    [35+]
5   5      F    [35+]
6   6      F  [18-35]
7   7      M     [NA]
8   8      U  [18-35]
9   9      F  [18-35]
10 10      F  [18-35]

## Relevel Factors

Let’s say that we want the “[NA]” age group to appear first:

df$AgeGroup<-factor(df$AgeGroup, c("[NA]", "[18-35]" ,"[35+]"))

Another way to change the order is to use relevel() to make a particular level first in the list. (This will not work for ordered factors.) Let’s say that we want the ‘F’ Gender first:

df$Gender<-relevel(df$Gender, "F")

By applying these changes, we can see how the factor levels have changed.

> str(df)
'data.frame':	10 obs. of  3 variables:
 $ ID      : int  1 2 3 4 5 6 7 8 9 10
 $ Gender  : Factor w/ 3 levels "F","U","M": 3 3 3 2 1 1 3 2 1 1
 $ AgeGroup: Factor w/ 3 levels "[NA]","[18-35]",..: 3 2 1 3 3 2 1 2 2 2
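
Releveling changes only the stored order of the levels and the integer codes, not the observed values. The sketch below checks this on a standalone copy of the merged age groups, and also illustrates the remark about ordered factors; the `size` factor and the other variable names are made up for this illustration.

```r
# Standalone copy of the merged age groups.
age <- factor(c("[35+]", "[18-35]", "[NA]", "[35+]", "[35+]",
                "[18-35]", "[NA]", "[18-35]", "[18-35]", "[18-35]"))
before <- as.character(age)

# Reordering the levels leaves the labels of the observations untouched.
age <- factor(age, levels = c("[NA]", "[18-35]", "[35+]"))
values_unchanged <- identical(as.character(age), before)

# relevel() refuses ordered factors; capture the error instead of stopping.
# `size` is a made-up ordered factor used only for this check.
size <- factor(c("low", "high", "low"), levels = c("low", "high"), ordered = TRUE)
result <- tryCatch(relevel(size, ref = "high"), error = function(e) "error")

print(levels(age))
print(values_unchanged)
print(result)
```

For an ordered factor, the order can still be changed by rebuilding it with factor() and an explicit levels argument, as was done for AgeGroup.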
