Predictive Hacks

How to Rename and Relevel Factors in R

relevel factors

A “special” data structure in R is the “factors”. We are going to provide some examples of how we can rename and relevel the factors. For the next examples, we will work with the following data

df<-data.frame(ID=c(1:10), Gender=factor(c("M","M","M","","F","F","M","","F","F" )), 
           AgeGroup=factor(c("[60+]", "[26-35]", "[NA]", "[36-45]", "[46-60]", "[26-35]", "[NA]", "[18-25]", "[26-35]", "[26-35]")))
> df
   ID Gender AgeGroup
1   1      M    [60+]
2   2      M  [26-35]
3   3      M     [NA]
4   4         [36-45]
5   5      F  [46-60]
6   6      F  [26-35]
7   7      M     [NA]
8   8         [18-25]
9   9      F  [26-35]
10 10      F  [26-35]

Rename Factors

Let’s say that I want to convert the empty string of Gender to “U” from the Unknown

levels(df$Gender)[levels(df$Gender)==""] ="U"

Let’s say that we want to merge the age groups. For instance the new categories will be “[18-35]”, “[35+], “[NA]”

levels(df$AgeGroup)[levels(df$AgeGroup)=="[18-25]"] = "[18-35]"
levels(df$AgeGroup)[levels(df$AgeGroup)=="[26-35]"] = "[18-35]"

levels(df$AgeGroup)[levels(df$AgeGroup)=="[36-45]"] = "[35+]"
levels(df$AgeGroup)[levels(df$AgeGroup)=="[46-60]"] = "[35+]"
levels(df$AgeGroup)[levels(df$AgeGroup)=="[60+]"] = "[35+]"


Notice that we could have done it in once, but it is very risky because sometimes we can have different order than what we expected.

levels(df$AgeGroup)<-c("[18-35]","[18-35]","[35+]","[35+]","[35+]", "[NA]")

By applying the changed we mentioned before, we get the following data.

> df
   ID Gender AgeGroup
1   1      M    [35+]
2   2      M  [18-35]
3   3      M     [NA]
4   4      U    [35+]
5   5      F    [35+]
6   6      F  [18-35]
7   7      M     [NA]
8   8      U  [18-35]
9   9      F  [18-35]
10 10      F  [18-35]

Relevel Factors

Let’s say that we want the “[NA]” age group to appear first

df$AgeGroup<-factor(df$AgeGroup, c("[NA]", "[18-35]" ,"[35+]"))

Another way to change the order is to use relevel() to make a particular level first in the list. (This will not work for ordered factors.). Let’s day that we want the ‘F’ Gender first

df$Gender<-relevel(df$Gender, "F")

By applying these changes, we can see how the factors have changed level.

> str(df)
'data.frame':	10 obs. of  3 variables:
 $ ID      : int  1 2 3 4 5 6 7 8 9 10
 $ Gender  : Factor w/ 3 levels "F","U","M": 3 3 3 2 1 1 3 2 1 1
 $ AgeGroup: Factor w/ 3 levels "[NA]","[18-35]",..: 3 2 1 3 3 2 1 2 2 2

How to Rename and Relevel Factors in R 1

Share This Post

Share on facebook
Share on linkedin
Share on twitter
Share on email

Leave a Comment

Subscribe To Our Newsletter

Get updates and learn from the best

More To Explore

data science journey
Miscellaneous

My Journey as a Data Science Blogger

Μy Background My Studies Back in 2001, I entered university to study Statistics. During my first year, I ran my