Predictive Hacks

How to Rename and Relevel Factors in R


Factors are a “special” data structure in R, used to represent categorical data. Below are some examples of how to rename and relevel factor levels. All the examples use the following data:

df<-data.frame(ID=c(1:10), Gender=factor(c("M","M","M","","F","F","M","","F","F" )), 
           AgeGroup=factor(c("[60+]", "[26-35]", "[NA]", "[36-45]", "[46-60]", "[26-35]", "[NA]", "[18-25]", "[26-35]", "[26-35]")))
> df
   ID Gender AgeGroup
1   1      M    [60+]
2   2      M  [26-35]
3   3      M     [NA]
4   4         [36-45]
5   5      F  [46-60]
6   6      F  [26-35]
7   7      M     [NA]
8   8         [18-25]
9   9      F  [26-35]
10 10      F  [26-35]
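Before changing anything, it helps to look at which levels a factor has and in what order they are stored. The following is a minimal sketch using only base R; the standalone vectors `gender` and `age_group` are illustrative copies of the two factor columns above, not part of the original code.

```r
# Illustrative copies of the two factor columns from the example data
gender <- factor(c("M", "M", "M", "", "F", "F", "M", "", "F", "F"))
age_group <- factor(c("[60+]", "[26-35]", "[NA]", "[36-45]", "[46-60]",
                      "[26-35]", "[NA]", "[18-25]", "[26-35]", "[26-35]"))

levels(gender)       # the distinct levels, in their stored order
nlevels(age_group)   # how many levels the factor has
table(age_group)     # number of observations per level
as.integer(gender)   # the integer codes that point into levels(gender)
```

Printing `levels()` first is a cheap way to confirm the stored order before any assignment that relies on position.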

Rename Factors

Let’s say that we want to rename the empty-string level of Gender to “U”, for Unknown:

levels(df$Gender)[levels(df$Gender)==""] ="U"

Let’s say that we want to merge the age groups, so that the new categories are “[18-35]”, “[35+]”, and “[NA]”:

levels(df$AgeGroup)[levels(df$AgeGroup)=="[18-25]"] = "[18-35]"
levels(df$AgeGroup)[levels(df$AgeGroup)=="[26-35]"] = "[18-35]"

levels(df$AgeGroup)[levels(df$AgeGroup)=="[36-45]"] = "[35+]"
levels(df$AgeGroup)[levels(df$AgeGroup)=="[46-60]"] = "[35+]"
levels(df$AgeGroup)[levels(df$AgeGroup)=="[60+]"] = "[35+]"


Notice that we could have done this in a single assignment, but that is risky, because the levels may be stored in a different order than we expect.

levels(df$AgeGroup)<-c("[18-35]","[18-35]","[35+]","[35+]","[35+]", "[NA]")
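One way to merge levels in a single step without relying on their order is to assign a named list to `levels()`, which base R supports for factors: each list name is a new level, and its value holds the old levels to fold into it. The sketch below uses a standalone vector `age_group` (an illustrative copy of the original column) rather than the data frame.

```r
# Illustrative copy of the original AgeGroup column
age_group <- factor(c("[60+]", "[26-35]", "[NA]", "[36-45]", "[46-60]",
                      "[26-35]", "[NA]", "[18-25]", "[26-35]", "[26-35]"))

# Old levels are matched by name, so their stored order does not matter
levels(age_group) <- list(
  "[18-35]" = c("[18-25]", "[26-35]"),
  "[35+]"   = c("[36-45]", "[46-60]", "[60+]"),
  "[NA]"    = "[NA]"
)

levels(age_group)
table(age_group)
```

Because the mapping is written out by name, a reader can check it without knowing how R sorted the original levels.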

By applying the changes mentioned above, we get the following data.

> df
   ID Gender AgeGroup
1   1      M    [35+]
2   2      M  [18-35]
3   3      M     [NA]
4   4      U    [35+]
5   5      F    [35+]
6   6      F  [18-35]
7   7      M     [NA]
8   8      U  [18-35]
9   9      F  [18-35]
10 10      F  [18-35]

Relevel Factors

Let’s say that we want the “[NA]” age group to appear first

df$AgeGroup<-factor(df$AgeGroup, c("[NA]", "[18-35]" ,"[35+]"))
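One thing to watch with this approach: any value that is not listed in the new levels vector becomes `NA`, so a typo or an omitted level can silently lose data. The sketch below illustrates this on a short, made-up vector `age_group` (hypothetical values, not the article's data frame).

```r
# Short illustrative vector using the merged age groups
age_group <- factor(c("[35+]", "[18-35]", "[NA]", "[18-35]"))

# Listing every level only changes the order; the values stay the same
reordered <- factor(age_group, levels = c("[NA]", "[18-35]", "[35+]"))
levels(reordered)
as.character(reordered)

# Leaving a level out turns the matching observations into NA
dropped <- factor(age_group, levels = c("[NA]", "[18-35]"))
sum(is.na(dropped))
```

Comparing `table()` or the count of `NA` values before and after the call is a simple sanity check.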

Another way to change the order is to use relevel(), which moves a particular level to the front of the list. (This will not work for ordered factors.) Let’s say that we want the “F” Gender first:

df$Gender<-relevel(df$Gender, "F")
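The restriction on ordered factors can be seen in a small sketch. The `size` factor below is a hypothetical example that does not appear in the data above; for ordered factors, one option is to spell out the full level order with `factor()` instead.

```r
# Unordered factor: relevel() moves the reference level to the front
gender <- factor(c("M", "U", "F", "F"), levels = c("U", "F", "M"))
gender <- relevel(gender, ref = "F")
levels(gender)

# Hypothetical ordered factor: relevel() signals an error here
size <- factor(c("low", "high", "mid"),
               levels = c("low", "mid", "high"), ordered = TRUE)
res <- tryCatch(relevel(size, ref = "high"),
                error = function(e) "error")
res

# For ordered factors, give the complete new order explicitly
size2 <- factor(size, levels = c("high", "mid", "low"), ordered = TRUE)
levels(size2)
```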

After applying these changes, str() shows the new order of the factor levels.

> str(df)
'data.frame':	10 obs. of  3 variables:
 $ ID      : int  1 2 3 4 5 6 7 8 9 10
 $ Gender  : Factor w/ 3 levels "F","U","M": 3 3 3 2 1 1 3 2 1 1
 $ AgeGroup: Factor w/ 3 levels "[NA]","[18-35]",..: 3 2 1 3 3 2 1 2 2 2
