Replace Categorical Variables with Mode in R

In data science projects, it is common to replace the missing values of categorical variables with the mode, that is, the most frequent value of the column. Let’s see the following example:

df<-data.frame(id=seq(1,10), ColumnA=c(10,9,8,7,NA,NA,20,15,12,NA), 


Note that ColumnB and ColumnC are character columns. Note also that base R has no built-in function for the statistical mode (its mode() function returns the storage mode of an object instead). So let’s build one:

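getmode <- function(v){
  # Distinct values, in order of first appearance
  uniqv <- unique(v)
  # Count how often each distinct value occurs and return the most frequent one
  uniqv[which.max(tabulate(match(v, uniqv)))]
}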
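
To get a feel for how the helper behaves, here is a small sketch on a few made-up vectors (the input values are illustrative only and do not come from the example data frame). It shows three things: which value wins on a tie, how missing values are counted, and how this differs from base R's mode().

```r
# Mode helper, repeated here so the snippet runs on its own.
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

getmode(c("a", "b", "b", "c"))   # "b": the most frequent value
getmode(c("x", "y", "y", "x"))   # "x": on a tie, the value that appears first wins
getmode(c("a", NA, NA))          # NA: missing values are counted like any other value
getmode(c(3, 1, 3, 2))           # 3: numeric vectors work as well

# Base R's own mode() answers a different question: the storage mode.
mode(c("a", "b", "b"))           # "character"
```

Because NA and the empty string are counted like any other value, either of them can come back as the mode when it is the most frequent entry in a column.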

Now let’s replace all the empty strings in the character variables with the mode of the corresponding column. Finally, we convert the character variables to factors.

df[sapply(df, is.character)] <- lapply(df[sapply(df, is.character)], function(x) ifelse(x=="", getmode(x), x))
df[sapply(df, is.character)] <- lapply(df[sapply(df, is.character)], as.factor)
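
The two steps can also be sketched end to end on a small data frame. In the sketch below, the values of ColumnB and ColumnC are invented for illustration, and fill_with_mode is a hypothetical helper rather than part of the original code. It differs from the one-liner above in one respect: the mode is computed over the non-empty, non-missing values only, so an empty string can never be chosen as the fill value, even when it is the most frequent entry.

```r
# Mode helper as defined earlier in the article.
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Hypothetical data: the ColumnB and ColumnC values are made up for this sketch.
df <- data.frame(
  id = seq(1, 10),
  ColumnA = c(10, 9, 8, 7, NA, NA, 20, 15, 12, NA),
  ColumnB = c("A", "B", "A", "A", "", "B", "A", "B", "", "A"),
  ColumnC = c("", "BB", "CC", "BB", "BB", "CC", "AA", "BB", "", "AA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: take the mode of the non-empty, non-missing values
# and use it to overwrite the empty strings. NA entries are left untouched.
fill_with_mode <- function(x) {
  fill <- getmode(x[!is.na(x) & x != ""])
  x[!is.na(x) & x == ""] <- fill
  x
}

# Identify the character columns once, then fill and convert them.
is_chr <- sapply(df, is.character)
df[is_chr] <- lapply(df[is_chr], fill_with_mode)
df[is_chr] <- lapply(df[is_chr], as.factor)

str(df)
```

With this made-up data, the empty entries of ColumnB become "A" and those of ColumnC become "BB", and both columns end up as factors without an empty level.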

After these two steps, the empty strings in each character column have been replaced with that column’s mode, and the columns are stored as factors.

