Predictive Hacks

Hack: The “count(case when … else … end)” in dplyr

count case when

When I run quires in SQL (or even HiveQL, Spark SQL and so on), it is quite common to use the syntax of count(case when.. else ... end). Today, I will provide you an example of how you run this type of commands in dplyr.

Let’s start:


df<-data.frame(id = 1:10,
               gender = c("m","m","m","f","f","f","m","f","f","f"),
               amt= c(5,20,30,10,20,50,5,20,10,30))

Hack: The "count(case when ... else ... end)" in dplyr 1

Let’s get the count and the sum per gender in different columns in SQL.

sqldf("select count(case when gender='m' then id else null end) as male_cnt,
              count(case when gender='f' then id else null end) as female_cnt,
              sum(case when gender='m' then amt else 0 end) as male_amt,
              sum(case when gender='f' then amt else 0 end) as female_amt
              from df")


  male_cnt female_cnt male_amt female_amt
1        4          6       60        140

Let’s get the same output in dplyr. We will need to subset the data frame based on one column.



  male_cnt female_cnt male_amt female_amt
1        4          6       60        140

Share This Post

Share on facebook
Share on linkedin
Share on twitter
Share on email

2 thoughts on “Hack: The “count(case when … else … end)” in dplyr”

  1. The awkwardness with the R version mainly comes from requiring the result to be in an awkward form. A more usual way of achieving an equivalent result in a better form would be to use

    df %>%
    group_by(gender) %>%
    summarise(n = n(), amt = sum(amt), .groups = ‘drop’)

    • The output is in a different format. You can use the tidyR package to reshape the results. However, our goal here was to show the equivalent of “case when” in dplyr


Leave a Comment

Subscribe To Our Newsletter

Get updates and learn from the best

More To Explore