Predictive Hacks

The case when statement in PySpark

With PySpark, we can express the SQL “case when” statement using the “when” function from pyspark.sql.functions. Assume that we have a data frame of flights with an “ActualElapsedTime” column, and we want to create another column, called “FlightType”, where:

  • if ActualElapsedTime > 300 then “Long”
  • if ActualElapsedTime < 200 then “Short”
  • else “Medium”

Let’s see how we can do it with PySpark.

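from pyspark.sql.functions import when

# Conditions are evaluated in order and the first match wins;
# otherwise() supplies the default for rows that match no condition.
df.withColumn('FlightType',
           when(df.ActualElapsedTime>300, "Long")
           .when(df.ActualElapsedTime<200, "Short")
           .otherwise("Medium")).show()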
 

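Voilà! We have successfully created the FlightType column. Note that withColumn returns a new data frame and show() only prints it, so assign the result to a variable if you want to keep the new column.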
