With PySpark, we can express a SQL “CASE WHEN” statement using the `when` function from `pyspark.sql.functions`. Assume that we have a data frame of flights with an `ActualElapsedTime` column,
and we want to create another column, called “FlightType”, where:
- if ActualElapsedTime > 300, then “Long”
- if ActualElapsedTime < 200, then “Short”
- otherwise, “Medium”
Let’s see how we can do it with PySpark.
```python
from pyspark.sql.functions import when

df.withColumn(
    "FlightType",
    when(df.ActualElapsedTime > 300, "Long")
    .when(df.ActualElapsedTime < 200, "Short")
    .otherwise("Medium"),
).show()
```
Voilà! We have successfully created the FlightType column.