Predictive Hacks

Pandas: How to assign values based on multiple conditions of different columns

pandas

When we are dealing with Data Frames, it is quite common, mainly for feature engineering tasks, to change the values of the existing features or to create new features based on some conditions of other columns. Here, we will provide some examples of how we can create a new column based on multiple conditions of existing columns. For these examples, we will work with the titanic dataset.

Import the data and the libraries

import pandas as pd
import numpy as np

url = 'https://gist.githubusercontent.com/michhar/2dfd2de0d4f8727f873422c5d959fff5/raw/ff414a1bcfcba32481e4d4e8db578e55872a2ca1/titanic.csv'
df = pd.read_csv(url, sep="\t")
df.head()
 
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th…female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS

Define the Task

Let’s say that we want to create a new column (or to update an existing one) with the following conditions:

  • If the Age is NaN and Pclass=1 then the Age=40
  • If the Age is NaN and Pclass=2 then the Age=30
  • If the Age is NaN and Pclass=3 then the Age=25
  • Else the Age will remain as is

Solution 1: Using apply and lambda functions

We will need to create a function with the conditions.

def myfunc(age, pclass):
    if pd.isnull(age) and pclass==1:
        age=40
    elif pd.isnull(age) and pclass==2:
        age=30
    elif pd.isnull(age) and pclass==3:
        age=25
    else:
        age=age
    return age
 

Then, we use the apply method using the lambda function which takes as input our function with parameters the pandas columns. Do not forget to set the axis=1, in order to apply the function row-wise.

df['NewColumn_1'] = df.apply(lambda x: myfunc(x['Age'], x['Pclass']), axis=1)

Solution 2: Using NumPy Select

We can use the NumPy Select function, where you define the conditions and their corresponding values. For example:

conditions = [df['Pclass'].eq(1) & df['Age'].isnull(),
              df['Pclass'].eq(2) & df['Age'].isnull(),
              df['Pclass'].eq(3) & df['Age'].isnull()]

choices = [40,30,25]

df['NewColumn_2'] = np.select(conditions, choices, default= df['Age'] )
 

Now let’s see if the Column_1 is identical to Column_2

df['NewColumn_1'].equals(df['NewColumn_2'])

True

Let’s have a look also at our new data frame focusing on the cases where the Age was NaN.

df[df['Age'].isnull()].head()
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarkedNewColumn_1NewColumn_2
5603Moran, Mr. JamesmaleNaN003308778.4583NaNQ25.025.0
171812Williams, Mr. Charles EugenemaleNaN0024437313.0000NaNS30.030.0
192013Masselmani, Mrs. FatimafemaleNaN0026497.2250NaNC25.025.0
262703Emir, Mr. Farred ChehabmaleNaN0026317.2250NaNC25.025.0
282913O’Dwyer, Miss. Ellen “Nellie”femaleNaN003309597.8792NaNQ25.025.0

As we can see, we got the expected output! In case you want to work with R you can find have a look at the example.

Share This Post

Share on facebook
Share on linkedin
Share on twitter
Share on email

1 thought on “Pandas: How to assign values based on multiple conditions of different columns”

Leave a Comment

Subscribe To Our Newsletter

Get updates and learn from the best

More To Explore

How to Save & Read a Pandas Dataframe Containing Lists and Dictionaries
Python

Pandas GroupBy Tips

This post is a short tutorial in Pandas GroupBy. As always we will work with examples. Let’s create a dummy