Predictive Hacks

Pandas: How to assign values based on multiple conditions of different columns

When we are dealing with Data Frames, it is quite common, mainly for feature engineering tasks, to change the values of the existing features or to create new features based on some conditions of other columns. Here, we will provide some examples of how we can create a new column based on multiple conditions of existing columns. For these examples, we will work with the titanic dataset.

Import the data and the libraries

import pandas as pd
import numpy as np

url = 'https://gist.githubusercontent.com/michhar/2dfd2de0d4f8727f873422c5d959fff5/raw/ff414a1bcfcba32481e4d4e8db578e55872a2ca1/titanic.csv'



Let’s say that we want to create a new column (or to update an existing one) with the following conditions:

• If the Age is NaN and Pclass=1 then the Age=40
• If the Age is NaN and Pclass=2 then the Age=30
• If the Age is NaN and Pclass=3 then the Age=25
• Else the Age will remain as is

Solution 1: Using apply and lambda functions

We will need to create a function with the conditions.

def myfunc(age, pclass):
if pd.isnull(age) and pclass==1:
age=40
elif pd.isnull(age) and pclass==2:
age=30
elif pd.isnull(age) and pclass==3:
age=25
else:
age=age
return age


Then, we use the apply method using the lambda function which takes as input our function with parameters the pandas columns. Do not forget to set the axis=1, in order to apply the function row-wise.

df['NewColumn_1'] = df.apply(lambda x: myfunc(x['Age'], x['Pclass']), axis=1)

Solution 2: Using NumPy Select

We can use the NumPy Select function, where you define the conditions and their corresponding values. For example:

conditions = [df['Pclass'].eq(1) & df['Age'].isnull(),
df['Pclass'].eq(2) & df['Age'].isnull(),
df['Pclass'].eq(3) & df['Age'].isnull()]

choices = [40,30,25]

df['NewColumn_2'] = np.select(conditions, choices, default= df['Age'] )


Now let’s see if the Column_1 is identical to Column_2

df['NewColumn_1'].equals(df['NewColumn_2'])

True

Let’s have a look also at our new data frame focusing on the cases where the Age was NaN.

df[df['Age'].isnull()].head()

As we can see, we got the expected output! In case you want to work with R you can find have a look at the example.

Get updates and learn from the best

Python

Estimate Probabilities of Card Games

We are going to show how we can estimate card probabilities by applying Monte Carlo Simulation and how we can

Python

Monte Carlo Integration in Python

We will provide examples of how you solve integrals numerically in Python. Let’s recall from statistics that the mean value