Predictive Hacks

Anti-Joins with Pandas

It is straightforward to join pandas data frames (inner join, left join, right join, outer join, and cross join), but the anti joins can be a little tricky. Below, we provide a hack of how you can easily perform anti-joins in pandas. The graphs below help us to recall the different types of joins.

anti Join SQL | Oracle Anti Join - By Microsoft Awarded MVP - In 30Sec|  wikitechy - oracle tutorial - sql tutorial

Is there a right_anti when joining in PySpark? - Stack Overflow

The hack of the anti-joins is to do an outer join and to add the indicator column. Let’s provide a hands-on example.

import pandas as pd

df1 = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"],
                     "A": ["A0", "A1", "A2", "A3"],
                     "B": ["B0", "B1", "B2", "B3"]})


df2 = pd.DataFrame({"key": ["K0", "K1", "K5", "K6"],
                     "C": ["C0", "C1", "C5", "C6"],
                     "D": ["D0", "D1", "D5", "D6"]})


df = pd.merge(df1,df2, how='outer', left_on='key', right_on='key', indicator = True)

df

Notice that we outer joined the two data frames, df1 and df2 and we added an extra column called _merge. We generated this extra column by setting the indicator = True that adds a column to the output DataFrame called “_merge” with information on the source of each row. The column can be given a different name by providing a string argument. The column will have a Categorical type with the value of “left_only” for observations whose merge key only appears in the left DataFrame, “right_only” for observations whose merge key only appears in the right DataFrame, and “both” if the observation’s merge key is found in both DataFrames.

Now if we want to run an anti-join we need get the rows where the _merge is not equal to “both”.

df.loc[df['_merge']!='both']

Similarly, by following the same logic, we can easily get the left and right anti-join.

Share This Post

Share on facebook
Share on linkedin
Share on twitter
Share on email

Subscribe To Our Newsletter

Get updates and learn from the best

More To Explore

Python

Image Captioning with HuggingFace

Image captioning with AI is a fascinating application of artificial intelligence (AI) that involves generating textual descriptions for images automatically.

Python

Intro to Chatbots with HuggingFace

In this tutorial, we will show you how to use the Transformers library from HuggingFace to build chatbot pipelines. Let’s