It is straightforward to join pandas data frames (inner join, left join, right join, outer join, and cross join), but the anti joins can be a little tricky. Below, we provide a hack of how you can easily perform anti-joins in pandas. The graphs below help us to recall the different types of joins.
The hack of the anti-joins is to do an outer join and to add the indicator column. Let’s provide a hands-on example.
import pandas as pd df1 = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"], "A": ["A0", "A1", "A2", "A3"], "B": ["B0", "B1", "B2", "B3"]}) df2 = pd.DataFrame({"key": ["K0", "K1", "K5", "K6"], "C": ["C0", "C1", "C5", "C6"], "D": ["D0", "D1", "D5", "D6"]}) df = pd.merge(df1,df2, how='outer', left_on='key', right_on='key', indicator = True) df
Notice that we outer joined the two data frames, df1 and df2 and we added an extra column called _merge. We generated this extra column by setting the indicator = True that adds a column to the output DataFrame called “_merge” with information on the source of each row. The column can be given a different name by providing a string argument. The column will have a Categorical type with the value of “left_only” for observations whose merge key only appears in the left DataFrame, “right_only” for observations whose merge key only appears in the right DataFrame, and “both” if the observation’s merge key is found in both DataFrames.
Now if we want to run an anti-join we need get the rows where the _merge is not equal to “both”.
df.loc[df['_merge']!='both']
Similarly, by following the same logic, we can easily get the left and right anti-join.