We are referring to the filter method of pandas DataFrames. We have already mentioned it in our dplyr pipes in pandas post, but now we will dive deeper into it to reveal some very useful hacks.
Let’s first create our DataFrame:
import pandas as pd

df = pd.DataFrame(columns=['dummy'+str(i) for i in range(0,5)]+
                          ['Billy'+str(i) for i in range(0,5)]+
                          ['att'+str(i) for i in range(0,5)], data=None)
# fill every column with three zeros
for i in df.columns:
    df[i] = [0, 0, 0]
df.columns
Index(['dummy0', 'dummy1', 'dummy2', 'dummy3', 'dummy4', 'Billy0', 'Billy1',
'Billy2', 'Billy3', 'Billy4', 'att0', 'att1', 'att2', 'att3', 'att4'],
dtype='object')
The like parameter
With this parameter, we keep only the columns whose names contain the input string.
df.filter(like='Billy')
Billy0 Billy1 Billy2 Billy3 Billy4
0 0 0 0 0 0
1 0 0 0 0 0
2 0 0 0 0 0
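To make the matching rule concrete, here is a minimal sketch showing that like is a plain, case-sensitive substring test on the column names. The variable names (cols, by_like, by_hand, lower) are only illustrative, and the DataFrame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Same layout as the DataFrame in this post: 15 columns, 3 rows of zeros.
cols = [f"{p}{i}" for p in ("dummy", "Billy", "att") for i in range(5)]
df = pd.DataFrame(0, index=range(3), columns=cols)

# like='illy' keeps every column whose name contains the substring 'illy'.
by_like = df.filter(like="illy")

# The same selection written by hand with a list comprehension.
by_hand = df[[c for c in df.columns if "illy" in c]]

# The match is case-sensitive, so 'billy' selects no columns at all.
lower = df.filter(like="billy")

print(list(by_like.columns))
print(by_like.equals(by_hand))
print(lower.shape)
```

The substring can sit anywhere in the name, so like is handy for quick selections but cannot express "starts with" or "ends with"; for that, the regex parameter is the better fit.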
The regex parameter
This is the most useful feature of this method. You can select columns however you like using regular expressions. The pattern may match anywhere in a column name unless you anchor it with ^ or $. Here are some examples:
# filter the columns containing att followed by 1, 2 or 3 (use '^att[123]' to require att at the start)
df.filter(regex='att[123]')
att1 att2 att3
0 0 0 0
1 0 0 0
2 0 0 0
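Because filter applies the pattern with re.search semantics, an unanchored pattern such as att[123] can match in the middle of a name. The sketch below illustrates this on a small DataFrame whose column names (batt1, att10) are hypothetical and chosen only to expose the difference.

```python
import pandas as pd

# Hypothetical column names, picked to show where anchoring matters.
df = pd.DataFrame(0, index=range(3),
                  columns=["att1", "att2", "att4", "batt1", "att10"])

# Unanchored: matches 'att1' inside 'batt1' and at the start of 'att10'.
unanchored = df.filter(regex="att[123]")

# Anchored at both ends: only names that are exactly att1, att2 or att3.
anchored = df.filter(regex="^att[123]$")

print(list(unanchored.columns))
print(list(anchored.columns))
```

With the DataFrame used in this post both patterns return the same three columns, so anchoring only matters once the column names become less regular.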
# select all columns except one (Billy1)
df.filter(regex="^(?!Billy1$)")
dummy0 dummy1 dummy2 dummy3 dummy4 Billy0 Billy2 Billy3 Billy4 \
0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
att0 att1 att2 att3 att4
0 0 0 0 0 0
1 0 0 0 0 0
2 0 0 0 0 0
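The negative lookahead ^(?!Billy1$) rejects a name only when it is exactly Billy1. As a rough sanity check, the sketch below compares it with df.drop and extends the same idea to several excluded names; the second excluded name (att3) is an arbitrary choice for illustration.

```python
import pandas as pd

# Same layout as the DataFrame in this post: 15 columns, 3 rows of zeros.
cols = [f"{p}{i}" for p in ("dummy", "Billy", "att") for i in range(5)]
df = pd.DataFrame(0, index=range(3), columns=cols)

# Exclude one column with a negative lookahead...
via_regex = df.filter(regex=r"^(?!Billy1$)")
# ...which gives the same result as dropping it explicitly.
via_drop = df.drop(columns="Billy1")

# Exclude several exact names by listing them as alternatives.
several = df.filter(regex=r"^(?!(?:Billy1|att3)$)")

print(via_regex.equals(via_drop))
print(several.shape)
```

For a single known column, drop is arguably easier to read; the regex form pays off when the exclusion has to be combined with other pattern-based rules.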
# select all columns except those whose names start with Billy
df.filter(regex="^(?!Billy)")
dummy0 dummy1 dummy2 dummy3 dummy4 att0 att1 att2 att3 att4
0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
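Note that ^(?!Billy) only rejects names that begin with Billy. To reject names that contain Billy anywhere, the lookahead needs a leading .* as in the sketch below. The column my_Billy is a hypothetical name added only to show the difference, and the boolean-mask version is an alternative shown for comparison.

```python
import pandas as pd

# Hypothetical columns: 'my_Billy' contains Billy but does not start with it.
df = pd.DataFrame(0, index=range(3),
                  columns=["dummy0", "Billy0", "my_Billy", "att0"])

# Rejects only names that START with Billy, so my_Billy survives.
not_starting = df.filter(regex=r"^(?!Billy)")

# Rejects names that CONTAIN Billy anywhere.
not_containing = df.filter(regex=r"^(?!.*Billy)")

# The same "does not contain" selection with a boolean mask on the column index.
via_mask = df.loc[:, ~df.columns.str.contains("Billy")]

print(list(not_starting.columns))
print(list(not_containing.columns))
print(not_containing.equals(via_mask))
```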