Predictive Hacks

How to centralize a pandas data frame

For recommender systems and collaborative filtering, a common strategy is to centralize (center) your data around 0 by subtracting the mean and then filling the NAs with 0, so that a missing value is treated as equal to the mean. Depending on your dataset and what you want to do, you can centralize by row or by column.

Centralize a pandas data frame by row

In this case, we want to subtract the row mean from each element in that row. Let’s see how we can do it with one line of code.

import pandas as pd
import numpy as np

df = pd.DataFrame({'ColA': [1, 2, 3], 'ColB': [4, 10, 12], 'ColC': [10, np.nan, 7]})
df

Centralize the data frame:

# df.mean(axis=1) returns one mean per row; axis=0 in sub() aligns those means with the row index
df_centralized = df.sub(df.mean(axis=1), axis=0)
df_centralized
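The introduction mentions filling the NAs with 0 after subtracting the mean, but the one-liner above stops before that step. The following is a minimal sketch of the full row-wise procedure on the same example frame; the variable names `row_centered` and `row_centered_filled` are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ColA': [1, 2, 3], 'ColB': [4, 10, 12], 'ColC': [10, np.nan, 7]})

# Subtract each row's mean; pandas skips NaN by default when computing the mean
row_centered = df.sub(df.mean(axis=1), axis=0)

# Fill the remaining NaN with 0, i.e. treat a missing value as equal to the row mean
row_centered_filled = row_centered.fillna(0)

# Sanity check: every row should now average to (approximately) zero
print(row_centered_filled)
print(row_centered_filled.mean(axis=1))
```

Filling with 0 only after centering matters: filling before would pull the row mean toward 0 and distort the centered values.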

Centralize a pandas data frame by column

Similarly, we can centralize it by column by subtracting the column mean from each element in that column.

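# df.mean(axis=0) returns one mean per column; axis=1 in sub() aligns those means with the column labels
df_centralized = df.sub(df.mean(axis=0), axis=1)
df_centralized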
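As a quick check of the column-wise version, the sketch below confirms that each column averages to zero after centering. It also shows that plain subtraction gives the same result here, because pandas aligns a Series with the data frame's column labels by default. The names `col_centered` and `shorthand` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ColA': [1, 2, 3], 'ColB': [4, 10, 12], 'ColC': [10, np.nan, 7]})

# Explicit form: subtract the column means, aligned on the column labels
col_centered = df.sub(df.mean(axis=0), axis=1)

# Shorthand: DataFrame minus Series aligns on columns by default
shorthand = df - df.mean()

# Sanity check: every column should now average to (approximately) zero
print(col_centered)
print(col_centered.mean(axis=0))
```

The shorthand works only for the column case; row-wise centering still needs `sub(..., axis=0)` so that the row means line up with the row index.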
