Predictive Hacks

How to work with JSON cells in Pandas

Assume that you are dealing with a pandas data frame where one of your columns is in a JSON format and you want to extract specific information. For this example, we will work with the doc_report.csv dataset from Kaggle

import pandas as pd
import ast

pd.set_option("max_colwidth", 180)

doc = pd.read_csv("doc_reports.csv", index_col=0)

# print the properties column

If we look at the data, the properties field is in JSON format. This means that we need to convert it to a dictionary and then extract the required information. We will work with the ast library to convert it to a dictionary and then we will create separate columns for each key as follows:

dummy = doc['properties'].apply(lambda x: ast.literal_eval(x))

doc['gender'] = dummy.apply(lambda x:x.get('gender'))
doc['nationality'] = dummy.apply(lambda x:x.get('nationality'))
doc['document_type'] = dummy.apply(lambda x:x.get('document_type'))
doc['date_of_expiry'] = dummy.apply(lambda x:x.get('date_of_expiry'))
doc['issuing_country'] = dummy.apply(lambda x:x.get('issuing_country'))

# lets get the columns
doc[['gender', 'nationality', 'document_type', 'date_of_expiry','issuing_country' ]]

As we can see, we converted a JSON data type cell to columns based on the key values

Share This Post

Share on facebook
Share on linkedin
Share on twitter
Share on email

Subscribe To Our Newsletter

Get updates and learn from the best

More To Explore


Image Captioning with HuggingFace

Image captioning with AI is a fascinating application of artificial intelligence (AI) that involves generating textual descriptions for images automatically.


Intro to Chatbots with HuggingFace

In this tutorial, we will show you how to use the Transformers library from HuggingFace to build chatbot pipelines. Let’s