Predictive Hacks

How to Save & Read a Pandas Dataframe Containing Lists or Dictionaries

How to Save & Read a Pandas Dataframe Containing Lists and Dictionaries

There is a common problem when we have nested dictionaries or lists in our Pandas Dataframes. How can we save and read the file so we can get the dictionaries as dictionaries and not as strings? Can we transform a Dataframe already saved in the wrong way? We will show how to handle these issues in the following hacks.

The right way to Save and Read Pandas DataFrames with nested Dictionaries/Lists.

Most of us use the .to_csv() function of Pandas to save our data. This is the wrong way because it will save the dictionaries and lists as strings. The easiest way to deal with it is to use the function to_pickle(). Let’s see an example.

#dummy dataframe with lists and dictionaries as values

df=pd.DataFrame({'dictionary': {0: {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'},
  1: {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'},
  2: {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'},
  3: {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'},
  4: {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}},
 'list': {0: ['a', 'b', 'c', 'd', 'e'],
  1: ['a', 'b', 'c', 'd', 'e'],
  2: ['a', 'b', 'c', 'd', 'e'],
  3: ['a', 'b', 'c', 'd', 'e'],
  4: ['a', 'b', 'c', 'd', 'e']}})

	dictionary	                                list
0	{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}	[a, b, c, d, e]
1	{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}	[a, b, c, d, e]
2	{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}	[a, b, c, d, e]
3	{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}	[a, b, c, d, e]
4	{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}	[a, b, c, d, e]
#how to save
df.to_pickle("test")

#how to load
df=pd.read_pickle("test")

We can see now that our Dataframe has the dictionaries and lists in the right format.

type(df['dictionary'][0])

dict
How to Save Pandas Dataframe Containing Lists and Dictionaries

How to handle a Dataframe already saved in the wrong way.

Let’s say we get our data in a .csv file and we cant use pickle. The solution here is the ast library.

#Let's save our data in the worng way
df=pd.to_csv("test.csv")

#read the csv
df=pd.read_csv("test.csv")

#check the format of the dictionaries
zz["dictionary"][0]

"{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}"
#check the format of the lists
zz["list"][0]
"['a', 'b', 'c', 'd', 'e']"

We can see that it is a string instead of a list. We can change this as follows.

#Solution 
import ast

df['dictionary']=df['dictionary'].apply(lambda x: ast.literal_eval(x))
df['list']=df['list'].apply(lambda x: ast.literal_eval(x))

#check the format of the dictionaries
type(df["dictionary"][0])

dict
#check the format of the lists
type(df["list"][0])

list

Share This Post

Share on facebook
Share on linkedin
Share on twitter
Share on email

Leave a Comment

Subscribe To Our Newsletter

Get updates and learn from the best

More To Explore

data science journey
Miscellaneous

My Journey as a Data Science Blogger

Μy Background My Studies Back in 2001, I entered university to study Statistics. During my first year, I ran my