There is a common problem when we have nested dictionaries or lists in our Pandas Dataframes. How can we save and read the file so we can get the dictionaries as dictionaries and not as strings? Can we transform a Dataframe already saved in the wrong way? We will show how to handle these issues in the following hacks.
The right way to Save and Read Pandas DataFrames with nested Dictionaries/Lists.
Most of us use the .to_csv() function of Pandas to save our data. This is the wrong way because it will save the dictionaries and lists as strings. The easiest way to deal with it is to use the function to_pickle(). Let’s see an example.
#dummy dataframe with lists and dictionaries as values df=pd.DataFrame({'dictionary': {0: {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}, 1: {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}, 2: {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}, 3: {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}, 4: {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}}, 'list': {0: ['a', 'b', 'c', 'd', 'e'], 1: ['a', 'b', 'c', 'd', 'e'], 2: ['a', 'b', 'c', 'd', 'e'], 3: ['a', 'b', 'c', 'd', 'e'], 4: ['a', 'b', 'c', 'd', 'e']}})
dictionary list
0 {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'} [a, b, c, d, e]
1 {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'} [a, b, c, d, e]
2 {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'} [a, b, c, d, e]
3 {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'} [a, b, c, d, e]
4 {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'} [a, b, c, d, e]
#how to save df.to_pickle("test") #how to load df=pd.read_pickle("test")
We can see now that our Dataframe has the dictionaries and lists in the right format.
type(df['dictionary'][0])
dict
How to handle a Dataframe already saved in the wrong way.
Let’s say we get our data in a .csv file and we cant use pickle. The solution here is the ast library.
#Let's save our data in the worng way df=pd.to_csv("test.csv") #read the csv df=pd.read_csv("test.csv") #check the format of the dictionaries zz["dictionary"][0]
"{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}"
#check the format of the lists zz["list"][0]
"['a', 'b', 'c', 'd', 'e']"
We can see that it is a string instead of a list. We can change this as follows.
#Solution import ast df['dictionary']=df['dictionary'].apply(lambda x: ast.literal_eval(x)) df['list']=df['list'].apply(lambda x: ast.literal_eval(x))
#check the format of the dictionaries type(df["dictionary"][0])
dict
#check the format of the lists type(df["list"][0])
list