TransWikia.com

Need to get data from dictionaries into pandas, but the dict keys change

Stack Overflow Asked on November 22, 2021

I have thousands of dictionaries that I need to put into a single pandas data frame. The dictionaries look like this:

{'screen_width': 375,
 'city': 'London',
 'source': 'Mobile',
 'appVersion': '5.3.0',
 'connectionType': 'wifi',
 'sheetName': 'Regional Asset',
 '$device': 'iPhone',
 '$user_id': '[email protected]',
 '$device_id': '172fe47',
 '$os': 'iOS',
 '$manufacturer': 'Apple',
 '$os_version': '13.4.1',
 '$lib_version': '1.3.0',
 'distinct_id': '[email protected]',
 'fieldName': 'barcode',
 '$screen_height': 812,
 'mp_country_code': 'UK',
 '$model': 'iPhone12,3',
 'time': 1593404157
}

The problem is that any given dictionary might be missing an entry (such as city), in which case the key isn't there at all. This is causing me massive problems.

What I’ve tried so far:

file = '{0}.csv'.format(file_name)
df = pd.read_json(file)
df1 = pd.DataFrame(columns = [Column_names])
for i in range(df.shape[0]):
    # The parentheses let the expression continue across several lines.
    df1.loc[i] = ([df.iloc[i,0]] + [df.iloc[i,1]['$screen_width']] + [df.iloc[i,1]['$city']] + [df.iloc[i,1]['source']] + [df.iloc[i,1]['connectionType']]
        + [df.iloc[i,1]['sheetName']] + [df.iloc[i,1]['$device']] + [df.iloc[i,1]['$user_id']] + [df.iloc[i,1]['$device_id']]
        + [df.iloc[i,1]['$os']] + [df.iloc[i,1]['mp_country_code']] + [df.iloc[i,1]['$manufacturer']] + [df.iloc[i,1]['$os_version']] + [df.iloc[i,1]['$lib_version']]
        + [df.iloc[i,1]['distinct_id']] + [df.iloc[i,1]['$screen_height']]+ [df.iloc[i,1]['$model']] + [df.iloc[i,1]['$region']]
        + [df.iloc[i,1]['mp_lib']] + [df.iloc[i,1]['time']] + [df.iloc[i,1]['mp_processing_time_ms']] + [df.iloc[i,1]['$browser']] + [df.iloc[i,1]['$insert_id']])

But as soon as it comes across a dictionary with city missing I get

KeyError: '$city'

I’ve also tried to add

try:
    enter code here
except KeyError:
    pass

But that just returns an empty data frame.

Can anyone help?

Thanks

3 Answers

@Venkat J provided the list of dictionaries. You can pass this directly to the DataFrame constructor; a key that is missing from a dictionary shows up as NaN in that row.

import pandas as pd

data = [
    {"col1": "10",
     "col2": "London",
     "col3": "Mobile",
     "col4": "Mobile"},
    {"col1": "20",
     "col2": "TOKYO",
     "col4": "Mobile",
     "col5": "Mobile"},
    {"col1": "30",
     "col2": "NewYork",
     "col3": "Mobile",
     "col4": "Mobile",
     "col5": "Mobile"}
]

pd.DataFrame(data)
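
A minimal sketch of the same call on data shaped like the question's. The two records below are made up for illustration (only a few of the question's keys are used, and the second record deliberately has no city), so treat it as a demonstration rather than the asker's real data.

```python
import pandas as pd

# Hypothetical records based on the question's keys; the second has no 'city'.
records = [
    {"screen_width": 375, "city": "London", "source": "Mobile"},
    {"screen_width": 375, "source": "Mobile"},
]

# The constructor uses the union of all keys as columns
# and leaves NaN wherever a record lacks a key.
df = pd.DataFrame(records)
print(df)
```

No try/except around the key lookups is needed with this approach, because the dictionaries are never indexed by hand.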

Answered by jsmart on November 22, 2021

If you know all the possible columns that exist in your JSON file, you can define a DataFrame with all of those columns and then load each dictionary into it. The file sample.txt contains a list of dictionaries:

[
    {"col1": "10",
     "col2": "London",
     "col3": "Mobile",
     "col4": "Mobile"},
    {"col1": "20",
     "col2": "TOKYO",
     "col4": "Mobile",
     "col5": "Mobile"},
    {"col1": "30",
     "col2": "NewYork",
     "col3": "Mobile",
     "col4": "Mobile",
     "col5": "Mobile"}
]

Program:

import pandas as pd
import json

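if __name__ == "__main__":
    # Start from an empty frame that already has every expected column.
    result = pd.DataFrame(columns=['col1', 'col2', 'col3', 'col4', 'col5'])
    # The with block closes the file once it has been read.
    with open('sample.txt', 'r') as f:
        raw_data = json.load(f)
    for i in raw_data:
        # DataFrame.append was removed in pandas 2.0, so use pd.concat instead.
        result = pd.concat([result, pd.DataFrame([i])], ignore_index=True)
    print(result)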

The output of the program is:

  col1     col2    col3    col4    col5
0   10   London  Mobile  Mobile     NaN
1   20    TOKYO     NaN  Mobile  Mobile
2   30  NewYork  Mobile  Mobile  Mobile
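
If the column list is known up front, the loop may not be needed at all, since the DataFrame constructor accepts a columns argument. The sketch below makes two assumptions so that it runs on its own: the sample data is written inline instead of being read from sample.txt, and a hypothetical column named extra, which no record contains, is added to show that it is still created.

```python
import pandas as pd

# Same records as in sample.txt, written inline.
raw_data = [
    {"col1": "10", "col2": "London", "col3": "Mobile", "col4": "Mobile"},
    {"col1": "20", "col2": "TOKYO", "col4": "Mobile", "col5": "Mobile"},
    {"col1": "30", "col2": "NewYork", "col3": "Mobile", "col4": "Mobile", "col5": "Mobile"},
]

# 'extra' is a hypothetical column that appears in no record.
all_columns = ["col1", "col2", "col3", "col4", "col5", "extra"]

# One call builds the whole frame; absent keys and unused columns become NaN.
result = pd.DataFrame(raw_data, columns=all_columns)
print(result)
```

Building the frame in one call also avoids copying the growing frame on every iteration, which can matter with thousands of dictionaries.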

Answered by Venkat J on November 22, 2021

import pandas as pd

my_dict = {'screen_width': 375,
 'city': 'London',
 'source': 'Mobile',
 'appVersion': '5.3.0',
 'connectionType': 'wifi',
 'sheetName': 'Regional Asset',
 '$device': 'iPhone',
 '$user_id': '[email protected]',
 '$device_id': '172fe47',
 '$os': 'iOS',
 '$manufacturer': 'Apple',
 '$os_version': '13.4.1',
 '$lib_version': '1.3.0',
 'distinct_id': '[email protected]',
 'fieldName': 'barcode',
 '$screen_height': 812,
 'mp_country_code': 'UK',
 '$model': 'iPhone12,3',
 'time': 1593404157
}

my_dict_2 = {'screen_width': 375,
 'source': 'Mobile',
 'appVersion': '5.3.0',
 'connectionType': 'wifi',
 'sheetName': 'Regional Asset',
 '$device': 'iPhone',
 '$user_id': '[email protected]',
 '$device_id': '172fe47',
 '$os': 'iOS',
 '$manufacturer': 'Apple',
 '$os_version': '13.4.1',
 '$lib_version': '1.3.0',
 'distinct_id': '[email protected]',
 'fieldName': 'barcode',
 '$screen_height': 812,
 'mp_country_code': 'UK',
 '$model': 'iPhone12,3',
 'time': 1593404157}

my_dict_3 = {'screen_width': 375,
 'city': 'London',
 'source': 'Mobile',
 'appVersion': '5.3.0',
 'connectionType': 'wifi',
 '$device': 'iPhone',
 '$user_id': '[email protected]',
 '$device_id': '172fe47',
 '$os': 'iOS',
 '$manufacturer': 'Apple',
 '$os_version': '13.4.1',
 '$lib_version': '1.3.0',
 'distinct_id': '[email protected]',
 'fieldName': 'barcode',
 '$screen_height': 812,
 'mp_country_code': 'UK',
 '$model': 'iPhone12,3',
 'time': 1593404157}

list_of_dictionaries = [my_dict, my_dict_2, my_dict_3]

start = True

for d in list_of_dictionaries:
    if start:
        # The first dictionary creates the data frame.
        my_df = pd.DataFrame.from_dict([d])
        start = False
    else:
        # Later dictionaries are added as new rows; missing keys become NaN.
        my_df = pd.concat([my_df, pd.DataFrame.from_dict([d])])
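
A possible variation on the loop above: build the single-row frames first and concatenate them in one call. The dictionaries here are shortened, hypothetical stand-ins for the three full ones, and ignore_index=True is an added choice so that the rows are numbered 0, 1, 2 instead of all carrying index 0.

```python
import pandas as pd

# Shortened stand-ins for my_dict, my_dict_2 and my_dict_3.
list_of_dictionaries = [
    {"screen_width": 375, "city": "London", "sheetName": "Regional Asset"},
    {"screen_width": 375, "sheetName": "Regional Asset"},  # no 'city'
    {"screen_width": 375, "city": "London"},               # no 'sheetName'
]

# One single-row frame per dictionary, then a single concat over all of them.
frames = [pd.DataFrame([d]) for d in list_of_dictionaries]
my_df = pd.concat(frames, ignore_index=True)
print(my_df)
```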
    
    

Answered by rab1262 on November 22, 2021
