2016-12-14 227 views

Python pandas - JSON to DataFrame

I have a complex JSON file that looks like this:

{
    "User A": {
        "Obj1": {
            "key1": "val1",
            "key2": "val2",
            "key3": "val3"
        },
        "Obj2": {
            "key1": "val1",
            "key2": "val2",
            "key3": "val3"
        }
    },
    "User B": {
        "Obj1": {
            "key1": "val1",
            "key2": "val2",
            "key3": "val3",
            "key4": "val4"
        }
    }
}

And I want to turn it into a DataFrame that looks like this:

             key1  key2  key3  key4
User A Obj1  val1  val2  val3   NaN
       Obj2  val1  val2  val3   NaN
User B Obj1  val1  val2  val3  val4

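Is this possible with pandas? If so, how can I do it?

  • If it is simpler, I don't mind dropping the first two columns (User and Obj) and keeping only the key columns.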

Answer


You can first read the file into a dict:

import json

with open('file.json') as data_file:
    dd = json.load(data_file)

print(dd) 
{'User B': {'Obj1': {'key2': 'val2', 'key4': 'val4', 'key1': 'val1', 'key3': 'val3'}}, 
'User A': {'Obj1': {'key2': 'val2', 'key1': 'val1', 'key3': 'val3'}, 
'Obj2': {'key2': 'val2', 'key1': 'val1', 'key3': 'val3'}}} 

Then use a dict comprehension with concat:

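import pandas as pd

# One transposed frame per user; concat uses the dict keys as the outer index level
df = pd.concat({key: pd.DataFrame(val).T for key, val in dd.items()})
print(df)
             key1  key2  key3  key4
User A Obj1  val1  val2  val3   NaN
       Obj2  val1  val2  val3   NaN
User B Obj1  val1  val2  val3  val4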
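
To try the concat approach without a file on disk, here is a minimal self-contained sketch. It assumes the JSON from the question is held in a string (the name `raw` is only for illustration) and parses it with `json.loads` instead of `json.load`:

```python
import json

import pandas as pd

# Assumption: the question's JSON, inlined as a string instead of read from file.json
raw = """
{
    "User A": {
        "Obj1": {"key1": "val1", "key2": "val2", "key3": "val3"},
        "Obj2": {"key1": "val1", "key2": "val2", "key3": "val3"}
    },
    "User B": {
        "Obj1": {"key1": "val1", "key2": "val2", "key3": "val3", "key4": "val4"}
    }
}
"""
dd = json.loads(raw)

# pd.DataFrame(objs) has one column per Obj, so .T gives one row per Obj.
# Passing a dict to concat makes its keys the outer level of a MultiIndex.
df = pd.concat({user: pd.DataFrame(objs).T for user, objs in dd.items()})
print(df)
```

Keys that are missing for a user (here `key4` for User A) come out as NaN, because concat aligns the per-user frames on the union of their columns.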

Another solution uses read_json, but you first need to reshape with unstack and remove the NaN rows with dropna. Finally, you need DataFrame.from_records:

df = pd.read_json('file.json').unstack().dropna() 
print (df) 
User A Obj1  {'key2': 'val2', 'key1': 'val1', 'key3': 'val3'} 
     Obj2  {'key2': 'val2', 'key1': 'val1', 'key3': 'val3'} 
User B Obj1 {'key2': 'val2', 'key4': 'val4', 'key1': 'val1... 
dtype: object 

df1 = pd.DataFrame.from_records(df.values.tolist()) 
print (df1) 
   key1  key2  key3  key4
0  val1  val2  val3   NaN
1  val1  val2  val3   NaN
2  val1  val2  val3  val4

df1 = pd.DataFrame.from_records(df.values.tolist(), index = df.index) 
print (df1) 
             key1  key2  key3  key4
User A Obj1  val1  val2  val3   NaN
       Obj2  val1  val2  val3   NaN
User B Obj1  val1  val2  val3  val4
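
The same idea (a list of records plus an explicit index) can also be sketched directly from the nested dict, without read_json. This is only an illustration, not part of the original answer: the names `index_tuples`, `rows`, and `flat`, and the level names "user" and "obj", are assumptions. The last line covers the case from the question where only the key columns are wanted.

```python
import pandas as pd

# Assumption: the same nested data as in the question, written as a Python dict
dd = {
    "User A": {
        "Obj1": {"key1": "val1", "key2": "val2", "key3": "val3"},
        "Obj2": {"key1": "val1", "key2": "val2", "key3": "val3"},
    },
    "User B": {
        "Obj1": {"key1": "val1", "key2": "val2", "key3": "val3", "key4": "val4"},
    },
}

# Collect one (user, obj) label and one record per innermost dict
index_tuples = []
rows = []
for user, objs in dd.items():
    for obj, fields in objs.items():
        index_tuples.append((user, obj))
        rows.append(fields)

# Hypothetical level names, chosen only for readability
index = pd.MultiIndex.from_tuples(index_tuples, names=["user", "obj"])
df1 = pd.DataFrame(rows, index=index)

# Drop both index levels and keep only the key columns
flat = df1.reset_index(drop=True)
print(df1)
print(flat)
```

Building the index by hand keeps the row order of the source dict and makes it explicit which label belongs to which record.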

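You are so helpful, thank you! I can't believe that something I worked on for an hour can be done so elegantly in two lines of code... Is there a simple way to save this df as an Excel file? – TheDaJon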


Thank you for accepting! Sure, use [`to_excel`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html): `df1.to_excel('file.xlsx')`, or `df1.to_excel('file.xlsx', index=False)` if you need to drop the index. – jezrael