2016-11-14 193 views
3

Reading nested JSON with pandas

I am curious how to use pandas to read nested JSON with the following structure:

{ 
    "number": "", 
    "date": "01.10.2016", 
    "name": "R 3932", 
    "locations": [ 
     { 
      "depTimeDiffMin": "0", 
      "name": "Spital am Pyhrn Bahnhof", 
      "arrTime": "", 
      "depTime": "06:32", 
      "platform": "2", 
      "stationIdx": "0", 
      "arrTimeDiffMin": "", 
      "track": "R 3932" 
     }, 
     { 
      "depTimeDiffMin": "0", 
      "name": "Windischgarsten Bahnhof", 
      "arrTime": "06:37", 
      "depTime": "06:40", 
      "platform": "2", 
      "stationIdx": "1", 
      "arrTimeDiffMin": "1", 
      "track": "" 
     }, 
     { 
      "depTimeDiffMin": "", 
      "name": "Linz/Donau Hbf", 
      "arrTime": "08:24", 
      "depTime": "", 
      "platform": "1A-B", 
      "stationIdx": "22", 
      "arrTimeDiffMin": "1", 
      "track": "" 
     } 
    ] 
} 

The following call keeps the array as JSON. I would rather have it expanded into columns.

pd.read_json("/myJson.json", orient='records') 

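Edit

Thanks for the first answers. I should refine my question: flattening the nested attributes inside the array is not required. It would be enough to join the names [A, B, C], i.e. concatenate the 'name' of every entry in df.locations.

My file contains multiple JSON objects (one per line). I would like to keep the number, date, name and locations columns. However, I need to join the locations.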

allLocations = ""
isFirst = True
for location in result.locations:
    if isFirst:
        isFirst = False
        allLocations = location['name']
    else:
        allLocations += "; " + location['name']
allLocations

My approach here does not seem efficient or pandas-style.

+0

Upvoted for ÖBB –

Answer

9

You can use json_normalize:

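import json
import pandas as pd

# pandas >= 1.0 exposes this as pd.json_normalize;
# older versions need: from pandas.io.json import json_normalize
with open('myJson.json') as data_file:
    data = json.load(data_file)

# one row per entry in 'locations'; date, number and name are repeated as metadata
df = pd.json_normalize(data, 'locations', ['date', 'number', 'name'],
                       record_prefix='locations_')
print(df)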
  locations_arrTime locations_arrTimeDiffMin locations_depTime \
0                                                        06:32
1             06:37                        1             06:40
2             08:24                        1

  locations_depTimeDiffMin          locations_name locations_platform \
0                        0 Spital am Pyhrn Bahnhof                  2
1                        0 Windischgarsten Bahnhof                  2
2                                   Linz/Donau Hbf               1A-B

  locations_stationIdx locations_track number   name       date
0                    0          R 3932        R 3932 01.10.2016
1                    1                        R 3932 01.10.2016
2                   22                        R 3932 01.10.2016
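
To try this without a file on disk, here is a minimal sketch that assumes pandas 1.0 or later (where the function is available as pd.json_normalize) and uses a trimmed copy of the sample object from the question. The variable names `data` and `flat` are only for illustration.

```python
import pandas as pd

# Trimmed copy of the sample object from the question (only three fields per location).
data = {
    "number": "",
    "date": "01.10.2016",
    "name": "R 3932",
    "locations": [
        {"name": "Spital am Pyhrn Bahnhof", "arrTime": "", "depTime": "06:32"},
        {"name": "Windischgarsten Bahnhof", "arrTime": "06:37", "depTime": "06:40"},
        {"name": "Linz/Donau Hbf", "arrTime": "08:24", "depTime": ""},
    ],
}

# One row per entry of "locations"; the top-level fields are repeated on every row.
# record_prefix keeps the nested "name" from clashing with the top-level "name".
flat = pd.json_normalize(
    data,
    record_path="locations",
    meta=["date", "number", "name"],
    record_prefix="locations_",
)
print(flat)
```

Without record_prefix the nested and the top-level "name" would collide, which is why the prefix is used here.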

Edit:

You can use read_json, extract name by passing the locations to the DataFrame constructor, and finally groupby and apply join:

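import pandas as pd

df = pd.read_json("myJson.json")
# each cell of 'locations' is a dict; keep only its 'name'
df.locations = pd.DataFrame(df.locations.values.tolist())['name']
# join the names per train into one comma-separated string
df = df.groupby(['date','name','number'])['locations'].apply(','.join).reset_index()
print(df)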
        date    name number                                          locations
0 2016-01-10  R 3932         Spital am Pyhrn Bahnhof,Windischgarsten Bahnho...
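
For a file with one JSON object per line, as described in the question's edit, the following is a rough sketch rather than a tested recipe for the real data. It assumes read_json with lines=True, reads from an in-memory string instead of a file, and the second record ("R 0000" with stations "A" and "B") is made up purely for illustration. convert_dates=False and dtype=False are passed so that values such as "01.10.2016" stay plain strings.

```python
import io
import json

import pandas as pd

# Two records, one JSON object per line. The second record is hypothetical.
records = [
    {"number": "", "date": "01.10.2016", "name": "R 3932",
     "locations": [{"name": "Spital am Pyhrn Bahnhof"},
                   {"name": "Windischgarsten Bahnhof"},
                   {"name": "Linz/Donau Hbf"}]},
    {"number": "", "date": "01.10.2016", "name": "R 0000",
     "locations": [{"name": "A"}, {"name": "B"}]},
]
text = "\n".join(json.dumps(rec) for rec in records)

# lines=True parses each line as its own object, giving one row per train.
df = pd.read_json(io.StringIO(text), lines=True,
                  convert_dates=False, dtype=False)

# Each cell of "locations" is a list of dicts; join the station names into one string.
df["locations"] = df["locations"].apply(
    lambda locs: "; ".join(loc["name"] for loc in locs)
)
print(df)
```

This keeps number, date, name and locations as columns and avoids the groupby, because every line already corresponds to exactly one row.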
+0

And json would be the raw file? Or the file path? –

+0

In the docs it is 'Unserialized JSON objects', but I tested it with a dict. – jezrael

+1

I added reading the file, please check it. – jezrael