2017-08-10 106 views

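Getting keys and values from a nested array in JSON

First post! I am converting JSON data (dictionaries) from a server into a csv file. The keys and values are picked up fine, except for the nested "Astronauts" entry, which is an array. Each individual JSON string is one datum and can contain anywhere from 0 to an unlimited number of astronauts, whose attributes I would like to extract as independent values. For example, something like this: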

  • Astronaut1_Spaceships_First: Katabom
  • Astronaut1_Spaceships_Second: The Kraken
  • Astronaut1_Name: Jebeddiah
  • (...)
  • Astronaut2_Gender: Hopefully female

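And so on. The problem is that the nested part is an array rather than a dictionary, so I don't know how to handle it. I have tried the dpath library as well as flattening the nested structure, but nothing changed. Any ideas?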

import json 
import os 
import csv 
import datetime 
import dpath.util #Dpath library needs to be installed first 

datum = {"Mission": "Make Earth Greater Again", "Objective": "Prove Earth is flat", "Astronauts": [{"Spaceships": {"First": "Katabom", "Second": "The Kraken"}, "Name": "Jebeddiah", "Gender": "Hopefully male", "Age": 35, "Prefered colleages": [], "Following missions": [{"Payment_status": "TO BE CONFIRMED"}]}, {"Spaceships": {"First": "The Kraken", "Second": "Minnus I"}, "Name": "Bob", "Gender": "Hopefully female", "Age": 23, "Prefered colleages": [], "Following missions": [{"Payment_status": "TO BE CONFIRMED"}]}]} 

#Parsing process
parsed = json.loads(datum) #datum is the JSON string retrieved from the server

def flattenjson(parsed, delim):
    val = {}
    for i in parsed.keys():
        if isinstance(parsed[i], dict):
            get = flattenjson(parsed[i], delim)
            for j in get.keys():
                val[i + delim + j] = get[j]
        else:
            val[i] = parsed[i]

    return val

flattened = flattenjson(parsed,"__")

#process of creating csv file 
keys=['Astronaut1_Spaceship_First','Astronaut2_Spaceship_Second', 'Astronaut1_Name'] #reduced to 3 keys for this example

writer = csv.DictWriter(OD, keys ,restval='Null', delimiter=",", quotechar="\"", quoting=csv.QUOTE_ALL, dialect= "excel")
writer.writerow(flattened)

#JSON DATA FROM SERVER 
{
    "Mission": "Make Earth Greater Again",
    "Objective": "Prove Earth is flat",
    "Astronauts": [
        {
            "Spaceships": {
                "First": "Katabom",
                "Second": "The Kraken"
            },
            "Name": "Jebeddiah",
            "Gender": "Hopefully male",
            "Age": 35,
            "Prefered colleages": [],
            "Following missions": [
                {
                    "Payment_status": "TO BE CONFIRMED"
                }
            ]
        },
        {
            "Spaceships": {
                "First": "The Kraken",
                "Second": "Minnus I"
            },
            "Name": "Bob",
            "Gender": "Hopefully female",
            "Age": 23,
            "Prefered colleages": [],
            "Following missions": [
                {
                    "Payment_status": "TO BE CONFIRMED"
                }
            ]
        }
    ]
}
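
For reference, a minimal sketch of what a dict-only flattener like the one in the question does with data of this shape. The names `flatten_dicts_only` and `sample` are made up for illustration, and the sample is a shortened stand-in for the server data. Because the function only recurses into dictionaries, the whole "Astronauts" list ends up stored as a single value, so no per-astronaut key is ever created.

```python
def flatten_dicts_only(parsed, delim):
    """Same logic as the flattener in the question: recurses into dicts only."""
    val = {}
    for key, value in parsed.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flatten_dicts_only(value, delim).items():
                val[key + delim + sub_key] = sub_value
        else:
            # Lists land here untouched, including lists of dicts.
            val[key] = value
    return val


# Shortened, hypothetical sample with the same shape as the server data.
sample = {
    "Mission": "Make Earth Greater Again",
    "Astronauts": [{"Name": "Jebeddiah", "Spaceships": {"First": "Katabom"}}],
}

flat = flatten_dicts_only(sample, "__")
print(sorted(flat))              # ['Astronauts', 'Mission']
print(type(flat["Astronauts"]))  # <class 'list'>
```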

Answer


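First, the data defined here is not the data as it comes from the server. Data coming from the server would be a string, whereas the datum in your program has already been parsed into a dictionary. So, assume the data is: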

datum = '{"Mission": "Make Earth Greater Again", "Objective": "Prove Earth is flat", "Astronauts": [{"Spaceships": {"First": "Katabom", "Second": "The Kraken"}, "Name": "Jebeddiah", "Gender": "Hopefully male", "Age": 35, "Prefered colleages": [], "Following missions": [{"Payment_status": "TO BE CONFIRMED"}]}, {"Spaceships": {"First": "The Kraken", "Second": "Minnus I"}, "Name": "Bob", "Gender": "Hopefully female", "Age": 23, "Prefered colleages": [], "Following missions": [{"Payment_status": "TO BE CONFIRMED"}]}]}' 

You don't need the dpath library. The problem is that your JSON flattener doesn't handle embedded lists. Try the one below. Assuming you want a single-row csv file:

import csv
import json

def flattenjson(data, delim, topname=''):
    """JSON flattener that can handle embedded lists and dictionaries"""
    flattened = {}
    def internalflat(int_data, name=topname):
        if isinstance(int_data, dict):
            for key in int_data:
                internalflat(int_data[key], name + key + delim)
        elif isinstance(int_data, list):
            # List elements are numbered from 1: Astronaut1, Astronaut2, ...
            i = 1
            for elem in int_data:
                internalflat(elem, name + str(i) + delim)
                i += 1
        else:
            # Leaf value: drop the trailing delimiter from the accumulated name
            flattened[name[:-len(delim)]] = int_data
    internalflat(data)
    return flattened

#If you don't want mission or objective in csv file
flattened_astronauts = flattenjson(json.loads(datum)["Astronauts"], "__", "Astronaut")
keys = sorted(flattened_astronauts.keys()) #list.sort() returns None, so use sorted()
#OD is the already-open output file object from the question
writer = csv.DictWriter(OD, keys ,restval='Null', delimiter=",", quotechar="\"", quoting=csv.QUOTE_ALL, dialect= "excel")
writer.writerow(flattened_astronauts)
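
A quick way to check the result without touching a file, offered as a sketch: it repeats the flattener in compact form, runs it on a shortened, made-up `datum`, and writes one row to an in-memory `io.StringIO` buffer that stands in for `OD`. The shortened sample and the `buffer` name are assumptions for illustration only.

```python
import csv
import io
import json


def flattenjson(data, delim, topname=""):
    """Flatten nested dicts and lists into one dict with joined key names."""
    flattened = {}

    def internalflat(int_data, name=topname):
        if isinstance(int_data, dict):
            for key in int_data:
                internalflat(int_data[key], name + key + delim)
        elif isinstance(int_data, list):
            # Number list elements starting at 1.
            for i, elem in enumerate(int_data, start=1):
                internalflat(elem, name + str(i) + delim)
        else:
            # Leaf value: strip the trailing delimiter from the name.
            flattened[name[:-len(delim)]] = int_data

    internalflat(data)
    return flattened


# Shortened, hypothetical JSON string with the same shape as the server data.
datum = (
    '{"Astronauts": ['
    '{"Spaceships": {"First": "Katabom"}, "Name": "Jebeddiah"}, '
    '{"Spaceships": {"First": "The Kraken"}, "Name": "Bob"}]}'
)

flattened_astronauts = flattenjson(json.loads(datum)["Astronauts"], "__", "Astronaut")
keys = sorted(flattened_astronauts)

buffer = io.StringIO()  # stands in for the real output file
writer = csv.DictWriter(buffer, keys, restval="Null", quoting=csv.QUOTE_ALL)
writer.writeheader()
writer.writerow(flattened_astronauts)
print(buffer.getvalue())
```

Each astronaut's position in the list becomes part of the column name, so the second astronaut's name is stored under `Astronaut2__Name`.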

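After a few tries I keep getting the same error: flattened_astronauts = flattenjson({json.loads(datum)["Astronauts"]}) TypeError: unhashable type: 'list'. Basically "Astronauts" is not encoded as a dictionary, and nothing in that function changes that... – Saphiron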


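My bad. I just edited the function to better fit your requirements and removed the curly braces (they aren't needed, since the output is already a dictionary). –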


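Works! Thanks a lot! The problem now is that the number of astronauts depends on the datum, so the header (function: writer.writeheader()) comes out different whenever that number changes. Is there any way to set a fixed header (the maximum number of astronauts in a datum is 25) and write the csv file accordingly? – Saphiron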
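
One possible way to get a fixed header, offered as a sketch rather than a tested solution for the real data: build the column list once for all 25 astronaut slots and let `restval` fill the slots a given datum does not use. The per-astronaut field list `FIELDS`, the helper `fixed_header`, and the example `row` are hypothetical; the fields would need to match the keys the flattener actually produces.

```python
import csv
import io

MAX_ASTRONAUTS = 25  # upper bound mentioned in the comment above
# Hypothetical per-astronaut columns; adjust to the keys your flattener emits.
FIELDS = ["Name", "Gender", "Age", "Spaceships__First", "Spaceships__Second"]


def fixed_header(max_astronauts=MAX_ASTRONAUTS, fields=FIELDS, delim="__"):
    """Return the same column list no matter how many astronauts a datum has."""
    return [
        "Astronaut" + str(n) + delim + field
        for n in range(1, max_astronauts + 1)
        for field in fields
    ]


# A flattened datum with a single astronaut (made-up example row).
row = {"Astronaut1__Name": "Bob", "Astronaut1__Age": 23}

buffer = io.StringIO()  # stands in for the real output file
writer = csv.DictWriter(
    buffer,
    fixed_header(),
    restval="Null",         # fills columns missing from this row
    extrasaction="ignore",  # skips flattened keys that are not in the header
    quoting=csv.QUOTE_ALL,
)
writer.writeheader()
writer.writerow(row)
```

With this layout every row has the same columns, so the header only needs to be written once per file, and data with fewer than 25 astronauts simply carries 'Null' in the unused columns.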