2014-02-26 59 views
0

I have a file with lines like the following, which I want to convert to CSV:

{"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f508f-e7c8-32b8-e044-0003ba298018","municipalityCode":"0766","municipalityName":"Hedensted","streetCode":"0072","streetName":"Værnegården","streetBuildingIdentifier":"13","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"","postCodeIdentifier":"8000","districtName":"Århus","presentationString":"Værnegården 13, 8000 Århus","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(553564 6179299)","x":553564,"y":6179299}]} 

I want to turn each line into a row of a readable CSV file with a header, as shown below:

status,message,data,addressAccessId,municipalityCode,municipalityName,streetCode,streetName,streetBuildingIdentifier,mailDeliverySublocationIdentifier,districtSubDivisionIdentifier,postCodeIdentifier,districtName,presentationString,addressSpecificCount,validCoordinates,geometryWkt,x,y 
OK,OK,data:type,addressAccessType,0a3f508f-e7c8-32b8-e044-0003ba298018,0766,Hedensted,0072,Værnegården,13,,,8000,Århus,Værnegården 13, 8000 Århus,1,true,POINT553564 6179299,553564,6179299 

How do I do this? Code and explanations are very welcome. So far, this is what I have come up with, based on this example (How can I convert JSON to CSV?):

x = json.loads(x) 

f = csv.writer(open('test.csv', 'wb+')) 

# Write CSV Header, If you dont need that, remove this line 
f.writerow(['status', 'message', 'type', 'addressAccessId', 'municipalityCode','municipalityName','streetCode','streetName','streetBuildingIdentifier','mailDeliverySublocationIdentifier','districtSubDivisionIdentifier','postCodeIdentifier','districtName','presentationString','addressSpecificCount','validCoordinates','geometryWkt','x','y']) 


for x in x: 
    f.writerow([x['status'], 
       x['message'], 
       x['data']['type'], 
       x['data']['addressAccessId'], 
       x['data']['municipalityCode'], 
       x['data']['municipalityName'], 
       x['data']['streetCode'], 
       x['data']['streetName'], 
       x['data']['streetBuildingIdentifier'], 
       x['data']['mailDeliverySublocationIdentifier'], 
       x['data']['districtSubDivisionIdentifier'], 
       x['data']['postCodeIdentifier'], 
       x['data']['districtName'], 
       x['data']['presentationString'], 
       x['data']['addressSpecificCount'], 
       x['data']['validCoordinates'], 
       x['data']['geometryWkt'], 
       x['data']['x'], 
       x['data']['y']]) 

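I have looked at and tried many other solutions, including DictWriter, replace(), and translate() to remove characters, but I have not been able to adapt them to my needs. The goal is to be able to choose which fields are written to the new file, and to convert x and y to a new coordinate system. For now, though, I am just trying to parse the lines above into a CSV file. Can anyone provide code and an explanation? Thank you very much for your time.

Here is my addresses.txt: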

{"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f5081-e039-32b8-e044-0003ba298018","municipalityCode":"0265","municipalityName":"Roskilde","streetCode":"0831","streetName":"Brønsager","streetBuildingIdentifier":"69","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"Svogerslev","postCodeIdentifier":"4000","districtName":"Roskilde","presentationString":"Brønsager 69, 4000 Roskilde","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(690026 6169309)","x":690026,"y":6169309}]} 
    {"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f5089-ecab-32b8-e044-0003ba298018","municipalityCode":"0461","municipalityName":"Odense","streetCode":"9505","streetName":"Vægtens Kvarter","streetBuildingIdentifier":"271","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"Holluf Pile","postCodeIdentifier":"5220","districtName":"Odense SØ","presentationString":"Vægtens Kvarter 271, 5220 Odense SØ","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(592191 6135829)","x":592191,"y":6135829}]} 
    {"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f507c-adc3-32b8-e044-0003ba298018","municipalityCode":"0165","municipalityName":"Albertslund","streetCode":"0445","streetName":"Skyttehusene","streetBuildingIdentifier":"33","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"","postCodeIdentifier":"2620","districtName":"Albertslund","presentationString":"Skyttehusene 33, 2620 Albertslund","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(711079 6174741)","x":711079,"y":6174741}]} 
    {"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f509c-7f57-32b8-e044-0003ba298018","municipalityCode":"0851","municipalityName":"Aalborg","streetCode":"5205","streetName":"Løvstikkevej","streetBuildingIdentifier":"36","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"","postCodeIdentifier":"9000","districtName":"Aalborg","presentationString":"Løvstikkevej 36, 9000 Aalborg","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(552407 6322490)","x":552407,"y":6322490}]} 
    {"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f5098-32a6-32b8-e044-0003ba298018","municipalityCode":"0779","municipalityName":"Skive","streetCode":"0462","streetName":"Landevejen","streetBuildingIdentifier":"52","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"Håsum","postCodeIdentifier":"7860","districtName":"Spøttrup","presentationString":"Landevejen 52, 7860 Spøttrup","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(491515 6269739)","x":491515,"y":6269739}]} 
+1

I wouldn't reuse 'x' as the iteration variable when your list is also called 'x'. –

Answers

3

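Note that in your sample lines the data key holds a list of dictionaries, so x['data']['type'] will not work, but x['data'][0]['type'] will. However, there may be more than one such dictionary in that list. I am assuming you want one CSV row per dictionary in x['data'].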
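To see why the extra index is needed, here is a minimal Python 3 sketch. The `line` string is a shortened, made-up version of one input line (most fields omitted), not the full record:

```python
import json

# Shortened, illustrative version of one input line (most fields omitted).
line = ('{"status":"OK","message":"OK","data":'
        '[{"type":"addressAccessType","x":553564,"y":6179299}]}')

entry = json.loads(line)

# entry['data'] is a list, so it must be indexed (or iterated) first.
print(type(entry['data']).__name__)  # list
first = entry['data'][0]
print(first['type'])

# Indexing the list with a string key raises TypeError.
try:
    entry['data']['type']
    failed = False
except TypeError as exc:
    failed = True
    print('TypeError:', exc)
```

Iterating over `entry['data']` instead of hard-coding index 0 also covers the case where a line carries several address dictionaries.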

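Next, it looks like you have a UTF-8 BOM on each line; whatever wrote the file did not use the UTF-8 encoding correctly. We need to strip that marker, the first 3 characters.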
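As a small illustration of that marker, here is a Python 3 sketch that works on bytes. The `raw` value is a made-up sample line, assumed to start with the BOM:

```python
import codecs

# Made-up sample: a BOM followed by a tiny JSON document, as bytes.
raw = codecs.BOM_UTF8 + b'{"status":"OK"}\n'

# Strip the marker only when it is actually present.
if raw.startswith(codecs.BOM_UTF8):
    raw = raw[len(codecs.BOM_UTF8):]

text = raw.decode('utf-8')
print(repr(text))
```

Slicing by `len(codecs.BOM_UTF8)` rather than a literal 3 keeps the intent visible, though both remove the same three bytes.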

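Finally, JSON strings are always Unicode data, and your data contains non-ASCII characters, so you must encode the values to byte strings again before passing them to the CSV writer.

I would use csv.DictWriter here, with a predefined list of field names: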

import codecs 
import csv 
import json 

fields = [ 
    'status', 'message', 'type', 'addressAccessId', 'municipalityCode', 
    'municipalityName', 'streetCode', 'streetName', 'streetBuildingIdentifier', 
    'mailDeliverySublocationIdentifier', 'districtSubDivisionIdentifier', 
    'postCodeIdentifier', 'districtName', 'presentationString', 'addressSpecificCount', 
    'validCoordinates', 'geometryWkt', 'x', 'y'] 


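# Python 2: the csv module works on byte strings, hence 'wb' and the explicit encoding.
with open('test.csv', 'wb') as csvfile, open('jsonfile', 'r') as jsonfile:
    writer = csv.DictWriter(csvfile, fields)
    writer.writeheader()

    for line in jsonfile:
        # Drop the 3-byte UTF-8 BOM if the line starts with one.
        if line.startswith(codecs.BOM_UTF8):
            line = line[3:]
        entry = json.loads(line)
        for item in entry['data']:
            # Flatten: the item's fields plus the top-level status and message.
            row = dict(item, status=entry['status'], message=entry['message'])
            # Encode keys and values to UTF-8 byte strings for the csv module.
            row = {k.encode('utf8'): unicode(v).encode('utf8') for k, v in row.iteritems()}
            writer.writerow(row)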

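The row dictionary is basically a copy of each dictionary in the entry['data'] list, with the status and message keys copied in separately. This makes row a flat dictionary.

I also read your input file line by line, since you said each line contains a separate JSON entry.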
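For readers on Python 3, the same approach might look like the sketch below. This is an adaptation under stated assumptions, not the code from the answer: `json_lines_to_csv` and `FIELDS` are hypothetical names, the field list is shortened for illustration, and the sample input is made up. In Python 3 the csv module works on text, so no manual encoding step is needed.

```python
import csv
import io
import json

# Shortened field list for illustration; the real one would name every column.
FIELDS = ['status', 'message', 'type', 'addressAccessId', 'x', 'y']

def json_lines_to_csv(lines, out, fields=FIELDS):
    """Write one CSV row per dictionary found in each line's 'data' list."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for line in lines:
        # Remove whitespace and any stray BOM from both ends; skip empty lines.
        line = line.strip(' \t\r\n\ufeff')
        if not line:
            continue
        entry = json.loads(line)
        for item in entry['data']:
            # Flatten: the item's fields plus the top-level status and message.
            row = dict(item, status=entry['status'], message=entry['message'])
            writer.writerow(row)

# Made-up sample input: one line with a BOM and indentation, plus a blank line.
sample = [
    '\ufeff    {"status":"OK","message":"OK","data":[{"type":"addressAccessType",'
    '"addressAccessId":"abc","x":1,"y":2}]}\n',
    '\n',
]
buf = io.StringIO()
json_lines_to_csv(sample, buf)
print(buf.getvalue())
```

With real files, the input could be opened with `open('addresses.txt', encoding='utf-8')` and the output with `open('test.csv', 'w', newline='', encoding='utf-8')`. Skipping blank lines is a guess at one way the input might differ from the samples; `json.loads` raises an error on an empty string.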

+0

Did you mean to put 'writer.writerow(row)' inside the 'for' loop? – colcarroll

+0

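Thank you very much for the detailed answer, it definitely helps a lot. Suppose I have a file with multiple lines, and the data I want is in 'x[data]'. However, when I try your code I get the following error: ValueError: No JSON object could be decoded. Is that because of the file containing my JSON lines, or could it be because the lines are invalid JSON? – Philip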

+0

@JLLagrange: Indeed. –

0

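Use csv.DictWriter() to write the output file, and define the output header fields as you specified. Use extrasaction='ignore' and restval='' as options.

Take a look at "Opening A large JSON file in Python with no newlines for csv conversion Python 2.6.6" for help with handling large files, since I had a similar problem; also look at the questions I linked to there.

I use appropriate loops to build a similar kind of system from JSON.

For example,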
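A minimal Python 3 sketch of those two options follows. The chosen `fields` are illustrative, and the `note` column is hypothetical: it does not exist in the data and is there only to show what `restval` does.

```python
import csv
import io

# Only these columns are written; 'note' is a hypothetical column absent from the data.
fields = ['streetName', 'postCodeIdentifier', 'x', 'y', 'note']

rows = [
    {'streetName': 'Brønsager', 'postCodeIdentifier': '4000',
     'x': 690026, 'y': 6169309, 'municipalityName': 'Roskilde'},
]

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fields,
                        extrasaction='ignore',  # silently drop keys not in fields
                        restval='')             # fill missing keys with ''
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Here `municipalityName` is dropped because it is not listed in `fields`, and the `note` column comes out empty. This is one way to choose which fields end up in the new file.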

def parse_row(currdata):
    outx = {}
    # currdata is defined earlier to point to the x['data'] dictionary
    for eachx in currdata:
        outx[eachx] = currdata[eachx]
    return outx

This is a function that takes currdata as its argument, and it is called with x['data'][row] as the input.

rows = len(x['data']) 
for row in range(rows): 
    outx = parse_row(x['data'][row]) 
    # process the row and create output 

This should get your parsing set up correctly. I can't copy the actual code into this answer, but this should point toward a solution.