2012-11-12

Parsing a JSON file with Python 2.x

I am currently trying to parse a text file that contains many Facebook chat snippets. The snippets are stored as follows:

{"t":"msg","c":"p_100002239013747","s":14,"ms":[{"msg":{"text":"2what is the best restauran 
t in hong kong? ","time":1303115825598,"clientTime":1303115824391,"msgID":"1862585188"},"from":10000 
2239013747,"to":635527479,"from_name":"David Robinson","from_first_name":"David","from_gender":1,"to_name":"Jason Yeung","to_first_name":"Jason","to_gender":2,"type":"msg"}]} 

I have tried several ways to parse/open the JSON file, to no avail. Here is what I have tried so far:

import json 

data = [] 
with open("C:\\Users\\Me\\Desktop\\facebookchat.txt", 'r') as json_string: 
    for line in json_string: 
        data.append(json.loads(line))

Error:

Traceback (most recent call last): 
    File "C:/Users/Amy/Desktop/facebookparser.py", line 6, in <module> 
    data.append(json.loads(line)) 
    File "C:\Program Files\Python27\lib\json\__init__.py", line 326, in loads 
    return _default_decoder.decode(s) 
    File "C:\Program Files\Python27\lib\json\decoder.py", line 366, in decode 
    obj, end = self.raw_decode(s, idx=_w(s, 0).end()) 
    File "C:\Program Files\Python27\lib\json\decoder.py", line 382, in raw_decode 
    obj, end = self.scan_once(s, idx) 
ValueError: Invalid control character at: line 1 column 91 (char 91) 

I also tried:

import json 

with open("C:\\Users\\Me\\Desktop\\facebookchat.txt", 'r') as json_file: 
    data = json.load(json_file) 

...but I get exactly the same error as above.

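Any suggestions? I have searched previous posts here and tried other solutions, to no avail. I know I need to treat the data as a dictionary, where, for example, 'time' is a key and 1303115825598 is its value, but if I can't even load the JSON file into memory, I have no way to parse it.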

Where am I going wrong? Thanks

Answer

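Your data contains raw line breaks in the middle of values, and JSON does not allow those. You will have to stitch the lines back together again: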

import json

data = []
with open("C:\\Users\\Me\\Desktop\\facebookchat.txt", 'r') as json_string:
    partial = ''
    for line in json_string:
        # Accumulate physical lines, dropping the stray newline
        partial += line.rstrip('\n')
        try:
            data.append(json.loads(partial))
            partial = ''
        except ValueError:
            continue  # Not yet a complete JSON value

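The code collects lines into partial, minus the newline, and tries to decode the result as JSON. If that succeeds, partial is reset to an empty string, ready for the next entry. If it fails, we loop on to the next line and append it, until a complete JSON value decodes.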
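To try the stitching loop without the original file, here is a small self-contained sketch. The helper name `parse_broken_json_lines` and the sample lines are made up for illustration; the sample imitates the question's data by breaking one record in the middle of a string and in the middle of a number.

```python
import json


def parse_broken_json_lines(lines):
    """Rejoin physical lines until each accumulated chunk decodes as JSON.

    Hypothetical helper: same idea as the loop above, but it works on any
    iterable of lines instead of an open file.
    """
    records = []
    partial = ''
    for line in lines:
        partial += line.rstrip('\r\n')
        if not partial.strip():
            # Nothing accumulated yet (blank line between records)
            partial = ''
            continue
        try:
            records.append(json.loads(partial))
        except ValueError:
            continue  # Not yet a complete JSON value, keep accumulating
        partial = ''
    return records


# Illustrative sample: the first record is split across three lines.
sample = [
    '{"t":"msg","ms":[{"msg":{"text":"what is the best restauran\n',
    't in hong kong?","time":13031\n',
    '15825598},"type":"msg"}]}\n',
    '{"t":"msg","ms":[]}\n',
]

records = parse_broken_json_lines(sample)
print(len(records))                         # 2
print(records[0]["ms"][0]["msg"]["time"])   # 1303115825598
```

Once a record has decoded it is an ordinary dictionary, so the 'time' value from the question is reached through the nested keys `ms`, `msg`, and `time`.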

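Thanks Martijn. I copied the JSON excerpt from the internet, which is why the formatting was wrong. I have since turned the file into one continuous string, and it now reads correctly. Thanks again – thefragileomen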
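For a file holding several records, the "one continuous string" approach from the comment might look like the sketch below. This is an illustration, not the commenter's actual code: `iter_json_values` is a hypothetical helper and the second record's timestamp is made-up sample data. It relies on `json.JSONDecoder.raw_decode`, which returns the decoded value together with the index at which it ended. Removing raw line breaks should be safe here because a legitimate newline inside a JSON string is written as the two-character escape `\n`, not as an actual line break.

```python
import json


def iter_json_values(text):
    """Yield every JSON value found back to back in text (hypothetical helper)."""
    decoder = json.JSONDecoder()
    # Drop the stray line breaks so the text becomes one continuous string
    flat = text.replace('\r', '').replace('\n', '')
    pos = 0
    while pos < len(flat):
        # raw_decode does not skip leading whitespace, so do that by hand
        if flat[pos].isspace():
            pos += 1
            continue
        value, pos = decoder.raw_decode(flat, pos)
        yield value


# Illustrative input: two records, the first broken inside a string.
text = (
    '{"ms":[{"msg":{"text":"hel\nlo","time":1303115825598},'
    '"from_name":"David Robinson"}]}\n'
    '{"ms":[{"msg":{"text":"hi","time":1303115830000},'
    '"from_name":"Jason Yeung"}]}\n'
)

for record in iter_json_values(text):
    for entry in record["ms"]:
        print("%s %d" % (entry["from_name"], entry["msg"]["time"]))
```

Unlike the line-by-line loop, this reads the whole text into memory first, which is fine for a small chat export but worth keeping in mind for very large files.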