2017-10-08 22 views
1

Loading a large JSON file - error = JSONDecodeError: Extra data

I am trying to load, in Python, the file business.json from the Yelp academic dataset made available for their academic challenge (see https://www.yelp.com/dataset/documentation/json). My goal is to extract all restaurants and their IDs, then find the one restaurant I am interested in. Once I have that restaurant's ID, I want to load review.json and extract all reviews for it. Sadly, I am stuck at the initial stage of loading the .json file.

This is what business.json looks like:

{ 
    // string, 22 character unique string business id 
    "business_id": "tnhfDv5Il8EaGSXZGiuQGg", 

    // string, the business's name 
    "name": "Garaje", 

    // string, the neighborhood's name 
    "neighborhood": "SoMa", 

    // string, the full address of the business 
    "address": "475 3rd St", 

    // string, the city 
    "city": "San Francisco", 

    // string, 2 character state code, if applicable 
    "state": "CA", 

    // string, the postal code 
    "postal code": "94107", 

    // float, latitude 
    "latitude": 37.7817529521, 

    // float, longitude 
    "longitude": -122.39612197, 

    // float, star rating, rounded to half-stars 
    "stars": 4.5, 

    // integer, number of reviews 
    "review_count": 1198, 

    // integer, 0 or 1 for closed or open, respectively 
    "is_open": 1, 

    // object, business attributes to values. note: some attribute values might be objects 
    "attributes": { 
     "RestaurantsTakeOut": true, 
     "BusinessParking": { 
      "garage": false, 
      "street": true, 
      "validated": false, 
      "lot": false, 
      "valet": false 
     }, 
    }, 

    // an array of strings of business categories 
    "categories": [ 
     "Mexican", 
     "Burgers", 
     "Gastropubs" 
    ], 

    // an object of key day to value hours, hours are using a 24hr clock 
    "hours": { 
     "Monday": "10:00-21:00", 
     "Tuesday": "10:00-21:00", 
     "Friday": "10:00-21:00", 
     "Wednesday": "10:00-21:00", 
     "Thursday": "10:00-21:00", 
     "Sunday": "11:00-18:00", 
     "Saturday": "10:00-21:00" 
    } 
} 

When I try to import business.json with the following code:

import json 

jsonBus = json.loads(open('business.json').read()) 
for item in jsonBus: 
    name = item.get("Name") 
    businessID = item.get("business_id") 

I get the following error:

runfile('/Users/Nico/Google Drive/Python/yelp/yelp_academic.py', wdir='/Users/Nico/Google Drive/Python/yelp') 
Traceback (most recent call last): 

    File "<ipython-input-46-68ba9d6458bc>", line 1, in <module> 
    runfile('/Users/Nico/Google Drive/Python/yelp/yelp_academic.py', wdir='/Users/Nico/Google Drive/Python/yelp') 

    File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile 
    execfile(filename, namespace) 

    File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile 
    exec(compile(f.read(), filename, 'exec'), namespace) 

    File "/Users/Nico/Google Drive/Python/yelp/yelp_academic.py", line 3, in <module> 
    jsonBus = json.loads(open('business.json').read()) 

    File "/anaconda3/lib/python3.6/json/__init__.py", line 354, in loads 
    return _default_decoder.decode(s) 

    File "/anaconda3/lib/python3.6/json/decoder.py", line 342, in decode 
    raise JSONDecodeError("Extra data", s, end) 

JSONDecodeError: Extra data 

Does anyone know why this error occurs?

I am also open to any smarter way to proceed!

Best,

Nico

Answers

1

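If your JSON file is exactly as you showed it, it should not contain any comments (such as // string, 22 character unique string business id), because comments are not part of the JSON standard.

See a related post here: Can comments be used in JSON?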
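To illustrate the point about comments, here is a minimal sketch using Python's standard json module. The two strings are made-up snippets modeled on the documentation sample above, not lines from the real file.

```python
import json

# A snippet in the style of the Yelp documentation page, with a // comment.
with_comment = '''{
    // string, the business's name
    "name": "Garaje"
}'''

try:
    json.loads(with_comment)
    comment_ok = True
except json.JSONDecodeError as exc:
    # The standard parser rejects the comment line.
    comment_ok = False
    print("rejected:", exc.msg)

# The same data without the comment parses fine.
without_comment = '{"name": "Garaje"}'
parsed = json.loads(without_comment)
print(parsed["name"])
```

As the comments below point out, the actual business.json does not contain these comments, so this explains the documentation sample rather than the error in the question.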

+1

That is a copy-paste from the Yelp website; I don't think the comments are in the JSON file. – Nico

+0

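I'm using the same dataset and getting the same error as the OP. The *only* '//' in the file is used in place of "aka" ([it's in a place name](https://i.stack.imgur.com/N7cmi.jpg)). Otherwise the JSON looks legitimate, with no comments. [Here is a screenshot of the JSON in SublimeText](https://i.stack.imgur.com/A2cja.jpg). What the OP shows is just taken from that link. It is not actually how the data is laid out in the file. – BruceWayne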

+0

Using 'json_data = json.loads('business.json')' I get almost the same error: 'raise JSONDecodeError("Expecting value", s, err.value) from None \n json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)' – BruceWayne

0

I think this works. I'm working with the same dataset and had a similar error. See the comment here, which seems to work:

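import json 

# The file holds one JSON object per line, so parse it line by line. 
with open('business.json', encoding='utf-8') as f: 
    js = [json.loads(line) for line in f] 

for item in js: 
    name = item.get("name") 
    businessID = item.get("business_id") 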

However, I would still like to know why json.loads() on the whole file doesn't work. The file itself looks fine.
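For the goal stated in the question (find one restaurant, then pull its reviews), the line-by-line idea can be extended to stream both files instead of holding everything in memory. The following is only a rough sketch. It assumes review.json is also laid out one object per line and that each review carries a "business_id" field. The helper names iter_json_lines, find_business_ids and reviews_for are hypothetical, and the "Restaurants" category label is an assumption.

```python
import json

def iter_json_lines(path):
    """Yield one parsed object per non-empty line of the file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

def find_business_ids(business_path, name, category="Restaurants"):
    """Return the IDs of businesses called `name` that list `category`."""
    ids = []
    for biz in iter_json_lines(business_path):
        categories = biz.get("categories") or []
        if isinstance(categories, str):
            # Tolerate a comma-separated string as well as a list.
            categories = [c.strip() for c in categories.split(",")]
        if biz.get("name") == name and category in categories:
            ids.append(biz["business_id"])
    return ids

def reviews_for(review_path, business_ids):
    """Return all reviews whose business_id is in `business_ids`."""
    wanted = set(business_ids)
    return [r for r in iter_json_lines(review_path)
            if r.get("business_id") in wanted]

# Possible usage, assuming both files are in the working directory:
# ids = find_business_ids("business.json", "Garaje")
# reviews = reviews_for("review.json", ids)
```

Matching on the exact name is a simplification. Several businesses can share a name, which is why the sketch returns a list of IDs rather than a single one.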

+1

'json.loads()' loads a string, not a file, and expects its input to be one complete JSON object. This file instead contains one JSON object on each line –

+0

@cricket_007 - Ooohhhhh OK - I'm new to JSON (obviously :P) and didn't realize that. Thanks for the tip! – BruceWayne
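The two errors discussed in this thread can be reproduced with a small sketch that needs no file at all. The two-line string below is invented sample data that stands in for the real file layout (one JSON object per line).

```python
import json

# Two complete JSON objects, one per line (made-up records).
text = ('{"business_id": "a1", "name": "Garaje"}\n'
        '{"business_id": "b2", "name": "Other"}\n')

# Parsing the whole text at once: the decoder reads the first object,
# then finds more data after it and gives up.
try:
    json.loads(text)
    whole_error = None
except json.JSONDecodeError as exc:
    whole_error = exc.msg

# Passing the file *name* makes the decoder parse the name itself,
# which is not valid JSON.
try:
    json.loads("business.json")
    name_error = None
except json.JSONDecodeError as exc:
    name_error = exc.msg

# Parsing each line separately works.
records = [json.loads(line) for line in text.splitlines() if line.strip()]

print(whole_error)   # Extra data
print(name_error)    # Expecting value
print(len(records))  # 2
```

So the file is fine as a sequence of JSON documents. It is just not a single JSON document, which is what json.loads() and json.load() expect.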