2017-06-18

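Can't decode unicode of the Stack Exchange API with urllib

I was looking at this codegolf problem and decided to try taking the python solution and using urllib instead. I modified some sample code for manipulating JSON with urllib: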

import urllib.request 
import json 

res = urllib.request.urlopen('http://api.stackexchange.com/questions?sort=hot&site=codegolf') 
res_body = res.read() 

j = json.loads(res_body.decode("utf-8")) 

This gives:

➜ codegolf python clickbait.py 
Traceback (most recent call last): 
    File "clickbait.py", line 7, in <module> 
    j = json.loads(res_body.decode("utf-8")) 
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte 

If you go to http://api.stackexchange.com/questions?sort=hot&site=codegolf and click "Headers", it says charset=utf-8. Why is urlopen giving me these weird results?

Answer

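res_body is gzipped. The byte 0x8b at position 1 in the error is the second byte of the gzip magic number (0x1f 0x8b), so the body is compressed data rather than UTF-8 text. As far as I know, urllib does not decompress responses for you by default.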
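A minimal sketch that reproduces the same error offline may help here. It assumes a locally compressed stand-in payload instead of real API output, and `looks_gzipped` is a hypothetical helper name, not part of urllib or gzip.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(raw: bytes) -> bool:
    """Return True if the payload starts with the gzip magic number."""
    return raw[:2] == GZIP_MAGIC


# Stand-in for res.read(); an assumption for illustration, not real API data.
sample = gzip.compress(b'{"items": []}')

print(looks_gzipped(sample))  # the compressed body carries the magic number
print(hex(sample[1]))         # the byte the traceback complains about

try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    # Byte 0 (0x1f) is valid ASCII, so decoding fails on byte 1 (0x8b).
    print(exc.start, exc.reason)
```

A check like this only inspects the first two bytes, so it is a quick diagnostic rather than proof that the rest of the stream is valid gzip.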

If you decompress the response from the API server, you will get the data.

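import urllib.request
import zlib
import json

URL = 'http://api.stackexchange.com/questions?sort=hot&site=codegolf'

with urllib.request.urlopen(URL) as res:
    # 16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer
    decompressed_data = zlib.decompress(res.read(), 16 + zlib.MAX_WBITS)

# json.loads() lost its encoding argument in Python 3.9, so decode explicitly
j = json.loads(decompressed_data.decode('utf-8'))

print(j)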
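As a variation, the decompression step could be made conditional on the Content-Encoding response header, so the same code also works if the body arrives uncompressed. The sketch below uses the standard gzip module instead of zlib. `parse_json_body` is a hypothetical helper, and the payload is a local stand-in, not real API output.

```python
import gzip
import json
from typing import Any, Optional


def parse_json_body(raw: bytes, content_encoding: Optional[str]) -> Any:
    """Parse a JSON body, gunzipping it first if the header says gzip."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


# Offline stand-in for a compressed response body (assumed data).
payload = {"items": [{"title": "example"}]}
compressed = gzip.compress(json.dumps(payload).encode("utf-8"))

print(parse_json_body(compressed, "gzip"))
```

With urllib this would be called as `parse_json_body(res.read(), res.headers.get("Content-Encoding"))` inside the `with` block.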