2017-07-23

I have about 200,000 imdb_ids in a file and want to use the OMDb API to fetch JSON information for each of them. What is the fastest way to get this movie information from OMDb with Python?

I wrote the following code, which works, but it is very slow (about 3 seconds per ID, which would take 166 hours for all of them):

import urllib.request
import csv
from collections import defaultdict

# First pass: load every column of a.csv into memory so the
# item_id values can be looked up by row index.
columns = defaultdict(list)
with open('a.csv', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)

# Second pass: copy a.csv to b.csv, appending the OMDb JSON
# response as a new "movie_info" column. One blocking HTTP
# request per row is what makes this slow.
i = 0
with open('a.csv', 'r', encoding='utf-8') as csvinput:
    with open('b.csv', 'w', encoding='utf-8', newline='') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            if row[0] == "item_id":
                writer.writerow(row + ["movie_info"])
            else:
                url = urllib.request.urlopen(
                    "http://www.omdbapi.com/?i=tt" + str(columns['item_id'][i]) + "&apikey=??????").read()
                url = url.decode('utf-8')
                writer.writerow(row + [url])
                i = i + 1

Please tell me the fastest way to fetch movie information from OMDb in Python.

EDIT: I wrote the following code, and after receiving about 1022 URL responses I get this error:

import grequests

api_key = '??????'


def exception_handler(request, exception):
    print("Request failed")


# Read the file and turn each line into a request URL.
urls = open("a.csv").readlines()
for i in range(len(urls)):
    urls[i] = "http://www.omdbapi.com/?i=tt" + str(urls[i]).rstrip('\n') + "&apikey=" + api_key

requests = (grequests.get(u) for u in urls)
responses = grequests.map(requests, exception_handler=exception_handler)
with open('b.json', 'wb') as outfile:
    for response in responses:
        outfile.write(response.content)

The error is:

Traceback (most recent call last): 
    File "C:/python_apps/omdb_async.py", line 18, in <module> 
    outfile.write(response.content) 
AttributeError: 'NoneType' object has no attribute 'content' 

How can I fix this error?

Answer


This code is I/O-bound and would benefit greatly from Python's async/await features. You can iterate over your collection of URLs, creating an asynchronously executed request for each one, as in the example in this SO question.
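As for the AttributeError in the question: grequests.map puts None in the result list in place of each request that failed, so those entries have to be skipped before .content is read. A minimal sketch of such a filter (the helper name keep_successful is ours, not part of grequests):

```python
def keep_successful(responses):
    """Drop the None placeholders that grequests.map leaves
    in its result list for requests that failed."""
    return [r for r in responses if r is not None]

# Usage with the question's code:
#   responses = grequests.map(requests, exception_handler=exception_handler)
#   for response in keep_successful(responses):
#       outfile.write(response.content)
```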

Once you are making these requests asynchronously, you will probably need to throttle your request rate to stay within the OMDb API's limits.
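A minimal sketch of that approach with a concurrency cap, using only the standard library (asyncio.to_thread requires Python 3.9+; the names build_url and fetch_all and the limit of 10 concurrent requests are illustrative assumptions, not OMDb requirements):

```python
import asyncio
import urllib.request


def build_url(imdb_id, api_key):
    # Same URL scheme as in the question's code.
    return "http://www.omdbapi.com/?i=tt" + imdb_id.strip() + "&apikey=" + api_key


def fetch(url):
    # Blocking download of one OMDb JSON response.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


async def fetch_all(imdb_ids, api_key, max_concurrent=10):
    # The semaphore caps how many requests are in flight at once,
    # a crude way to stay under the API's rate limit.
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(imdb_id):
        async with sem:
            try:
                return await asyncio.to_thread(fetch, build_url(imdb_id, api_key))
            except Exception:
                return None  # mark a failed request, like grequests.map does

    return await asyncio.gather(*(bounded(i) for i in imdb_ids))

# Usage (not run here, since it hits the network):
#   results = asyncio.run(fetch_all(ids, api_key))
```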
