I am trying to download PDF files from a page using urllib.request.urlopen, but it returns an error: 'list' object has no attribute 'timeout'
def get_hansard_data(page_url):
    #Read base_url into Beautiful soup Object
    html = urllib.request.urlopen(page_url).read()
    soup = BeautifulSoup(html, "html.parser")
    #grab <div class="itemContainer"> that hold links and dates to all hansard pdfs
    hansard_menu = soup.find_all("div","itemContainer")
    #Get all hansards
    #write to a tsv file
    with open("hansards.tsv","a") as f:
        fieldnames = ("date","hansard_url")
        output = csv.writer(f, delimiter="\t")
        for div in hansard_menu:
            hansard_link = [HANSARD_URL + div.a["href"]]
            hansard_date = div.find("h3", "catItemTitle").string
            #download
            with urllib.request.urlopen(hansard_link) as response:
                data = response.read()
                r = open("/Users/Parliament Hansards/"+hansard_date +".txt","wb")
                r.write(data)
                r.close()
            print(hansard_date)
            print(hansard_link)
            output.writerow([hansard_date,hansard_link])
    print ("Done Writing File")
Please help.
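Which line does the error occur on? What version of Python are you using? –
Can you give us the stack trace? The code you posted doesn't contain a timeout call, so it's hard to find – jpopesculian
Version: Python 3.4. The error occurs on this line: with urllib.request.urlopen(hansard_link) as response: –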
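A likely cause, given the line identified above: hansard_link is built with square brackets, so it is a one-element list rather than a string. urlopen accepts either a URL string or a Request object; anything that is not a string is treated as a Request, and urlopen then tries to set a timeout attribute on it. That would explain why the message mentions 'timeout' even though the posted code never uses it. The sketch below reproduces the error without touching the network and shows the link built as a plain string. The value of HANSARD_URL and the helper names build_hansard_link and list_argument_error are assumptions for illustration only; the question does not show them.

```python
import urllib.request

# Hypothetical base URL; the question does not show the real HANSARD_URL value.
HANSARD_URL = "http://example.com"


def build_hansard_link(href, base_url=HANSARD_URL):
    """Return the PDF URL as a plain string, which is what urlopen expects."""
    return base_url + href


def list_argument_error():
    """Pass a one-element list to urlopen and return the error message.

    The call fails before any network request is made, because urlopen
    treats a non-string argument as a Request object and tries to set
    its timeout attribute.
    """
    try:
        urllib.request.urlopen([HANSARD_URL + "/hansard.pdf"])
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Shows the same kind of AttributeError reported in the question.
    print(list_argument_error())
    # A string URL, safe to hand to urllib.request.urlopen.
    print(build_hansard_link("/hansard.pdf"))
```

In the loop from the question, the equivalent change would be hansard_link = HANSARD_URL + div.a["href"] with no surrounding brackets, so that both urlopen and the TSV row receive a string.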