Cannot fetch a URL with urllib2

I'm learning Python, and today's task is downloading text from a web page. This code works fine:
import urllib2
from bs4 import BeautifulSoup
base_url = "http://www.pracuj.pl"
url = urllib2.urlopen(base_url+"/praca/big%20data;kw").read()
soup = BeautifulSoup(url,"html.parser")
for k in soup.find_all('a'):
    if "offer__list_item_link_name" in k['class']:
        link = base_url + k['href']
        print link
So it prints all the links, like this:
http://www.pracuj.pl/praca/inzynier-big-data-cloud-computing-knowledge-discovery-warszawa,oferta,4212875
http://www.pracuj.pl/praca/data-systems-administrator-krakow,oferta,4204109
http://www.pracuj.pl/praca/programista-java-sql-python-w-zespole-bigdata-krakow,oferta,4204341
http://www.pracuj.pl/praca/program-challenging-projektowanie-i-tworzenie-oprogramowania-katowice,oferta,4186995
http://www.pracuj.pl/praca/program-challenging-analizy-predyktywne-warszawa,oferta,4187512
http://www.pracuj.pl/praca/software-engineer-r-language-krakow,oferta,4239818
But when I add a line that assigns the new address, so I can fetch the contents of each offer:
url2 = urllib2.urlopen(link).read()
I get an error:
Traceback (most recent call last):
File "download_page.py", line 10, in <module>
url2 = urllib2.urlopen(link).read()
NameError: name 'link' is not defined
What puzzles me is that it only fails with the for loop; when I add the same line outside the loop, it works. Can you point out what I'm doing wrong?
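To narrow it down, here is a minimal, network-free sketch of what seems to be happening. The `anchors` list is made up for illustration; it stands in for `soup.find_all('a')` in the case where no anchor matches the class filter, so `link` is never assigned before it is used:

```python
err = None

# Hypothetical stand-in for soup.find_all('a'): note that no entry
# carries the "offer__list_item_link_name" class, so the if-body
# below never runs and 'link' is never bound.
anchors = [{'class': ['other_class'], 'href': '/praca/example,oferta,1'}]

base_url = "http://www.pracuj.pl"
for k in anchors:
    if "offer__list_item_link_name" in k['class']:
        link = base_url + k['href']  # only assigned when the filter matches

try:
    url2 = link  # stand-in for: url2 = urllib2.urlopen(link).read()
except NameError as exc:
    err = str(exc)

print(err)  # name 'link' is not defined
```

If this is the cause, `link` only exists after the `if` branch has actually run at least once, which would explain why the same line behaves differently depending on where it is placed relative to the loop.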
Paweł