I get an error when I run this script: NameError: name 'htmltext' is not defined
import urllib.request
import urllib.parse
from bs4 import BeautifulSoup
url = "http://nytimes.com,http://nytimes.com"
urls = [url] #stack of urls to scrape
visited = [url] #historic record of urls
while len(urls) > 0:
    try:
        htmltext = urllib.request.urlopen(urls[0]).read()
    except:
        print(htmltext)
Original code:
import urllib.request
import urllib.parse
from bs4 import BeautifulSoup
url = "http://nytimes.com,http://nytimes.com"
urls = [url] #stack of urls to scrape
visited = [url] #historic record of urls
while len(urls) > 0:
    try:
        htmltext = urllib.request.urlopen(urls[0]).read()
    except:
        print(urls[0])
    soup = BeautifulSoup(htmltext)
    urls.pop(0)
    print(soup.findAll('a', href=True))
Error:
socket.gaierror: [Errno -2] Name or service not known
urllib.error.URLError: urlopen error [Errno -2] Name or service not known
Traceback (most recent call last):
NameError: name 'htmltext' is not defined
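The "Name or service not known" error suggests the hostname could not be resolved. A minimal sketch, using only the standard library, of how the comma-joined string from the question is parsed as a single URL rather than two (the variable names `parts` and `urls` are illustrative):

```python
from urllib.parse import urlsplit

url = "http://nytimes.com,http://nytimes.com"

# The whole string is treated as ONE URL, so the network-location part
# runs up to the next "/" and swallows the comma and the second scheme.
parts = urlsplit(url)
print(parts.netloc)

# Splitting on the comma yields two separate URLs with a valid host each.
urls = url.split(",")
print([urlsplit(u).netloc for u in urls])
```

The first `print` shows a host containing a comma, which is presumably what the DNS lookup rejects; the second shows `nytimes.com` for each entry.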
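So what happens if you put 'http://nytimes.com,http://nytimes.com' into your browser's address bar? Also, your title doesn't match the description (but *of course* 'htmltext' isn't defined in the 'except' case - you're there because the assignment *failed*). – jonrsharpe 2014-10-26 18:53:40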
I don't know how that's possible, but it works now, sorry – gaia 2014-10-26 19:06:52
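I understand why it works now: I removed the second address from the "url" value. Maybe there was a conflict during the connection request because it was doubled? – gaia 2014-10-26 20:13:36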
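For reference, one way the loop could be restructured so that a failed request never reaches code that uses `htmltext`. This is a sketch under assumptions, not the asker's final code: the helper names `fetch` and `crawl`, the `fetcher` parameter, and the 10-second timeout are all hypothetical, and the BeautifulSoup parsing step is left out to keep the example free of third-party dependencies.

```python
import urllib.request


def fetch(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (OSError, ValueError) as exc:
        # URLError (raised for DNS failures) is a subclass of OSError;
        # ValueError covers strings that are not URLs at all.
        print(f"skipping {url}: {exc}")
        return None


def crawl(start_urls, fetcher=fetch):
    """Fetch each distinct URL once; return {url: body} for the successes."""
    queue = list(start_urls)
    visited = set()
    pages = {}
    while queue:
        url = queue.pop(0)
        if url in visited:
            continue
        visited.add(url)
        htmltext = fetcher(url)
        if htmltext is None:
            continue  # fetch failed, so there is nothing to parse
        pages[url] = htmltext
    return pages
```

Calling `crawl("http://nytimes.com,http://nytimes.com".split(","))` would request the site once, since the duplicate is skipped, and each returned body could then be passed to `BeautifulSoup` as in the question.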