2014-07-15 86 views
8

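BeautifulSoup responds with an error

I want to get my feet wet with BeautifulSoup (BS). I am trying to work through the documentation, but I already ran into a problem at the very first step.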

This is my code:

from bs4 import BeautifulSoup 
soup = BeautifulSoup('https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5....1b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description') 

print(soup.prettify()) 

This is the response I get:

Warning (from warnings module): 
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/bs4/__init__.py", line 189 
'"%s" looks like a URL. Beautiful Soup is not an HTTP client. You should probably use an  
HTTP client to get the document behind the URL, and feed that document to Beautiful Soup.' % markup) 
UserWarning: "https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5...b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description" 
looks like a URL. Beautiful Soup is not an HTTP client. You should 
probably use an HTTP client to get the document behind the URL, and feed that document  
to Beautiful Soup. 
https://api.flickr.com/services/rest/?method=flickr.photos.search&api;_key=5...b&per;_page=250&accuracy;=1&has;_geo=1&extras;=geo,tags,views,description 

Is it because I am trying to access http**S**, or is it some other problem? Thanks for your help!

+0

Save the web page locally, then use soup on that file. – suspectus

Answers

10

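You are passing the URL as a string, so BeautifulSoup treats the URL text itself as the markup. Instead, you need to get the page source first, via urllib2 or requests: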

from urllib.request import urlopen  # Python 2: from urllib2 import urlopen
from bs4 import BeautifulSoup 

URL = 'https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5....1b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description' 
soup = BeautifulSoup(urlopen(URL)) 

Note that you don't need to call read() on the result of urlopen(): BeautifulSoup accepts a file-like object as its first argument, and urlopen() returns a file-like object.
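To make the "file-like object" point concrete, here is a small sketch that runs without network access. It assumes Python 3.4 or later (where urlopen() understands data: URLs), and the XML string is a made-up stand-in, not real Flickr output.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Made-up stand-in for an XML reply; not real Flickr data.
sample = '<rsp stat="ok"><photos total="0"/></rsp>'

# A data: URL carries its own payload, so urlopen() needs no network here.
url = "data:text/xml," + quote(sample)

response = urlopen(url)
print(hasattr(response, "read"))  # the response exposes read(), like an open file

first = response.read(4)  # read the first four bytes ...
rest = response.read()    # ... then everything that is left
response.close()

print(first + rest == sample.encode("ascii"))
```

Because the response behaves like an open file, BeautifulSoup(urlopen(URL)) can pull the bytes out of it on its own, which is why the explicit read() is optional.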

2

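The error says it all: you are passing a URL to Beautiful Soup. You need to fetch the site's content first, and only then pass that content to BS.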

To download the content, you can use urllib2:

import urllib2  # Python 2 only; on Python 3, use urllib.request.urlopen instead
response = urllib2.urlopen('http://www.example.com/') 
html = response.read() 

Then:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html)
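For Python 3 (which the traceback in the question suggests), the same fetch-then-parse idea could look roughly like the sketch below. The helper name fetch_html, the 10 second timeout, and the choice to return None on failure are my own assumptions for illustration, and the data: URL only stands in for a real page so the snippet runs offline (Python 3.4+).

```python
from urllib.error import URLError
from urllib.parse import quote
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the decoded body behind url, or None if it cannot be fetched.

    Hypothetical helper, not part of urllib or bs4.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            # Use the charset the server declares; fall back to UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except (URLError, ValueError):
        # URLError also covers HTTP errors; ValueError covers malformed URLs.
        return None


# Offline stand-in for a real page (made-up markup).
html = fetch_html("data:text/html," + quote("<p>hello</p>"))
print(html)

# With bs4 installed, the text can then be handed to Beautiful Soup:
#   from bs4 import BeautifulSoup
#   soup = BeautifulSoup(html, "html.parser")
```

Returning None keeps the example short; in real code you may prefer to let the exception propagate so you can see why the request failed.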
+0

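Hey, I tried to figure out which answer was posted first, but it says both were "answered 29 minutes ago". So I decided to upvote one and accept the other. I don't know what the proper way is; I wanted to accept the first answer. – Stophface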