2013-01-23 34 views
2

virtualenv: the same version of BeautifulSoup returns different results

I have one BeautifulSoup installation in the general Python path and another in a virtualenv:

beautifulsoup4 - 4.1.3  - active # in general Python installation 

beautifulsoup4 - 4.1.3  - active # in virtualenv path 

I run the following code in both environments:

import urllib2 
import unicodedata 
from bs4 import BeautifulSoup 
from collections import Counter 
soup = BeautifulSoup(urllib2.urlopen('http://www.thehindu.com/news/cities/bangalore/aero-india-takes-off-on-february-6/article4329776.ece').fp) 

In the general Python installation, it gives me:

>>> soup.select('.article-text .body') 
[<p class="body"> It is that time when aviation buffs get ready to take off to the Air Force Station in Yelahanka here when the ninth edition of Aero India will be inaugurated by Defence Minister A.K. Antony on February 6.</p>, <p class="body">They can watch aerobatics by, among others, the Flying Bulls from the Czech Republic and Russian Knights — the Russian Air Force Aerobatic Team will complement Indian Air Force’s Sarang Aerobatic Team — at the biennial event that provides a platform for Indian and foreign vendors.</p>, <p class="body">However, IAF’s pride — the Surya Kiran Aerobatic Tea — which has performed to huge plaudits from the audience in the previous shows, will not be there for the country’s premier air show, a press release said.</p>, <p class="body">All exhibition space has been sold out and this edition is expected to see the participation of over 600 companies and 768 overseas delegations. </p>, <p class="body">The largest overseas participation is from the U.S. followed by Israel and Russia. The other major participants include France, the U.K., Germany and Belgium, Bulgaria, Italy, Ukraine, Australia, Belarus, Czech Republic, Japan, Norway, South Africa, Spain, Switzerland, Austria, Brazil, Canada, The Netherlands, Romania, Sweden, Singapore and the UAE.</p>, <p class="body">Organised by the Department of Defence Production, the five-day show aims at promoting products and services being offered by the Indian Defence industry in the international market.</p>] 
>>> 

In the virtualenv, however, it returns nothing:

>>> soup.select('.article-text .body') 
[] 

What causes this problem? How can I fix it in the virtual environment?

+0

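If the code is the same and the BS version is the same, I assume the document is the same... so you need to find out what is different. You haven't given us enough information to help you do that. Maybe try it with a small sample document that you can post here, and also give us the specs of each environment. –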

+1

Have you checked whether the two `urllib2.urlopen` calls return the same content? – dm03514

+1

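@dm03514 Good point! Yes, I just realized that you aren't using a URL opener, and some sites return differently structured HTML depending on the user agent; each environment may be sending a different user agent. –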
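One way to act on the two comments above is to fetch the page with a fixed user agent in each environment and compare a digest of the raw bytes. This is only a sketch, written for Python 3 (`urllib.request` instead of the question's `urllib2`); the helper names `build_request` and `fingerprint` and the user-agent string are made up for illustration.

```python
import hashlib
import urllib.request


def build_request(url, user_agent="Mozilla/5.0 (compatible; example-check)"):
    # Hypothetical helper: pin the User-Agent so both environments
    # send the same request headers to the server.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fingerprint(body):
    # SHA-256 hex digest of the raw response bytes; equal digests
    # mean both environments received byte-identical markup.
    return hashlib.sha256(body).hexdigest()


# Usage (needs network access), run once in each environment:
# body = urllib.request.urlopen(build_request(url)).read()
# print(len(body), fingerprint(body))
```

If the digests match, the markup is identical and the difference must come from parsing. If they differ, that alone is not conclusive, because dynamic pages can change between requests; saving the body to a file and parsing that same file in both environments removes the network from the comparison.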

Answer

0

I just faced the same problem. The solution that worked for me was to specify the parser explicitly. In my case, that was: soup = BeautifulSoup(markup, "html5lib")
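The likely reason this helps: when no parser is named, Beautiful Soup picks the best one it finds installed (lxml first, then html5lib, then Python's built-in parser), so two environments with the same bs4 version can still parse the same page differently. A small sketch, assuming Python 3, for checking which of these parsers each environment can import; the function name `available_parsers` is hypothetical, not part of bs4.

```python
import importlib.util


def available_parsers():
    # Parser names in the order Beautiful Soup prefers them when
    # none is given explicitly, paired with the module each needs.
    candidates = [
        ("lxml", "lxml"),
        ("html5lib", "html5lib"),
        ("html.parser", "html.parser"),  # standard library, always present
    ]
    # find_spec returns None when a top-level module is not installed.
    return [
        name
        for name, module in candidates
        if importlib.util.find_spec(module) is not None
    ]


print(available_parsers())
```

Running this in both the general installation and the virtualenv should show whether the lists differ. Passing one name that exists in both, as in `BeautifulSoup(markup, "html5lib")` above, makes the result independent of what else happens to be installed.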