BeautifulSoup cannot get everything

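Two weeks ago, I could read everything in the source of this URL: http://camelcamelcamel.com/Jaybird-Sport-Wireless-Bluetooth-Headphones/product/B013HSW4SM?active=price_amazon

However, today, when I run the same code again, none of the historical prices show up in the soup. Do you know how to fix this?

Here is my Python code (it worked fine!):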

# Python 2 (urllib2, print statement)
import re
import datetime

from bs4 import BeautifulSoup
from urllib2 import urlopen

url = 'http://camelcamelcamel.com/Jaybird-Sport-Wireless-Bluetooth-Headphones/product/B013HSW4SM?active=price_amazon'
soup = BeautifulSoup(urlopen(url), 'html.parser')

# Print the highest, lowest and average price rows found in any <tbody>
lst = soup.find_all('tbody')
for tbody in lst:
    trs = tbody.find_all('tr')
    for elem in trs:
        tr_class = elem.get('class')
        if tr_class is not None:
            if tr_class[0] == 'highest_price' or tr_class[0] == 'lowest_price':
                tds = elem.find_all('td')
                td_label = tds[0].get_text().split(' ')[0]
                td_price = tds[1].get_text()
                td_date = tds[2].get_text()
                print td_label, td_price, td_date
        else:
            # Rows without a class: only the "Average" row is of interest
            tds = elem.find_all('td')
            td_label = tds[0].get_text().split(' ')[0]
            if td_label == 'Average':
                td_price = tds[1].get_text()
                print td_label, td_price

# Extract the "since <date>." text from the first <p class="smalltext grey">
ps = soup.find_all('p')
for p in ps:
    p_class = p.get('class')
    if p_class is not None and len(p_class) == 2 and p_class[0] == 'smalltext' and p_class[1] == 'grey':
        p_text = p.get_text()
        m = re.search(r'since([\w\d,\s]+)\.', p_text)
        if m:
            date = m.group(1)
            dt = datetime.datetime.strptime(date, ' %b %d, %Y')
            print datetime.date.strftime(dt, '%Y-%m-%d')
        break


2015-12-12 Cherry Wu

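I don't really know the solution, but in general you should avoid so many list indexes and find_all calls. The reason is that the position or number of elements changes much more easily than a class, an id, and so on. So I would recommend using CSS selectors.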
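For illustration, here is a minimal sketch of what the selector-based version might look like. It is written for Python 3 with a recent bs4, and it runs against a made-up HTML snippet: the class names (highest_price, lowest_price, smalltext grey) come from the question's code, but SAMPLE_HTML, its prices and dates, and the helper names extract_price_rows and extract_since_date are assumptions for this example, not the real page.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup modeled on the class names in the question's code.
# The prices and dates are invented for this example.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr class="highest_price"><td>Highest *</td><td>$199.99</td><td>Nov 01, 2015</td></tr>
    <tr class="lowest_price"><td>Lowest *</td><td>$129.99</td><td>Dec 01, 2015</td></tr>
    <tr><td>Average **</td><td>$165.00</td><td>-</td></tr>
  </tbody>
</table>
<p class="smalltext grey">Price history since Oct 15, 2015.</p>
"""


def extract_price_rows(html):
    """Return (label, price, date) tuples for the highest/lowest price rows."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # One selector replaces the nested find_all loops and class checks.
    for tr in soup.select("tr.highest_price, tr.lowest_price"):
        cells = [td.get_text(strip=True) for td in tr.select("td")]
        if len(cells) >= 3:
            rows.append((cells[0].split()[0], cells[1], cells[2]))
    return rows


def extract_since_date(html):
    """Return the text after 'since' in the first p.smalltext.grey, or None."""
    soup = BeautifulSoup(html, "html.parser")
    p = soup.select_one("p.smalltext.grey")
    if p is None:
        return None
    m = re.search(r"since\s+([\w ,]+)\.", p.get_text())
    return m.group(1) if m else None
```

Calling extract_price_rows(SAMPLE_HTML) yields one tuple per matching row. Note that this only makes the parsing less brittle; it cannot find rows that are missing from the downloaded HTML.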


2015-12-12 23:07:39 kotrfa

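From reading the page source, it looks like the historical price data is loaded via JavaScript. So you need to find a way to simulate a real browser. Personally, I use Selenium for these tasks.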
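A rough sketch of that approach, in Python 3, might look like the following. It assumes Selenium plus Firefox and its driver are installed, and it assumes that the rows the question looks for (tr.highest_price, tr.lowest_price) appear in the DOM once the page's scripts have run, which has not been verified against the live site. The helper names fetch_rendered_html and price_table_found are made up for this example.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_selector="tr.lowest_price", timeout=10):
    """Load url in a real browser and return the HTML after scripts have run."""
    # Imported lazily so the parsing helper below works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are installed
    try:
        driver.get(url)
        # Wait until an element matching wait_selector exists; this raises
        # a TimeoutException if nothing appears within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


def price_table_found(html):
    """Return True if the HTML contains a highest/lowest price row."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select("tr.highest_price, tr.lowest_price")) > 0
```

The string returned by fetch_rendered_html(url) can be passed to BeautifulSoup in place of urlopen(url), so the rest of the question's parsing code stays the same. Running price_table_found on the plain urlopen response first is a quick way to confirm whether the rows are really absent from the static HTML.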


2015-12-13 00:55:51 n1c9
