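BeautifulSoup can't get everything
Two weeks ago, I could read everything in the page source of this URL: http://camelcamelcamel.com/Jaybird-Sport-Wireless-Bluetooth-Headphones/product/B013HSW4SM?active=price_amazon
However, today, when I run the same code again, the historical prices no longer appear in the soup. Do you know how to fix this?
Here is my Python code (it used to work fine!):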
# Python 2 (urllib2 and the print statement)
import datetime
import re

from bs4 import BeautifulSoup
from urllib2 import urlopen

url = 'http://camelcamelcamel.com/Jaybird-Sport-Wireless-Bluetooth-Headphones/product/B013HSW4SM?active=price_amazon'
soup = BeautifulSoup(urlopen(url), 'html.parser')

# Walk every table body and pick out the highest, lowest and average price rows
lst = soup.find_all('tbody')
for tbody in lst:
    trs = tbody.find_all('tr')
    for elem in trs:
        tr_class = elem.get('class')
        if tr_class is not None:
            # Rows with a class: highest and lowest price carry a date as well
            if tr_class[0] == 'highest_price' or tr_class[0] == 'lowest_price':
                tds = elem.find_all('td')
                td_label = tds[0].get_text().split(' ')[0]
                td_price = tds[1].get_text()
                td_date = tds[2].get_text()
                print td_label, td_price, td_date
        else:
            # Rows without a class: only the average price row is of interest
            tds = elem.find_all('td')
            td_label = tds[0].get_text().split(' ')[0]
            if td_label == 'Average':
                td_price = tds[1].get_text()
                print td_label, td_price

# Find the "since <date>." note in the first matching <p class="smalltext grey">
ps = soup.find_all('p')
for p in ps:
    p_class = p.get('class')
    if p_class is not None and len(p_class) == 2 and p_class[0] == 'smalltext' and p_class[1] == 'grey':
        p_text = p.get_text()
        m = re.search(r'since([\w\d,\s]+)\.', p_text)
        if m:
            date = m.group(1)
            dt = datetime.datetime.strptime(date, ' %b %d, %Y')
            print dt.strftime('%Y-%m-%d')
            break
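
One way to narrow this down is to check whether the class names the loop relies on are present in the raw HTML at all, before BeautifulSoup is involved. The following is only a minimal sketch: `missing_markers` is a hypothetical helper, and `sample` is a made-up fragment standing in for the real response, not actual output from the site.

```python
# Hypothetical helper, not part of the original script: report which of the
# class names the parsing loop looks for are absent from the raw HTML text.
def missing_markers(html, markers=('highest_price', 'lowest_price')):
    """Return the markers that do not occur anywhere in the raw HTML string."""
    return [m for m in markers if m not in html]

# Made-up fragment standing in for a fetched page (assumption, for illustration only)
sample = '<table><tbody><tr class="lowest_price"><td>Lowest</td></tr></tbody></table>'

print(missing_markers(sample))
```

With the real page, the string returned by `urlopen(url).read()` would be passed in instead of `sample`. If the markers are missing from the raw text, the server is probably returning different HTML than it did two weeks ago, and the parsing code itself is likely not the cause.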