2017-06-23 71 views
1

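Getting data from <span> tags correctly with BeautifulSoup (Python)

I am scraping a sightseeing and activities page to get the prices shown on it. The prices appear in the block below: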

<div class="price-info" data-origin-price="1200" data-lowest-price="1200.0" data-origin-ccy="JPY" data-discount-percentage="60"> 
     <span class="before-discount-row"> 
     <span class="before-discount">25.12</span> 
     <span class="currency">EUR</span> 
     </span> 
    <span class="price-row"> 
     <span class="price-prefix">From</span> 
     <span class="price">10.05</span> 
     <span class="currency">EUR</span> 
    </span> 

I can already get the price in JPY (1200). As a next step, I also want to retrieve the price in EUR, specifically from this sub-block:

<span class="price-row"> 
    <span class="price-prefix">From</span> 
    <span class="price">10.05</span> 
    <span class="currency">EUR</span> 
</span> 

But somehow I have hit a wall. Here is my code:

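import requests
from bs4 import BeautifulSoup

user_agent = {'User-agent': 'Chrome/43.0.2357.124'}

RegionID = "tokyo"
page = 1  # assumed value; `page` was not defined in the original snippet

# Send the custom User-agent header that was defined but never used
r = requests.get("https://www.govoyagin.com/things_to_do/japan/" + str(RegionID) + "?page=0" + str(page), headers=user_agent)
soup = BeautifulSoup(r.content, "html.parser")

g_data = soup.find_all("li", {"class": "activity-list"})
for item in g_data:
    prices = item.find_all("div", {"class": "price-info"})
    for t in prices:  # was `price`, which raises a NameError
        # Search inside the current price-info block, not the whole list item
        Price_final = t.find_all("span", {"class": "price"})
        print(Price_final)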

This is what I get instead of 10.05 EUR:

[<span class="price"></span>] 

Can someone help me with the output? Is there any way to get the number out of the span?

Thanks for your help :)

+1

The content is dynamic; that is the problem.

+0

@ElvirMuslic Is there any way to work around this?

+0

Yes, use a "fake/automated/simulated browser" such as Selenium; look it up on YouTube.
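To make the Selenium suggestion concrete, here is a minimal sketch, not a tested solution for this particular site. It assumes Chrome and a matching driver are installed and that the converted price is written into `span.price` by JavaScript after the page loads. The helper names `fetch_rendered_html` and `extract_prices` are made up for illustration.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, timeout=10):
    """Load `url` in a real browser and return the HTML after scripts have run."""
    # Imported lazily so extract_prices() also works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome plus a matching driver
    try:
        driver.get(url)
        # Wait until at least one price span has non-empty text.
        WebDriverWait(driver, timeout).until(
            lambda d: any(
                e.text.strip()
                for e in d.find_elements(By.CSS_SELECTOR, "span.price")
            )
        )
        return driver.page_source
    finally:
        driver.quit()


def extract_prices(html):
    """Return a list of (price, currency) pairs from every price-row block."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.select("div.price-info span.price-row"):
        price = row.select_one("span.price")
        currency = row.select_one("span.currency")
        if price is not None and currency is not None:
            results.append(
                (price.get_text(strip=True), currency.get_text(strip=True))
            )
    return results


# Offline check against the markup quoted in the question.
SAMPLE = """
<div class="price-info" data-origin-price="1200" data-origin-ccy="JPY">
  <span class="price-row">
    <span class="price-prefix">From</span>
    <span class="price">10.05</span>
    <span class="currency">EUR</span>
  </span>
</div>
"""
sample_prices = extract_prices(SAMPLE)
print(sample_prices)

# Against the live page (opens a browser window):
# html = fetch_rendered_html("https://www.govoyagin.com/things_to_do/japan/tokyo")
# print(extract_prices(html))
```

The parsing step is kept separate from the browser step so it can be checked against saved HTML without starting a browser.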

Answers

1

I think you forgot a final for loop:

g_data = soup.find_all("li", {"class": "activity-list"})
for item in g_data:
    prices = item.find_all("div", {"class": "price-info"})
    for t in prices:
        # Search inside the current price-info block
        final_prices = t.find_all("span", {"class": "price"})
        for p in final_prices:
            print(p)
+0

Unfortunately it doesn't work; I still get the same output.
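One likely reason the extra loop changes nothing: if, as the comment on the question suggests, the page fills in the converted price with JavaScript, the `<span class="price">` that `requests` receives is simply empty, so no amount of looping produces text. The sketch below compares the two cases offline. `RENDERED` is the markup from the question; `RAW` is an assumed reconstruction of the unrendered HTML, based only on the empty span in the reported output. `read_price` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Markup as seen in the browser after scripts have run (from the question).
RENDERED = """
<div class="price-info" data-origin-price="1200" data-lowest-price="1200.0" data-origin-ccy="JPY" data-discount-percentage="60">
  <span class="before-discount-row">
    <span class="before-discount">25.12</span>
    <span class="currency">EUR</span>
  </span>
  <span class="price-row">
    <span class="price-prefix">From</span>
    <span class="price">10.05</span>
    <span class="currency">EUR</span>
  </span>
</div>
"""

# Assumption: roughly what requests receives before any script has run.
RAW = """
<div class="price-info" data-origin-price="1200" data-origin-ccy="JPY">
  <span class="price-row">
    <span class="price-prefix">From</span>
    <span class="price"></span>
    <span class="currency"></span>
  </span>
</div>
"""


def read_price(html):
    """Return (shown_price, shown_currency, origin_price, origin_currency)."""
    soup = BeautifulSoup(html, "html.parser")
    info = soup.find("div", class_="price-info")
    row = info.find("span", class_="price-row")
    # class_="price" matches the class token "price" only, not "price-prefix".
    price = row.find("span", class_="price").get_text(strip=True)
    currency = row.find("span", class_="currency").get_text(strip=True)
    # The data-* attributes are part of the tag itself, so they can be read
    # even when the span text is empty.
    return price, currency, info["data-origin-price"], info["data-origin-ccy"]


rendered_result = read_price(RENDERED)
raw_result = read_price(RAW)
print(rendered_result)  # ('10.05', 'EUR', '1200', 'JPY')
print(raw_result)       # ('', '', '1200', 'JPY')
```

If that assumption holds, the JPY value is reachable through the `data-origin-price` attribute with plain `requests`, while the EUR figure needs either a rendered page or a separate currency conversion.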
