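BeautifulSoup: get all links after a given tag
I am trying to use BeautifulSoup to scrape the following pages (e.g. 1, 2) to get the list of actions for traveling from one place in Bangkok to another.
Basically, I can query and select the description of the trip as follows.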
import requests
from bs4 import BeautifulSoup

url = 'http://www.transitbangkok.com/showBestRoute.php?from=Sutthawat+-+Arun+Amarin+Intersection&to=Sukhumvit&originSelected=true&destinationSelected=true&lang=en'
route_request = requests.get(url)
soup_route = BeautifulSoup(route_request.content, 'lxml')
descriptions = soup_route.find('div', attrs={'id': 'routeDescription'})
The HTML of descriptions looks like the following:
<div id="routeDescription">
...
<br/>
<img src="/images/walk_icon_small.PNG" style="vertical-align:middle;padding-right: 10px;margin-right: 0px;"/>Walk by foot to <b>Sanam Luang</b>
<br/>
<img src="/images/bus_icon_semi_small.gif" style="vertical-align:middle;padding-right: 10px;margin-right: 0px;"/>Travel to <b>Khok Wua</b> using the line(s): <b><a href="lines/bangkok-bus-line/2">2</a></b> or <a href="lines/bangkok-bus-line/15">15</a> or <a href="lines/bangkok-bus-line/44">44</a> or <a href="lines/bangkok-bus-line/47">47</a> or <a href="lines/bangkok-bus-line/59">59</a> or <a href="lines/bangkok-bus-line/201">201</a> or <a href="lines/bangkok-bus-line/203">203</a> or <a href="lines/bangkok-bus-line/512">512</a><br/>
...
</div>
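Basically, I am trying to get a list of the actions and the bus lines used to travel to the next location (the question has been updated after an answer, but it is still not solved).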
route_descrtions = []
for description in descriptions.find_all('img'):
    action = description.next_sibling
    to_station = action.next_sibling
    n = action.find_next_siblings('a')
    if 'travel' in action.lower():
        lines = [to_station.find_next('b').text] + [a.contents[0] for a in n]
    else:
        lines = []
    desp = {'action': action,
            'to': to_station.text,
            'lines': lines}
    route_descrtions.append(desp)
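However, I don't know how to loop over only the links that follow each action (each Travel to action) and append them to my list. I tried find_next('a') and find_next_siblings('a'), but neither did what I need: the links of later steps end up in the list as well.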
Output:
[{'action': 'Walk by foot to ', 'lines': [], 'to': 'Wang Lang (Siriraj)'},
{'action': 'Travel to ',
'lines': ['Chao Phraya Express Boat', '40', '48', '501', '508'],
'to': 'Si Phraya'},
{'action': 'Walk by foot to ', 'lines': [], 'to': 'Sheraton Royal Orchid'},
{'action': 'Travel to ',
'lines': ['16', '40', '48', '501', '508'],
'to': 'Siam'},
{'action': 'Travel to ',
'lines': ['BTS - Sukhumvit', '40', '48', '501', '508'],
'to': 'Asok'},
{'action': 'Walk by foot to ', 'lines': [], 'to': 'Sukhumvit'}]
Desired output:
[{'action': 'Walk by foot to ', 'lines': [], 'to': 'Wang Lang (Siriraj)'},
{'action': 'Travel to ',
'lines': ['Chao Phraya Express Boat'],
...
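One possible way to get there, offered as a sketch rather than a definitive fix: find_next_siblings('a') returns every later <a> sibling inside the div, not just the ones before the next step, which is why lines from later steps leak into earlier ones. Walking img.next_siblings and stopping at the next <img> keeps each step separate. The sample HTML below is a trimmed, partly made-up stand-in for the real page, parse_steps and is_img are hypothetical helper names, and the code assumes that within a step the first <b> is the destination and that every later <b> or <a> sibling names a line.

```python
from itertools import takewhile

from bs4 import BeautifulSoup, Tag

# Trimmed sample modelled on the routeDescription markup shown above
# (assumption: the live page follows the same img / text / b / a layout).
SAMPLE_HTML = """
<div id="routeDescription">
<br/>
<img src="/images/walk_icon_small.PNG"/>Walk by foot to <b>Sanam Luang</b>
<br/>
<img src="/images/bus_icon_semi_small.gif"/>Travel to <b>Khok Wua</b> using the line(s): <b><a href="lines/bangkok-bus-line/2">2</a></b> or <a href="lines/bangkok-bus-line/15">15</a> or <a href="lines/bangkok-bus-line/44">44</a><br/>
<img src="/images/walk_icon_small.PNG"/>Walk by foot to <b>Sukhumvit</b>
<br/>
</div>
"""


def is_img(node):
    """True if the node is an <img> tag, i.e. the icon that starts a step."""
    return isinstance(node, Tag) and node.name == "img"


def parse_steps(container):
    """Hypothetical helper: build one dict per step icon in the container."""
    steps = []
    for img in container.find_all("img"):
        # Everything between this icon and the next icon belongs to one step.
        nodes = list(takewhile(lambda n: not is_img(n), img.next_siblings))
        # The action is the bare text right after the icon, if there is any.
        action = str(nodes[0]) if nodes and not isinstance(nodes[0], Tag) else ""
        # Keep only <b> and <a> siblings; an <a> nested in a <b> is not a
        # sibling, so it is counted once through its parent <b>.
        tags = [n for n in nodes if isinstance(n, Tag) and n.name in ("a", "b")]
        to_station = tags[0].get_text(strip=True) if tags else ""
        lines = [t.get_text(strip=True) for t in tags[1:]]
        steps.append({"action": action, "to": to_station, "lines": lines})
    return steps


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
steps = parse_steps(soup.find("div", attrs={"id": "routeDescription"}))
for step in steps:
    print(step)
```

With the real page, the same function could be called on the descriptions element from the first snippet; whether the assumptions about <b> and <a> hold for every step type there would need checking.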
Thanks, Andrei! The solution works for me. Thanks for the good explanation as well. I have accepted the answer (and upvoted it)! – titipata