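Web scraping, Python and BeautifulSoup
I want to get a paragraph of text from a website. Right now I fetch the page and get its text with all the HTML tags removed. I'd like to find out whether it's possible to get just one particular paragraph back out of all that text.
Here's my code: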
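import requests
from bs4 import BeautifulSoup

# Download the article and parse its HTML with the lxml parser
response = requests.get("https://en.wikipedia.org/wiki/Aras_(river)")
txt = response.content
soup = BeautifulSoup(txt, 'lxml')

# get_text() returns all the text in the document with the tags stripped
filtered = soup.get_text()
print(filtered)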
Here's part of the text it prints out:
>>>>Basin
Main source
Erzurum Province, Turkey
River mouth
Kura river
Physical characteristics
Length
1,072 km (666 mi)
The Aras or Araxes is a river in and along the countries of Turkey,
Armenia, Azerbaijan, and Iran. It drains the south side of the Lesser
Caucasus Mountains and then joins the Kura River which drains the north
side of those mountains. Its total length is 1,072 kilometres (666 mi).
Given its length and a basin that covers an area of 102,000 square
kilometres (39,000 sq mi), it is one of the largest rivers of the
Caucasus.
Contents
1 Names
2 Description
3 Etymology and history
4 Iğdır Aras Valley Bird Paradise
5 Gallery
6 See also
7 Footnotes
I only want to get this paragraph:
The Aras or Araxes is a river in and along the countries of Turkey,
Armenia, Azerbaijan, and Iran. It drains the south side of the Lesser
Caucasus Mountains and then joins the Kura River which drains the north
side of those mountains. Its total length is 1,072 kilometres (666 mi).
Given its length and a basin that covers an area of 102,000 square
kilometres (39,000 sq mi), it is one of the largest rivers of the
Caucasus.
Is it possible to filter out just this paragraph?
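One possible approach, sketched here as an illustration rather than a definitive answer: instead of calling get_text() on the whole document, call it on individual <p> elements found with find_all("p"), and keep the first one that actually contains text. The function name first_paragraph and the sample HTML below are hypothetical; the sample is a simplified imitation of the page, not the real Wikipedia markup. It uses the built-in "html.parser" so it does not depend on lxml.

```python
from bs4 import BeautifulSoup


def first_paragraph(html: str) -> str:
    """Hypothetical helper: return the text of the first <p> that has visible text."""
    soup = BeautifulSoup(html, "html.parser")
    for p in soup.find_all("p"):
        # Collapse newlines and runs of spaces into single spaces
        text = " ".join(p.get_text().split())
        if text:  # skip empty or whitespace-only <p> elements
            return text
    return ""


# Simplified, made-up HTML imitating an infobox followed by the lead paragraph
sample = """
<html><body>
  <table><tr><td>Basin</td><td>Kura river</td></tr></table>
  <p>   </p>
  <p>The Aras or Araxes is a river in and along the countries of Turkey,
  Armenia, Azerbaijan, and Iran.</p>
  <p>Second paragraph.</p>
</body></html>
"""

print(first_paragraph(sample))
```

With the code from the question, you would pass the downloaded page instead, e.g. first_paragraph(response.text). On the live page, whether the first non-empty <p> is the lead paragraph depends on Wikipedia's current markup, so treat that as an assumption to check.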
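You should read more of the BeautifulSoup documentation. You can provide class names and XPaths to explicitly specify which elements you want to retrieve data from. – JosephGarrone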
Will do, @JosephGarrone – Boneyflesh
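Following up on the comment about class names: BeautifulSoup's select() and select_one() take CSS selectors (XPath is not supported by BeautifulSoup itself; it is available through lxml directly). The sketch below assumes the article body sits in a div with class mw-parser-output, which matches Wikipedia's markup as far as I know but may change. The helper name paragraphs_in and the sample HTML are hypothetical. The child combinator ">" keeps only <p> elements that are direct children of that div, so paragraphs nested inside the infobox table are skipped.

```python
from bs4 import BeautifulSoup


def paragraphs_in(html: str, selector: str = "div.mw-parser-output > p") -> list[str]:
    """Hypothetical helper: text of every non-empty element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for p in soup.select(selector):
        # Normalise whitespace so line breaks in the HTML don't leak into the result
        text = " ".join(p.get_text().split())
        if text:  # skip placeholder paragraphs that contain only whitespace
            texts.append(text)
    return texts


# Made-up HTML loosely imitating a Wikipedia article body
sample = """
<div class="mw-parser-output">
  <table class="infobox"><tr><td><p>Length 1,072 km</p></td></tr></table>
  <p class="mw-empty-elt">
  </p>
  <p>The Aras or Araxes is a river in and along the countries of Turkey,
  Armenia, Azerbaijan, and Iran.</p>
  <h2>Names</h2>
  <p>Another paragraph.</p>
</div>
"""

print(paragraphs_in(sample)[0])
```

Scoping the selector to a container class is more robust than picking paragraphs by position in the whole page, because text from navigation, infoboxes and footers is excluded up front.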