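I want to get the URL of the link that downloads an asset's historical data from Yahoo Finance for a specific period: January 1, 1999 to the present. The HTML I need is missing from the BeautifulSoup output.
So, for example, if I go here: https://finance.yahoo.com/quote/XLB/history?period1=915177600&period2=1498633200&interval=1d&filter=history&frequency=1d
I want to obtain this (from the "Download Data" link above the data table):
"https://query1.finance.yahoo.com/v7/finance/download/XLB?period1=915177600&period2=1498633200&interval=1d&events=history&crumb=iX6bJ6LfGxc"
I am using BeautifulSoup and ran into a problem with the tag I need: the element holding the href does not show up in the HTML. At first I thought BeautifulSoup was not working properly, because find_all('a') and iterating over children/descendants turned up nothing. But when I dumped the HTML as text, the element (and everything inside its parent) was not there either. Can someone explain what is going on? My work so far is listed below.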
from bs4 import BeautifulSoup
import datetime as dTime
import requests
"""
asset = "Materials"
assetSignal = "XLB"
today = dTime.datetime.now()
startTime = str(int(dTime.datetime(1999, 1, 1, 0, 0, 0).timestamp()))
endTime = str(int(dTime.datetime(today.year, today.month, today.day, 0, 0, 0).timestamp()))
url = "https://finance.yahoo.com/quote/" + assetSignal + "/history?period1=" + startTime + "&period2=" + endTime + "&interval=1d&filter=history&frequency=1d"
"""
url = "https://finance.yahoo.com/quote/XLB/history?period1=915177600&period2=1498633200&interval=1d&filter=history&frequency=1d"
page = requests.get(url)
data = page.content
#soup = BeautifulSoup(data, "html.parser")
soup = BeautifulSoup(data, "lxml")
#soup = BeautifulSoup(data, "xml")
#soup = BeautifulSoup(data, "html5lib")
#Link not found
for link in soup.find_all("a"):
    print(link.get("href"))
#Span is empty?
span = soup.find(class_="Fl(end) Pos(r) T(-6px)")
print(span)
print(span.string)
print(span.contents)
for child in span.children:
    print(child)
#Other span has children. Target span doesn't
div = soup.find(class_="C($finDarkGray) Mt(20px) Mb(15px)")
print(div)
for child in div.descendants:
    print(child)
#Is the tag even there?
with open("soup.txt", "w") as file:
    file.write(page.text)
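As an aside, the commented-out URL construction above could be written as a small helper. This is only a sketch: `to_epoch_utc` and `build_history_url` are hypothetical names, and treating each date as midnight UTC is an assumption. A naive `datetime.timestamp()` call is interpreted in local time, which is why the hard-coded 915177600 in the question is midnight on 1999-01-01 in a UTC-8 zone rather than in UTC.

```python
from datetime import date, datetime, timezone
from urllib.parse import urlencode


def to_epoch_utc(day: date) -> int:
    """Return the Unix timestamp of midnight UTC on the given day (assumed boundary)."""
    return int(datetime(day.year, day.month, day.day, tzinfo=timezone.utc).timestamp())


def build_history_url(symbol: str, start: date, end: date) -> str:
    """Build the history-page URL with the same query parameters as the question."""
    query = urlencode({
        "period1": to_epoch_utc(start),
        "period2": to_epoch_utc(end),
        "interval": "1d",
        "filter": "history",
        "frequency": "1d",
    })
    return f"https://finance.yahoo.com/quote/{symbol}/history?{query}"


if __name__ == "__main__":
    print(build_history_url("XLB", date(1999, 1, 1), date.today()))
```

Using `urlencode` avoids the long string concatenation and keeps the parameters in one place; it does not change what the server returns, so the download link is still absent from the fetched page.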
Does this code even run? The line 'url = https://finance.yahoo.com/quote/XLB/history?period1=915177600&period2=1498633200&interval=1d&filter=history&frequency=1d' looks fishy to me. – patrick
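The code works, you just need to put that URL in quotes, but the download link is indeed not available in the soup result. It looks like the link is generated by JavaScript, and BeautifulSoup does not execute JavaScript, so any data that is delivered or rendered via JS will not be there when you scrape with BeautifulSoup. You may need to look into Selenium or PhantomJS. – davedwards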
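The point in the comment above can be reproduced offline. The sketch below is an illustration under assumptions, not Yahoo's actual markup: `STATIC_HTML`, the `download-slot` id and the `crumb=EXAMPLE` value are made up. It parses a page whose link is only created by a script and shows that BeautifulSoup sees the empty placeholder, much like the empty span in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the server sends an empty <span>, and a script would
# fill in the download link only after the page loads in a real browser.
STATIC_HTML = """
<html><body>
  <a href="/quote/XLB">Summary</a>
  <span id="download-slot"></span>
  <script>
    var slot = document.getElementById("download-slot");
    slot.innerHTML = '<a href="/download/XLB?crumb=EXAMPLE">Download Data</a>';
  </script>
</body></html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# Only anchors present in the markup itself are found; the one written
# inside the script is plain text to the parser, not an element.
hrefs = [a.get("href") for a in soup.find_all("a")]
slot = soup.find(id="download-slot")

print(hrefs)                # the static link only
print(list(slot.children))  # the placeholder has no children
```

The script body is never run, so the anchor it would create does not exist in the parsed tree. Getting at such a link generally means rendering the page in a browser-driving tool, as the comment suggests, before handing the resulting HTML to the parser.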