Basically, I use urllib to open the page required for each stock in the parameter list and read the full HTML content of that page. Then I slice it to find the quote I'm looking for.
Here is an implementation using Beautiful Soup and requests:
import requests
from bs4 import BeautifulSoup

def get_quotes(*stocks):
    quotelist = {}
    base = 'https://finance.google.com/finance?q={}'
    for stock in stocks:
        url = base.format(stock)
        soup = BeautifulSoup(requests.get(url).text, 'html.parser')
        quote = soup.find('span', attrs={'class': 'pr'}).get_text().strip()
        quotelist[stock] = float(quote)
    return quotelist

print(get_quotes('AAPL', 'GE', 'C'))
# {'AAPL': 160.86, 'GE': 23.91, 'C': 68.79}
# 1 loop, best of 3: 1.31 s per loop
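The parsing step itself can be checked against a static snippet. The HTML below is a made-up stand-in, not the real page markup; it only mimics the `<span class="pr">` element the selector above targets (including a thousands separator, which `float()` cannot parse directly):

```python
from bs4 import BeautifulSoup

# Minimal stand-in for a quote page; the selector below assumes the price
# sits in a <span class="pr"> element, as in the scraping code above.
html = '<div id="price-panel"><span class="pr">1,160.86</span></div>'

soup = BeautifulSoup(html, 'html.parser')
text = soup.find('span', attrs={'class': 'pr'}).get_text().strip()
quote = float(text.replace(',', ''))  # drop thousands separators before parsing
print(quote)  # 1160.86
```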
As mentioned in the comments, you might want to look at multithreading or grequests.
Using grequests for asynchronous HTTP requests:
import grequests
from bs4 import BeautifulSoup

def get_quotes(*stocks):
    quotelist = {}
    base = 'https://finance.google.com/finance?q={}'
    rs = (grequests.get(u) for u in [base.format(stock) for stock in stocks])
    rs = grequests.map(rs)
    for r, stock in zip(rs, stocks):
        soup = BeautifulSoup(r.text, 'html.parser')
        quote = soup.find('span', attrs={'class': 'pr'}).get_text().strip()
        quotelist[stock] = float(quote)
    return quotelist

%%timeit
get_quotes('AAPL', 'BAC', 'MMM', 'ATVI',
           'PPG', 'MS', 'GOOGL', 'RRC')
# 1 loop, best of 3: 2.81 s per loop
Update: here is a modified version that uses the built-in threading module, adapted from Dusty Phillips' Python 3 Object-Oriented Programming.
from threading import Thread
from bs4 import BeautifulSoup
import numpy as np
import requests

class QuoteGetter(Thread):
    def __init__(self, ticker):
        super().__init__()
        self.ticker = ticker

    def run(self):
        base = 'https://finance.google.com/finance?q={}'
        response = requests.get(base.format(self.ticker))
        soup = BeautifulSoup(response.text, 'html.parser')
        try:
            self.quote = float(soup.find('span', attrs={'class': 'pr'})
                                   .get_text()
                                   .strip()
                                   .replace(',', ''))
        except AttributeError:
            self.quote = np.nan

def get_quotes(tickers):
    threads = [QuoteGetter(t) for t in tickers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    quotes = dict(zip(tickers, [thread.quote for thread in threads]))
    return quotes

tickers = [
    'A', 'AAL', 'AAP', 'AAPL', 'ABBV', 'ABC', 'ABT', 'ACN', 'ADBE', 'ADI',
    'ADM', 'ADP', 'ADS', 'ADSK', 'AEE', 'AEP', 'AES', 'AET', 'AFL', 'AGN',
    'AIG', 'AIV', 'AIZ', 'AJG', 'AKAM', 'ALB', 'ALGN', 'ALK', 'ALL', 'ALLE',
]

%time get_quotes(tickers)
# Wall time: 1.53 s
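The same pattern can also be written with the standard library's `concurrent.futures`, which avoids subclassing `Thread` by hand. This is only a sketch: `fetch_quote` here is a hypothetical stand-in for the requests/BeautifulSoup scraping above, returning canned prices so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_quote(ticker):
    # Stand-in for the real HTTP request + parsing; swap in the
    # requests/BeautifulSoup code from above for live quotes.
    dummy_prices = {'AAPL': 160.86, 'GE': 23.91, 'C': 68.79}
    return dummy_prices.get(ticker, float('nan'))

def get_quotes(tickers):
    # The pool overlaps the (simulated) network waits, just like the
    # Thread subclass version; map preserves the input order.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(zip(tickers, pool.map(fetch_quote, tickers)))

print(get_quotes(['AAPL', 'GE', 'C']))
# {'AAPL': 160.86, 'GE': 23.91, 'C': 68.79}
```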
Check out [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/). – Mako212
I would work with the `requests` package instead of `urllib` directly. I'd expect the code above to already run quite fast, wouldn't it? When you have many requests, you can look into multithreading; that should speed the code up nicely. – Andras
Oh yes, and check out Beautiful Soup or lxml, as mentioned above. – Andras