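I know this isn't exactly what you asked, but I thought I'd show a way to convert the date text of your links into the format shown in your desired output example (day/month/year). I used BeautifulSoup to read the elements from the HTML.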
from bs4 import BeautifulSoup
import datetime as dt
import re
html = '<a href="data/self/dated/station1_140208.txt">Saturday, February 08, 2014</a><br/>'
p = re.compile(r'.*/station1_\d+\.txt')
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning
a_tags = soup.find_all('a', {"href": p})
>>> print(a_tags)  # a list of all <a> tags in the html whose href matches the pattern
[<a href="data/self/dated/station1_140208.txt">Saturday, February 08, 2014</a>]
names = [str(a.get('href')).split('/')[-1] for a in a_tags]  # str() because the values come back as unicode
dates = [dt.datetime.strptime(str(a.text), '%A, %B %d, %Y') for a in a_tags]  # %d is the day of the month
names and dates are built with list comprehensions; strptime turns each date string into a datetime object:
>>> print(names)  # a list of all file names from the hrefs
['station1_140208.txt']
>>> print(dates)  # a list of all dates as datetime objects
[datetime.datetime(2014, 2, 8, 0, 0)]
strftime reformats each datetime object into your format; zip pairs every date with its own file name:
toFileData = ["{0}: {1}".format(d.strftime('%d/%m/%y'), n) for d, n in zip(dates, names)]
>>> print(toFileData)
['08/02/14: station1_140208.txt']
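To make the format directives concrete, here is a small standard-library sketch for the sample link text. It assumes an English locale (the weekday and month names matched by %A and %B are locale-dependent), and the variable names are only illustrative.

```python
import datetime as dt

text = 'Saturday, February 08, 2014'  # link text from the sample HTML

# %A = weekday name, %B = month name, %d = day of month, %Y = 4-digit year.
# Assumption: an English locale, since %A and %B are locale-dependent.
parsed = dt.datetime.strptime(text, '%A, %B %d, %Y')

# %d/%m/%y = zero-padded day / month number / 2-digit year.
formatted = parsed.strftime('%d/%m/%y')

# %w is the weekday as a number (0 = Sunday), not the day of the month.
weekday_number = parsed.strftime('%w')

print(parsed)
print(formatted)
print(weekday_number)
```

For 8 February 2014, a Saturday, %w gives 6, which is why %d is the directive to use for the day part of a day/month/year string.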
Then write the entries in toFileData to a file.
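A minimal sketch of that last step, assuming one entry per line is what you want. The helper name write_lines, the sample list, and the output path are all hypothetical; substitute your own list and file name.

```python
import os
import tempfile


def write_lines(lines, path):
    """Write each entry of `lines` to `path`, one per line."""
    with open(path, "w") as f:
        for line in lines:
            f.write(line + "\n")


# Sample data in the same "date: file name" shape as toFileData.
to_file_data = ["08/02/14: station1_140208.txt"]

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "station_dates.txt")
write_lines(to_file_data, out_path)

# Read the file back to confirm what was written.
with open(out_path) as f:
    written = f.read()
print(written)
```

Opening the file in a with block closes it even if a write fails partway through.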
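For information on the methods I used in the code above, such as soup.find_all() and a.get(), I suggest you look through the BeautifulSoup documentation via the link at the top. Hope this helps.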
Use the DOM to extract all the links, and check for the relevant ones afterwards.
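One way to read that suggestion is sketched below: collect every link first, then filter. This version uses only the standard library's html.parser (Python 3) rather than BeautifulSoup, so it runs without extra installs; the LinkCollector class and the extra "Home" link are made up for illustration. The same two-step idea applies to soup.find_all('a') followed by a regex check on each href.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


# Sample HTML: the first link is a made-up, irrelevant one.
html = ('<a href="index.html">Home</a><br/>'
        '<a href="data/self/dated/station1_140208.txt">'
        'Saturday, February 08, 2014</a><br/>')

# Step 1: extract all links.
collector = LinkCollector()
collector.feed(html)

# Step 2: keep only the links whose href matches the station file pattern.
pattern = re.compile(r'.*/station1_\d+\.txt')
relevant = [(href, text) for href, text in collector.links
            if pattern.match(href)]
print(relevant)
```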