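Parsing XML in Python
I am parsing the following XML: http://charts.realclearpolitics.com/charts/1044.xml. I want to show the result in a DataFrame with 3 columns: Date, Approve, Disapprove. The XML file is dynamic, since a new date is added every day, so the code should take that into account. I have implemented a static solution in which I have to hard-code the positions of the value tags in each loop. I would like to understand how to do this dynamically.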
import numpy as np
import pandas as pd
import requests
from pattern import web
xml = requests.get('http://charts.realclearpolitics.com/charts/1044.xml').text
dom = web.Element(xml)
values = dom.by_tag('value')
date = []
approve = []
disapprove = []
# The upper bound below is 1720 rather than 1727 because the last 6 values of the Approve and Disapprove tags are blank.
for i in range(0, 1720):
    date.append(pd.to_datetime(values[i].content))
# The upper bound below is 3447 rather than 3454 because the last 6 values are blank; going up to 3454 raises an error when converting to float.
for i in range(1727, 3447):
    a = float(values[i].content)
    approve.append(a)
# The upper bound below is 5174 rather than 5181 because the last 6 values are blank.
for i in range(3454, 5174):
    a = float(values[i].content)
    disapprove.append(a)
finalresult = pd.DataFrame({'date': date, 'Approve': approve, 'Disapprove': disapprove})
finalresult
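One way to drop the hard-coded indices is to derive the block length from the total number of value tags and let pandas discard rows with blank entries. This is a minimal sketch that assumes, as the starting indices 0, 1727 and 3454 above suggest, that the value tags form three equally sized consecutive blocks (dates, then Approve, then Disapprove). values_to_frame is a hypothetical helper name:

```python
import pandas as pd


def values_to_frame(contents):
    """Build the date/Approve/Disapprove frame from the text of every
    <value> tag, in document order (hypothetical helper)."""
    contents = [c.strip() for c in contents]
    # Assumption: three equally sized blocks (dates, Approve, Disapprove),
    # so the block length is derived from the total instead of hard-coded.
    n, remainder = divmod(len(contents), 3)
    if remainder:
        raise ValueError("expected three equally sized blocks of <value> tags")
    df = pd.DataFrame({
        "date": contents[:n],
        "Approve": contents[n:2 * n],
        "Disapprove": contents[2 * n:],
    })
    # Blank entries (such as the trailing ones in the feed) become NaN,
    # and any row containing one is dropped.
    df = df.replace("", float("nan")).dropna().reset_index(drop=True)
    df["date"] = pd.to_datetime(df["date"])
    df[["Approve", "Disapprove"]] = df[["Approve", "Disapprove"]].astype(float)
    return df
```

With the dom object from the question, it would be called as values_to_frame([v.content for v in dom.by_tag('value')]). New dates then change only the total count, not the code.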
lxml supports XPath, which seems to be what you want. You can then fetch the elements with an XPath expression, however many of them there are.
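A minimal sketch of that XPath approach with lxml. The layout used here is an assumption for illustration, not verified against the live feed: dates as value elements under chart/series, each line as a graph element with a title attribute, and entries matched across them by an xid attribute. Check the actual file and adjust the XPath expressions accordingly. SAMPLE_XML and parse_chart are hypothetical names.

```python
import pandas as pd
from lxml import etree

# Hypothetical sample in the assumed layout (not the real feed's contents).
SAMPLE_XML = b"""<chart>
  <series>
    <value xid="0">1/20/2017</value>
    <value xid="1">1/21/2017</value>
    <value xid="2">1/22/2017</value>
  </series>
  <graphs>
    <graph gid="1" title="Approve">
      <value xid="0">45.5</value>
      <value xid="1">46.0</value>
      <value xid="2"></value>
    </graph>
    <graph gid="2" title="Disapprove">
      <value xid="0">40.0</value>
      <value xid="1">41.2</value>
      <value xid="2"></value>
    </graph>
  </graphs>
</chart>"""


def parse_chart(xml_bytes):
    root = etree.fromstring(xml_bytes)
    # For each graph, map xid -> text (text is None for an empty element).
    graphs = {
        g.get("title"): {v.get("xid"): v.text for v in g.xpath("value")}
        for g in root.xpath("/chart/graphs/graph")
    }
    rows = []
    # Walk the dates in document order; however many there are.
    for v in root.xpath("/chart/series/value"):
        xid = v.get("xid")
        row = {"date": v.text}
        for title, values in graphs.items():
            row[title] = values.get(xid)
        rows.append(row)
    df = pd.DataFrame(rows)
    df["date"] = pd.to_datetime(df["date"])
    for title in graphs:
        df[title] = pd.to_numeric(df[title])  # None becomes NaN
    # Drop rows whose values are still blank.
    return df.dropna().reset_index(drop=True)
```

Against the live feed this would be something like parse_chart(requests.get('http://charts.realclearpolitics.com/charts/1044.xml').content). Passing bytes rather than .text avoids lxml's error for Unicode strings that carry an encoding declaration. Because values are matched by xid instead of by position, blank trailing entries are dropped without counting them.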