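Retrieving data from XML tags in Python

I'm trying to use the following code to retrieve the slide number between the 'a:t' tags, inside the field with type="slidenum", but something isn't working. I should get 1.

Here is the XML: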

<a:p><a:fld id="{55FBEE69-CA5C-45C8-BA74-481781281731}" type="slidenum"> 
<a:rPr lang="en-US" sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/> 
</a:solidFill></a:rPr><a:pPr/><a:t>1</a:t></a:fld><a:endParaRPr lang="en-US" 
sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/></a:solidFill> 
</a:endParaRPr></a:p></p:txBody></p:sp>

Here is my code:

z = zipfile.ZipFile(pptx_filename)
for name in z.namelist():
    m = re.match(r'ppt/notesSlides/notesSlide\d+\.xml', name)
    if m is not None:
        f = z.open(name)
        tree = ET.parse(f)
        f.close()
        root = tree.getroot()
        # Find the slide number.
        slide_num = None
        for fld in root.findall('/'.join(['.', '', p.txBody, a.p, a.fld])):
            if fld.get('type', '') == 'slidenum':
                slide_num = int(fld.find(a.t).text)
                print(slide_num)

Source

2015-06-30 eleanor massy

<a:fld id="{55FBEE69-CA5C-45C8-BA74-481781281731}" type="slidenum"> –

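Could you edit the question to include the XML? I think that would help us a lot :) It's hard to read in a comment. – Jerfov2

The 'a:' prefix means these elements are all in an XML namespace. You may need to include the namespace when searching for these tags. If you're not sure how to do that, you should check this answer: http://stackoverflow.com/a/14853417/849425 –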
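To illustrate the comment above: ElementTree stores every namespaced tag as `{uri}localname`, so a search has to either spell out that form or pass a prefix-to-URI mapping. This is only a minimal sketch; the namespace URL `http://example.com` is a placeholder I made up, so substitute whatever your file declares.

```python
from xml.etree import ElementTree as ET

# Placeholder namespace URI, assumed for illustration only.
NS_URI = "http://example.com"
xml_text = '<a:p xmlns:a="%s"><a:t>1</a:t></a:p>' % NS_URI

root = ET.fromstring(xml_text)

# The parser expands the prefix: the tag is stored as "{uri}localname".
expanded_tag = root.tag

# Two equivalent lookups: the fully qualified name, or a prefix plus a mapping.
by_qualified_name = root.find("{%s}t" % NS_URI)
by_prefix_map = root.find("a:t", {"a": NS_URI})

# A bare "t" finds nothing, because no element has that unqualified name.
unqualified = root.find("t")

print(expanded_tag, by_qualified_name.text, by_prefix_map.text, unqualified)
```

The prefix in the mapping does not have to match the prefix used in the file; only the URI has to match.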

The following modifies maxymoo's answer to use the namespaces rather than removing them:

# xml.etree.cElementTree was removed in Python 3.9; ElementTree uses the C accelerator automatically
from xml.etree import ElementTree as etree

# Our test XML 
xml = ''' 
<a:p xmlns:a="http://example.com"><a:fld id="{55FBEE69-CA5C-45C8-BA74-481781281731}" type="slidenum"> 
<a:rPr lang="en-US" sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/> 
</a:solidFill></a:rPr><a:pPr/><a:t>1</a:t></a:fld><a:endParaRPr lang="en-US" 
sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/></a:solidFill> 
</a:endParaRPr></a:p> 
''' 

# Manually specify the namespace. The prefix letter ("a") is arbitrary. 
namespaces = {"a":"http://example.com"} 

# Parse the XML string 
tree = etree.fromstring(xml) 

""" 
Breaking down the search expression below 
    a:fld - Find the fld element prefixed with namespace identifier a: 
    [@type='slidenum'] - Match on an attribute type with a value of 'slidenum' 
    /a:t - Find the child element t prefixed with namespace identifier a: 
""" 
slidenums = tree.findall("a:fld[@type='slidenum']/a:t", namespaces) 
for slidenum in slidenums: 
    print(slidenum.text)

Here is the same example using an external file and the namespace the OP provided in the comments below:

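from xml.etree import ElementTree as etree

tree = etree.parse("my_xml_file.xml")
# This must match the URL declared by the xmlns:a attribute in your file.
namespaces = {"a":"http://schemas.openxmlformats.org/presentationml/2006/main"}
# The leading ".//" searches the whole document, not only the root's children.
slidenums = tree.findall(".//a:fld[@type='slidenum']/a:t", namespaces)
for slidenum in slidenums:
    print(slidenum.text)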
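For completeness, the same lookup can be applied to each notes slide inside the .pptx archive, as the question's loop tries to do. This is a hedged sketch rather than a definitive implementation: the helper name `slide_numbers` is hypothetical, and the DrawingML URL is what PowerPoint files normally bind to the `a:` prefix, so treat it as an assumption and confirm it against the `xmlns:a` attribute in your own file.

```python
import re
import zipfile
from xml.etree import ElementTree as ET

# Assumed binding for the a: prefix; check xmlns:a in your own file.
NAMESPACES = {"a": "http://schemas.openxmlformats.org/drawingml/2006/main"}
NOTES_SLIDE = re.compile(r"ppt/notesSlides/notesSlide\d+\.xml")


def slide_numbers(pptx_file):
    """Map each notes-slide part name to the slide number found in it, or None."""
    numbers = {}
    with zipfile.ZipFile(pptx_file) as archive:
        for name in archive.namelist():
            if not NOTES_SLIDE.fullmatch(name):
                continue
            with archive.open(name) as part:
                root = ET.parse(part).getroot()
            # ".//" searches the whole part, not just the root's children.
            node = root.find(".//a:fld[@type='slidenum']/a:t", NAMESPACES)
            text = node.text.strip() if node is not None and node.text else ""
            # Only convert when the field really holds digits.
            numbers[name] = int(text) if text.isdigit() else None
    return numbers
```

Calling `slide_numbers("deck.pptx")` (the file name is a stand-in) returns a dictionary keyed by part name, with `None` for any notes slide that has no numeric slide-number field.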

Source

2015-06-30 02:47:31

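Hey Mike! Thanks for the reply! The XML I posted is just a fragment, and the code doesn't work when I use the whole file. How do I use your code after parsing the file with 'tree = parse(file)'? –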

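@eleanormassy I put in a fake namespace URL because the real one wasn't evident from the XML sample you gave. You will probably need to change that URL to the one in your XML file. (You'll see it defined as an attribute, 'xmlns:a=""'.) –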
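If it helps, the declared namespaces can also be read programmatically instead of by eye. The sketch below uses `iterparse` with the `start-ns` event, which reports each `xmlns` declaration as a `(prefix, uri)` pair. The helper name `declared_namespaces` and the sample document are my own, and the URL in the sample is a placeholder.

```python
import io
from xml.etree import ElementTree as ET


def declared_namespaces(source):
    """Collect prefix -> URI for every xmlns declaration in an XML file or file object."""
    namespaces = {}
    for _event, (prefix, uri) in ET.iterparse(source, events=["start-ns"]):
        # Keep the first binding seen for each prefix.
        namespaces.setdefault(prefix, uri)
    return namespaces


# Stand-in document; the URL is a placeholder, not a real schema.
sample = io.BytesIO(b'<a:p xmlns:a="http://example.com"><a:t>1</a:t></a:p>')
found = declared_namespaces(sample)
print(found)
```

The resulting dictionary can be passed straight to `find` or `findall` as the namespace mapping.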

Yes, I got that part, and I changed the URL to the one in my file! How do I do it with 'tree = parse(file)'? Thanks –

Before parsing, I would strip the namespace prefixes from your XML. Then use the XPath fld[@type='slidenum']/t to find every fld node whose type attribute is 'slidenum', along with its child node t. Here is an example to show how this works:

from lxml import etree 

xml = """ 
<a:p><a:fld id="{55FBEE69-CA5C-45C8-BA74-481781281731}" type="slidenum"> 
<a:rPr lang="en-US" sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/> 
</a:solidFill></a:rPr><a:pPr/><a:t>1</a:t></a:fld><a:endParaRPr lang="en-US" 
sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/></a:solidFill> 
</a:endParaRPr></a:p> 
""" 

tree = etree.fromstring(xml.replace('a:','')) 
slidenum = tree.find("fld[@type='slidenum']/t").text 
print(slidenum)  # prints: 1

Source

2015-06-30 02:30:02 maxymoo

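XML namespaces are usually defined to remove ambiguity in element names. Depending on the structure of the document, removing them may have unintended consequences. I assumed the XML the OP showed is part of a larger document, partly because it isn't well-formed (which suggests to me that it was copied and pasted incorrectly), and also because it looks like the XML format of a PowerPoint slide deck. (Microsoft Office's XML formats are very verbose.) –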
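A small sketch of the ambiguity this comment describes: once namespaces are ignored, two unrelated elements that share a local name become indistinguishable. The document and both `urn:example:` URIs are invented for illustration, and `local_name` is a hypothetical helper.

```python
from xml.etree import ElementTree as ET

# Invented document: two different "t" elements in two different namespaces.
XML = """<root xmlns:a="urn:example:a" xmlns:p="urn:example:p">
  <a:t>1</a:t>
  <p:t>not a slide number</p:t>
</root>"""


def local_name(tag):
    """Drop the "{uri}" part that ElementTree puts in front of a tag name."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(XML)

# Ignoring namespaces matches both elements.
by_local_name = [el.text for el in root.iter() if local_name(el.tag) == "t"]

# A namespace-aware search matches only the intended one.
by_namespace = [el.text for el in root.findall("a:t", {"a": "urn:example:a"})]

print(by_local_name)
print(by_namespace)
```

In a small fragment the difference may never show up, but in a large multi-namespace document it can silently return the wrong node.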
