Python: Extracting information from XML into a dictionary

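I need to extract information from an XML file, isolate it from the XML tags before and after it, store the information in a dictionary, and then loop through the dictionary to print a list. I'm an absolute beginner, so I'd like to keep this as simple as possible. Sorry if my description of what I want to do doesn't make much sense.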

Here's what I have so far.

for line in open("/people.xml"):
    if "name" in line:
        print(line)
    if "age" in line:
        print(line)
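To go one step further than printing whole lines, the text between the tags can be pulled out with a regular expression. This is a minimal sketch assuming, as in the question, that each opening and closing tag sit on the same line; `tag_text` and the sample lines are hypothetical, and a real XML parser (as the answers below suggest) is more robust.

```python
import re

# Hypothetical sample lines standing in for the contents of people.xml.
lines = [
    "    <name>John</name>\n",
    "    <age>14</age>\n",
    "    <name>Kevin</name>\n",
    "    <age>10</age>\n",
]

def tag_text(line, tag):
    """Return the text between <tag> and </tag> on a single line, or None."""
    pattern = r"<{0}>(.*?)</{0}>".format(re.escape(tag))
    match = re.search(pattern, line)
    return match.group(1) if match else None

for line in lines:
    for tag in ("name", "age"):
        value = tag_text(line, tag)
        if value is not None:
            print(tag, value)
```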

Current output:

    <name>John</name>

    <age>14</age>

    <name>Kevin</name>

    <age>10</age>

    <name>Billy</name>

    <age>12</age>

Desired output:

Name   Age
John   14
Kevin  10
Billy  12

EDIT: Using the code below, I can now get this output:

{'Billy': '12', 'John': '14', 'Kevin': '10'}

Does anyone know how to get from this to a table like my desired output?


2013-01-14 user1975140

You should use xml.dom (http://docs.python.org/2/library/xml.dom.html). It will make your life a lot easier. – inspectorG4dget

I need to use Python; specifically, I'm using IDLE on a Mac. – user1975140

Use an XML parser for this. For example:

import xml.etree.ElementTree as ET 
doc = ET.parse('people.xml') 
names = [name.text for name in doc.findall('.//name')] 
ages = [age.text for age in doc.findall('.//age')] 
people = dict(zip(names,ages)) 
print(people) 
# {'Billy': '12', 'John': '14', 'Kevin': '10'}
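One common cause of the "junk after document element" ParseError mentioned in the comments is a file with more than one top-level element: ElementTree requires exactly one root. A minimal sketch with hypothetical inline data, assuming the real file has a problem of that kind:

```python
import xml.etree.ElementTree as ET

# Hypothetical data: two <person> elements with no single root around them.
bad = "<person><name>John</name></person><person><name>Kevin</name></person>"

try:
    ET.fromstring(bad)
    error_message = None
except ET.ParseError as err:
    error_message = str(err)  # expat reports "junk after document element: ..."
print(error_message)

# Wrapping everything in one root element makes the document well-formed.
root = ET.fromstring("<people>" + bad + "</people>")
names = [name.text for name in root.findall(".//name")]
print(names)  # ['John', 'Kevin']
```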


2013-01-14 01:38:20 unutbu

That didn't work; I get an error message ending in ParseError: junk after document element: line 44, column 0 – user1975140

Please post the first 45 lines of your people.xml file. – unutbu

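OK, I fixed an error on line 45 and now I get {'Billy': '12', 'John': '14', 'Kevin': '10'}, but I really need it in columns with headers, like the format at the top. I think my use of the word "list" may have been confusing, but how do I put this data into columns? – user1975140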
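To get from the extracted data to the two-column layout in the question, each row can be padded to a fixed width with `str.format`. A minimal sketch; `format_table` and the column width of 8 are arbitrary choices, not part of the answer above, and the name/age lists are assumed to be the ones built by the ElementTree code (in document order):

```python
def format_table(rows, headers=("Name", "Age"), width=8):
    """Return (name, age) pairs as left-aligned text columns under a header row."""
    lines = ["{:<{w}}{}".format(headers[0], headers[1], w=width)]
    for name, age in rows:
        lines.append("{:<{w}}{}".format(name, age, w=width))
    return "\n".join(lines)

# Values as the ElementTree answer extracts them, in document order.
names = ["John", "Kevin", "Billy"]
ages = ["14", "10", "12"]

table = format_table(zip(names, ages))
print(table)
```

With the dictionary itself, `format_table(people.items())` also works, although before Python 3.7 the row order is not guaranteed.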

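It seems to me that this is an exercise in learning how to parse the XML manually, rather than simply pulling a library out of the bag to do it for you. If I'm wrong, I suggest watching Steve Huffman's Udacity video, which can be found here: http://www.udacity.com/view#Course/cs253/CourseRev/apr2012/Unit/362001/Nugget/365002. He explains how to use the minidom module to parse lightweight XML files like this one.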
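For reference, the basic minidom calls look roughly like this. It is a minimal sketch assuming a single root element wrapping `<person>` entries; the `<people>`/`<person>` structure is an assumption, since the question never shows the whole file.

```python
from xml.dom import minidom

# Hypothetical sample document; for a file, use minidom.parse("/people.xml").
xml_text = """<people>
  <person><name>John</name><age>14</age></person>
  <person><name>Kevin</name><age>10</age></person>
</people>"""

doc = minidom.parseString(xml_text)
# getElementsByTagName returns every matching element in document order;
# firstChild is the text node inside the tag and .data is its string value.
names = [node.firstChild.data for node in doc.getElementsByTagName("name")]
ages = [node.firstChild.data for node in doc.getElementsByTagName("age")]
print(list(zip(names, ages)))
```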

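Now, the first point I want to make is that you don't want a Python dictionary for printing all of these values. A dictionary is just a collection of keys, each mapped to a value, with no ordering among them, so traversing it in the order the entries appear in the file is a headache. (This holds for the Python versions current when this answer was written; since Python 3.7, dictionaries preserve insertion order.) You are trying to print every name along with its corresponding age, so a data structure such as a list of tuples is better suited to holding the data.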
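A small illustration of the difference: a list of tuples keeps every entry in file order, while a dict keeps only one value per key. The duplicate name below is hypothetical:

```python
# (name, age) pairs in file order, with a hypothetical second "John".
pairs = [("John", "14"), ("Kevin", "10"), ("Billy", "12"), ("John", "9")]

as_dict = dict(pairs)  # the later "John" overwrites the earlier value
print(as_dict)  # {'John': '9', 'Kevin': '10', 'Billy': '12'} on Python 3.7+
print(pairs)    # all four pairs are kept, in file order
```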

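It looks like your XML file is structured so that each name tag is followed by its corresponding age tag. There also seems to be only one name tag per line. That makes things fairly simple. I'm not going to write the most efficient or most general solution to this problem, but I'll try to keep the code as simple and easy to understand as possible.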

So, let's first create a list to store the data:

a_list = []

Now open your file and initialize a couple of variables to hold each name and age:

from __future__ import with_statement  # only needed on Python 2.5; must be the first statement in the file

with open("/people.xml") as f:
    name, age = None, None  # hold the current person's name and age during the traversal
    for line in f:
        name = extract_name(line, name)  # defined below; define it before running this loop
        age = extract_age(line)  # so is this one
        if age:  # once an age is found, we have a complete person
            a_list.append((name, age))  # add the (name, age) tuple to the list
            name, age = None, None  # reset and keep reading until the next age

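Now, for each line in the file, we want to determine whether it contains a name. If it does, we want to extract the name. Let's create a function to do that: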

def extract_name(a_line, name):
    # a_line is the current line; name is the value carried over from earlier lines.
    if name:
        # A name is already pending, so keep it (it is cleared once its age is found).
        return name
    if "<name>" not in a_line:
        # No <name> tag on this line, so there is nothing to extract.
        return None
    name_pos = a_line.find("<name>") + len("<name>")  # start of the text after <name>
    end_pos = a_line.find("</name>")
    return a_line[name_pos:end_pos]

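Now we have to create a function that parses a person's age from a line. We can do this in much the same way as the previous function, but we know that as soon as we have an age, it is immediately added to the list, so we never need to worry about a previous value. The function could therefore look like this: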

def extract_age(a_line):
    if "<age>" not in a_line:
        # No <age> tag on this line.
        return None
    age_pos = a_line.find("<age>") + len("<age>")  # otherwise extract the age and return it
    end_pos = a_line.find("</age>")
    return a_line[age_pos:end_pos]

Finally, you want to print the list. You can do it like this:

print("Name\tAge")  # header row
for item in a_list:
    print('\t'.join(item))  # each item is a (name, age) tuple
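Since the pieces above are spread over several steps, here is the same line-by-line approach assembled into one script that runs as-is. It assumes, as the answer does, one tag per line and each `<name>` followed by its `<age>`; `io.StringIO` with hypothetical contents stands in for the real file.

```python
import io

def extract_name(a_line, name):
    # Keep a pending name until its age has been seen.
    if name:
        return name
    if "<name>" not in a_line:
        return None
    start = a_line.find("<name>") + len("<name>")
    return a_line[start:a_line.find("</name>")]

def extract_age(a_line):
    if "<age>" not in a_line:
        return None
    start = a_line.find("<age>") + len("<age>")
    return a_line[start:a_line.find("</age>")]

# Hypothetical file contents; io.StringIO stands in for open("/people.xml").
sample = io.StringIO(
    "<people>\n"
    "    <name>John</name>\n"
    "    <age>14</age>\n"
    "    <name>Kevin</name>\n"
    "    <age>10</age>\n"
    "    <name>Billy</name>\n"
    "    <age>12</age>\n"
    "</people>\n"
)

a_list = []
name, age = None, None
for line in sample:
    name = extract_name(line, name)
    age = extract_age(line)
    if age:
        a_list.append((name, age))
        name, age = None, None

print("Name\tAge")
for item in a_list:
    print("\t".join(item))
```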

I hope this helps. I haven't tested my code, so it may still be a bit buggy, but the concepts are there. :)


2013-01-14 05:14:22

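Everything is fine until return line[name_pos:end_pos]. It says "'return' outside function"; when I indent it I get "unexpected indent", and when I put a colon at the end of the previous line I get "invalid syntax". I'm afraid that's everything I know to try. – user1975140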

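Oops, I made a small mistake. In each function definition, you want to replace every instance of "line" with "a_line". Editing my code now. Also, make sure you consistently indent your code with either four spaces or a single tab; Python does not always treat the two as equivalent. –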

I also noticed that I was passing two items, instead of a tuple, to join. That bug should be fixed now too. –
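On Python 3, mixing tabs and spaces inside one block is not merely unreliable but an error: the interpreter raises TabError. A small demonstration using the built-in `compile()` on a hypothetical snippet:

```python
# Two lines in the same block: one indented with a tab, one with eight spaces.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<example>", "exec")
    result = "compiled"
except TabError as err:
    result = "TabError: " + err.msg
print(result)
```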

Try xmldict (converts XML to Python dictionaries and vice versa):

>>> import xmldict
>>> xmldict.xml_to_dict('''
... <root> 
... <persons> 
...  <person> 
...  <name first="foo" last="bar" /> 
...  </person> 
...  <person> 
...  <name first="baz" last="bar" /> 
...  </person> 
... </persons> 
... </root> 
... ''') 
{'root': {'persons': {'person': [{'name': {'last': 'bar', 'first': 'foo'}}, {'name': {'last': 'bar', 'first': 'baz'}}]}}} 


# Converting dictionary to xml 
>>> xmldict.dict_to_xml({'root': {'persons': {'person': [{'name': {'last': 'bar', 'first': 'foo'}}, {'name': {'last': 'bar', 'first': 'baz'}}]}}}) 
'<root><persons><person><name><last>bar</last><first>foo</first></name></person><person><name><last>bar</last><first>baz</first></name></person></persons></root>'

Or try xmlmapper (converts XML to a list of Python dictionaries that record parent-child relationships):

>>> myxml = '''<?xml version='1.0' encoding='us-ascii'?>
... <slideshow title="Sample Slide Show" date="2012-12-31" author="Yours Truly" >
... <slide type="all">
...  <title>Overview</title>
...  <item>Why
...   <em>WonderWidgets</em>
...   are great
...   </item>
...   <item/>
...   <item>Who
...   <em>buys</em>
...   WonderWidgets1
...  </item>
... </slide>
... </slideshow>'''
>>> x = xml_to_dict(myxml)
>>> for s in x:
...     print(s)
...
{'text': '', 'tail': None, 'tag': 'slideshow', 'xmlinfo': {'ownid': 1, 'parentid': 0}, 'xmlattb': {'date': '2012-12-31', 'author': 'Yours Truly', 'title': 'Sample Slide Show'}}
{'text': '', 'tail': '', 'tag': 'slide', 'xmlinfo': {'ownid': 2, 'parentid': 1}, 'xmlattb': {'type': 'all'}}
{'text': 'Overview', 'tail': '', 'tag': 'title', 'xmlinfo': {'ownid': 3, 'parentid': 2}, 'xmlattb': {}}
{'text': 'Why', 'tail': '', 'tag': 'item', 'xmlinfo': {'ownid': 4, 'parentid': 2}, 'xmlattb': {}}
{'text': 'WonderWidgets', 'tail': 'are great', 'tag': 'em', 'xmlinfo': {'ownid': 5, 'parentid': 4}, 'xmlattb': {}}
{'text': None, 'tail': '', 'tag': 'item', 'xmlinfo': {'ownid': 6, 'parentid': 2}, 'xmlattb': {}}
{'text': 'Who', 'tail': '', 'tag': 'item', 'xmlinfo': {'ownid': 7, 'parentid': 2}, 'xmlattb': {}}
{'text': 'buys', 'tail': 'WonderWidgets1', 'tag': 'em', 'xmlinfo': {'ownid': 8, 'parentid': 7}, 'xmlattb': {}}

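The code above produces a list of dictionaries. When you iterate over it, each dictionary gives you an element's information under keys such as tag, text, xmlattb and tail, with additional information in xmlinfo. The root element has a parentid of 0.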
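If installing xmlmapper is not an option, a similar flat structure can be built with the standard library's ElementTree. This is a sketch of the idea only, not xmlmapper's implementation; `flatten` is a hypothetical helper, and its key names (including `attrib` in place of `xmlattb`, and ids at the top level rather than under `xmlinfo`) are my own choices:

```python
import xml.etree.ElementTree as ET

def flatten(xml_text):
    """Return one dict per element in document order, linked by ownid/parentid."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(elem, parent_id):
        own_id = len(rows) + 1  # ids are assigned in document (pre-order) order
        rows.append({
            "tag": elem.tag,
            "text": (elem.text or "").strip(),
            "tail": (elem.tail or "").strip(),
            "attrib": dict(elem.attrib),
            "ownid": own_id,
            "parentid": parent_id,  # 0 marks the root element
        })
        for child in elem:
            walk(child, own_id)

    walk(root, 0)
    return rows

doc = ('<slideshow title="Sample Slide Show"><slide type="all">'
       '<title>Overview</title></slide></slideshow>')
rows = flatten(doc)
for row in rows:
    print(row)
```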


2013-01-14 05:19:57 namit

xmldict has a bug: >>> xml_to_dict(''' love''') produces {'i': {'t': 'love'}}; the attribute type="all" is lost. –
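One way to avoid losing attributes is to store them next to the child elements. This is a minimal standard-library sketch, not a fix for xmldict itself; `element_to_dict` is hypothetical, the "@" prefix for attributes and the "#text" key are arbitrary conventions chosen here, and the sample XML is made up:

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Convert an element to a dict, keeping attributes under '@name' keys."""
    result = {"@" + key: value for key, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in result:
            # A repeated tag becomes a list, kept in document order.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        if result:
            result["#text"] = text  # text alongside attributes or children
        else:
            return text  # plain leaf element: just its text
    return result

root = ET.fromstring('<i type="all"><t>love</t></i>')
converted = {root.tag: element_to_dict(root)}
print(converted)  # {'i': {'@type': 'all', 't': 'love'}}
```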

Here is another way, using the lxml library:

from lxml import objectify


def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # no child elements: return the leaf element itself
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)  # recurse into each child
        return dict_object
    return xml_to_dict_recursion(objectify.fromstring(xml_str))

# Bytes literal: lxml rejects str input that carries an encoding declaration.
xml_string = b"""<?xml version="1.0" encoding="UTF-8"?><Response><NewOrderResp>
<IndustryType>Test</IndustryType><SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""

print(xml_to_dict(xml_string))

To keep the parent node, use this instead:

def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # no child elements: return the leaf element itself
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)  # recurse into each child
        return dict_object
    xml_obj = objectify.fromstring(xml_str)
    return {xml_obj.tag: xml_to_dict_recursion(xml_obj)}  # wrap the result under the root tag

If you only want to return a subtree and convert it to a dict, you can use Element.find():

xml_obj.find('.//') # lxml.objectify.ObjectifiedElement instance

See the lxml documentation.
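If lxml is not installed, the same subtree idea works with the standard library's ElementTree, whose elements also have a find() method. A sketch reusing the sample data from above (without the XML declaration); it flattens only one level, mapping each child's tag to its text:

```python
import xml.etree.ElementTree as ET

xml_string = """<Response><NewOrderResp>
<IndustryType>Test</IndustryType><SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""

root = ET.fromstring(xml_string)
subtree = root.find(".//SomeData")  # first <SomeData> element anywhere below the root
some_data = {child.tag: child.text for child in subtree}
print(some_data)  # {'SomeNestedData1': '1234', 'SomeNestedData2': '3455'}
```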


2015-07-15 19:10:10 radtek
