Uncommenting elements in Python 3 with lxml

I have an XML file in which I previously commented out some elements, and now I want to uncomment them.

I have a structure like this:

<parent parId="22" attr="Alpha"> 
<!--<reg regId="1"> 
    <cont>There is some content</cont><cont2 attr1="val">Another content</cont2> 
</reg> 
--></parent> 
<parent parId="23" attr="Alpha"> 
<reg regId="1"> 
    <cont>There is more content</cont><cont2 attr1="noval">Morecont</cont2> 
</reg> 
</parent> 
<parent parId="24" attr="Alpha"> 
<!--<reg regId="1"> 
    <cont>There is some content</cont><cont2 attr1="val">Another content</cont2> 
</reg> 
--></parent>

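I want to uncomment every comment in the file, so that the elements inside the comments become regular elements again.

I can find the commented-out elements using XPath. Here is my code snippet.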

def unhide_element(): 
    path = r'path_to_file\file.xml' 
    xml_parser = et.parse(path) 
    comments = root.xpath('//comment') 
    for c in comments: 
     print('Comment: ', c) 
     parent_comment = c.getparent() 
     parent_comment.replace(c,'') 
     tree = et.ElementTree(root) 
     tree.write(new_file)

However, replace does not work, because it expects another element as its second argument.

How can I solve this?

Source

2017-09-27 TMikonos

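Your code is missing the key step: creating a new XML element from the comment text. There are also a few other errors: the XPath query is wrong, and the output file is saved multiple times inside the loop.

Also, it looks like you are mixing up xml.etree and lxml.etree. According to the documentation, the former ignores comments when the XML file is parsed, so the best approach is to use lxml.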
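To illustrate the difference between the two libraries, here is a minimal sketch that counts comment nodes with each one, using their default parsers. The names `SAMPLE`, `count_comments_stdlib` and `count_comments_lxml` are hypothetical, introduced only for this illustration.

```python
import xml.etree.ElementTree as StdET
import lxml.etree as LET

# Hypothetical one-line sample with a single commented-out element.
SAMPLE = '<root><parent parId="22"><!--<reg regId="1"/>--></parent></root>'


def count_comments_stdlib(xml_text):
    """Count comment nodes seen by xml.etree with its default parser."""
    root = StdET.fromstring(xml_text)
    # Comment nodes, when present, have the Comment factory as their tag.
    return sum(1 for node in root.iter() if node.tag is StdET.Comment)


def count_comments_lxml(xml_text):
    """Count comment nodes seen by lxml, using the comment() node test."""
    root = LET.fromstring(xml_text)
    return len(root.xpath('//comment()'))


print(count_comments_stdlib(SAMPLE), count_comments_lxml(SAMPLE))
```

With the default parsers, the standard library drops the comment while lxml keeps it as a node that XPath can select.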

After fixing all of the above, we get something like this.

import lxml.etree as ET


def unhide_element():
    path = r'test.xml'
    tree = ET.parse(path)  # ET.parse returns an ElementTree, not the root element
    comments = tree.xpath('//comment()')  # comment() is the XPath node test for comments
    for c in comments:
        print('Comment: ', c)
        parent_comment = c.getparent()
        new_elem = ET.XML(c.text)  # this bit creates the new element from comment text
        c.addnext(new_elem)  # insert it right after the comment, inside the same parent
        parent_comment.remove(c)  # skip this if you want to retain the comment

    tree.write(r'new_file.xml')  # save once, after the loop
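For trying this out without touching any files, here is a self-contained sketch of the same idea that works on a string. It is one possible variant, not the only way to do it: the wrapping `<root>` element, the `SAMPLE` text and the function name `uncomment_elements` are assumptions made for illustration. It inserts each new element directly after its comment, and it leaves alone any comment whose text is not well-formed XML (for example an ordinary prose remark).

```python
import lxml.etree as ET

# Hypothetical sample: the question's structure wrapped in a single <root>.
SAMPLE = """<root>
<parent parId="22" attr="Alpha">
<!--<reg regId="1">
    <cont>There is some content</cont><cont2 attr1="val">Another content</cont2>
</reg>
--></parent>
<parent parId="23" attr="Alpha">
<reg regId="1">
    <cont>There is more content</cont><cont2 attr1="noval">Morecont</cont2>
</reg>
</parent>
</root>"""


def uncomment_elements(xml_text):
    """Turn every comment that contains a well-formed element back into that element."""
    root = ET.fromstring(xml_text)
    for comment in root.xpath('//comment()'):
        parent = comment.getparent()
        if parent is None:
            continue  # comment sits outside the root element; leave it alone
        try:
            new_elem = ET.XML(comment.text)
        except ET.XMLSyntaxError:
            continue  # not a commented-out element, just a regular comment
        comment.addnext(new_elem)  # place the element where the comment was
        parent.remove(comment)
    return root


if __name__ == '__main__':
    print(ET.tostring(uncomment_elements(SAMPLE), encoding='unicode'))
```

Collecting the comments with `xpath` first gives a plain list, so modifying the tree inside the loop does not disturb the iteration.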

Source

2017-09-27 16:14:21

Great, this works. However, I don't know why, but with my lxml version it does not work unless I call getroot() first. I could not work directly on the ElementTree. – TMikonos
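For readers who hit the same issue as in the comment above, a minimal sketch of the `getroot()` route follows. `ET.parse` returns an ElementTree object that wraps the document, and `getroot()` returns its root element, so the query can be run on the element instead. The helper name `load_root` and the in-memory sample are assumptions for illustration only.

```python
import io
import lxml.etree as ET


def load_root(source):
    """Parse a file path or file-like object; return the tree and its root element."""
    tree = ET.parse(source)  # ElementTree wrapper around the whole document
    root = tree.getroot()    # the actual root element
    return tree, root


# In-memory stand-in for a file on disk (hypothetical content).
tree, root = load_root(io.BytesIO(b'<root><!--<a/>--></root>'))
comments = root.xpath('//comment()')  # query the element rather than the tree
print(root.tag, len(comments))
```

The tree object is still the one to call `write` on when saving the result.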

Well, since you want to uncomment everything, all you really need to do is remove every "<!--" and "-->":

import re 

new_xml = ''.join(re.split('<!--|-->', xml))

Or:

new_xml = xml.replace('<!--', '').replace('-->', '')
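As a quick sanity check of this string-based approach, the sketch below strips the delimiters and then re-parses the result with lxml to confirm it is still well-formed. The sample string and the helper name `strip_comment_delimiters` are assumptions for illustration. Note that this approach also exposes the text of ordinary prose comments as document content, so it only fits files where every comment holds commented-out markup.

```python
import re
import lxml.etree as ET

# Matches either the opening or the closing comment delimiter.
COMMENT_DELIMITERS = re.compile(r'<!--|-->')


def strip_comment_delimiters(xml_text):
    """Remove all comment delimiters, exposing whatever was inside the comments."""
    return COMMENT_DELIMITERS.sub('', xml_text)


# Hypothetical input in the shape of the question's file.
xml = ('<root><parent parId="22"><!--<reg regId="1">'
       '<cont>There is some content</cont></reg>--></parent></root>')
new_xml = strip_comment_delimiters(xml)

# Re-parsing raises XMLSyntaxError if the stripped text is no longer well-formed.
root = ET.fromstring(new_xml)
print(len(root.xpath('//parent/reg')))
```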

Source

2017-09-27 15:55:57
