Creating an XML file with a for loop in Python

I have a txt file with more than 100,000 lines, and I want to build an XML tree from it. All of the lines share the same root.

Here is the txt file:

LIBRARY: 
1,1,1,1,the 
1,2,1,1,world 
2,1,1,2,we 
2,5,2,1,have 
7,3,1,1,food

Desired output:

<LIBRARY> 
    <BOOK ID ="1"> 
     <CHAPTER ID ="1"> 
      <SENT ID ="1"> 
       <WORD ID ="1">the</WORD> 
      </SENT> 
     </CHAPTER> 
    </BOOK> 
    <BOOK ID ="1"> 
     <CHAPTER ID ="2"> 
      <SENT ID ="1"> 
       <WORD ID ="1">world</WORD> 
      </SENT> 
     </CHAPTER> 
    </BOOK> 
    <BOOK ID ="2"> 
     <CHAPTER ID ="1"> 
      <SENT ID ="1"> 
       <WORD ID ="2">we</WORD> 
      </SENT> 
     </CHAPTER> 
    </BOOK> 
    <BOOK ID ="2"> 
     <CHAPTER ID ="5"> 
      <SENT ID ="2"> 
       <WORD ID ="1">have</WORD> 
      </SENT> 
     </CHAPTER> 
    </BOOK> 
    <BOOK ID ="7"> 
     <CHAPTER ID ="3"> 
      <SENT ID ="1"> 
       <WORD ID ="1">food</WORD> 
      </SENT> 
     </CHAPTER> 
    </BOOK> 
</LIBRARY>

I am using ElementTree to convert the txt file into an XML file. This is the code I run:

def expantree():
    lines = txtfile.readlines()
    for line in lines:
        split_line = line.split(',')
        BOOK.set('ID ', split_line[0])
        CHAPTER.set('ID ', split_line[1])
        SENTENCE.set('ID ', split_line[2])
        WORD.set('ID ', split_line[3])
        WORD.text = split_line[4]
        tree = ET.ElementTree(Root)
        tree.write(xmlfile)

Well, the code runs, but I don't get the desired output. I get the following instead:

<LIBRARY> 
    <BOOK ID ="1"> 
     <CHAPTER ID ="1"> 
      <SENT ID ="1"> 
       <WORD ID ="1">the</WORD> 
      </SENT> 
     </CHAPTER> 
    </BOOK> 
</LIBRARY> 
<LIBRARY> 
    <BOOK ID ="1"> 
     <CHAPTER ID ="2"> 
      <SENT ID ="1"> 
       <WORD ID ="1">world</WORD> 
      </SENT> 
     </CHAPTER> 
    </BOOK> 
</LIBRARY> 
<LIBRARY> 
    <BOOK ID ="2"> 
     <CHAPTER ID ="1"> 
      <SENT ID ="1"> 
       <WORD ID ="2">we</WORD> 
      </SENT> 
     </CHAPTER> 
    </BOOK> 
</LIBRARY> 
<LIBRARY> 
    <BOOK ID ="2"> 
     <CHAPTER ID ="5"> 
      <SENT ID ="2"> 
       <WORD ID ="1">have</WORD> 
      </SENT> 
     </CHAPTER> 
    </BOOK> 
</LIBRARY> 
<LIBRARY> 
    <BOOK ID ="7"> 
     <CHAPTER ID ="3"> 
      <SENT ID ="1"> 
       <WORD ID ="1">food</WORD> 
      </SENT> 
     </CHAPTER> 
    </BOOK> 
</LIBRARY>

How can I unify the root so that I get a single root tag instead of many?

2013-07-22 spring rose

One approach is to build the complete tree and then write it out. I use the following code:

from lxml import etree as ET

def create_library(lines):
    # Build the single LIBRARY root and attach one BOOK subtree per data line.
    library = ET.Element('LIBRARY')
    for line in lines:
        split_line = line.strip().split(',')
        if len(split_line) != 5:
            continue  # skip the "LIBRARY:" header and blank lines
        library.append(create_book(split_line))
    return library

def create_book(split_line):
    book = ET.Element('BOOK', ID=split_line[0])
    book.append(create_chapter(split_line))
    return book

def create_chapter(split_line):
    chapter = ET.Element('CHAPTER', ID=split_line[1])
    chapter.append(create_sentence(split_line))
    return chapter

def create_sentence(split_line):
    sentence = ET.Element('SENT', ID=split_line[2])
    sentence.append(create_word(split_line))
    return sentence

def create_word(split_line):
    word = ET.Element('WORD', ID=split_line[3])
    word.text = split_line[4]
    return word

Then your code to create the file would look like this:

def expantree():
    lines = txtfile.readlines()
    library = create_library(lines)
    # Write the tree once, after the loop, so there is a single root.
    ET.ElementTree(library).write(xmlfile)

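If you don't want to hold the whole tree in memory (you mentioned more than 100,000 lines), you can write the root tags by hand and serialize one book at a time between them. In that case your code would look like this: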

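def expantree():
    # Write the root tags by hand and serialize one BOOK at a time,
    # so the full tree is never held in memory.
    with open(xmlfile, 'wb') as f:
        f.write(b'<LIBRARY>')
        for line in txtfile:
            split_line = line.strip().split(',')
            if len(split_line) != 5:
                continue  # skip the "LIBRARY:" header and blank lines
            f.write(ET.tostring(create_book(split_line)))
        f.write(b'</LIBRARY>')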

I don't have much experience with lxml, so there may be more elegant solutions, but both of these work.
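For readers without lxml installed, the second (incremental) approach can probably be sketched with the standard library alone. This is only an illustration and not part of the answer above: the names `build_book` and `stream_library` are made up for this example, and it assumes every data line has exactly five comma-separated fields with no commas inside the word itself.

```python
import io
from xml.etree import ElementTree as ET

TAGS = ['BOOK', 'CHAPTER', 'SENT', 'WORD']

def build_book(fields):
    """Build one BOOK > CHAPTER > SENT > WORD chain from five field values."""
    book = parent = ET.Element(TAGS[0], ID=fields[0])
    for tag, value in zip(TAGS[1:], fields[1:4]):
        parent = ET.SubElement(parent, tag, ID=value)
    parent.text = fields[4]  # the innermost element is WORD
    return book

def stream_library(lines, out):
    """Write <LIBRARY> once, then one serialized BOOK per data line."""
    out.write(b'<LIBRARY>')
    for line in lines:
        fields = line.strip().split(',')
        if len(fields) != 5:
            continue  # header line such as "LIBRARY:" or a blank line
        out.write(ET.tostring(build_book(fields)))
    out.write(b'</LIBRARY>')

# Hypothetical sample input; a real run would pass an open text file instead.
sample = ['LIBRARY: \n', '1,1,1,1,the \n', '1,2,1,1,world \n']
buf = io.BytesIO()
stream_library(sample, buf)
xml_bytes = buf.getvalue()
```

Because `lines` can be any iterable, passing the open file object itself (rather than `readlines()`) should keep only one line and one BOOK subtree in memory at a time.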

2013-07-22 18:19:10

Thanks, your answer is very valuable. –

Glad I could help. –

Another option, perhaps more concise, is the following:

from xml.etree import ElementTree as ET
import io
import os

# Set up the test input
inbuf = io.StringIO(''.join(['LIBRARY:\n', '1,1,1,1,the\n', '1,2,1,1,world\n',
                             '2,1,1,2,we\n', '2,5,2,1,have\n', '7,3,1,1,food\n']))

tags = ['BOOK', 'CHAPTER', 'SENT', 'WORD']
with inbuf as into, io.StringIO() as xmlfile:
    # The first line names the root element, e.g. "LIBRARY:"
    root_name = into.readline()
    root = ET.ElementTree(ET.Element(root_name.rstrip(':\n')))
    re = root.getroot()
    for line in into:
        values = line.split(',')
        parent = re
        # Nest BOOK > CHAPTER > SENT > WORD, each with its ID attribute
        for i, v in enumerate(values[:4]):
            parent = ET.SubElement(parent, tags[i], {'ID': v})
            if i == 3:
                parent.text = values[4].rstrip(':\n')
    # Write the whole tree once, then read it back to show the result
    root.write(xmlfile, encoding='unicode', xml_declaration=True)
    xmlfile.seek(0, os.SEEK_SET)
    for line in xmlfile:
        print(line)

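This code builds an ElementTree from the input data and writes it as XML to a file-like object. It can be used with either the standard Python xml.etree package or lxml. The code was tested with Python 3.3.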

2013-07-22 21:25:09 Jonathan

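Here is a suggestion that uses lxml (tested with Python 2.7). The code can easily be adapted to ElementTree, but getting pretty-printed output is harder there (see https://stackoverflow.com/a/16377996/407651).

The input file is library.txt and the output file is library.xml.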

from lxml import etree

lines = open("library.txt").readlines()
library = etree.Element('LIBRARY')  # The root element

# For each line with data in the input file, create a BOOK/CHAPTER/SENT/WORD structure
for line in lines:
    values = line.split(',')
    if len(values) == 5:
        book = etree.SubElement(library, "BOOK")
        book.set("ID", values[0])
        chapter = etree.SubElement(book, "CHAPTER")
        chapter.set("ID", values[1])
        sent = etree.SubElement(chapter, "SENT")
        sent.set("ID", values[2])
        word = etree.SubElement(sent, "WORD")
        word.set("ID", values[3])
        word.text = values[4].strip()

etree.ElementTree(library).write("library.xml", pretty_print=True)
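On newer Python versions the pretty-printing gap mentioned above is smaller: the standard library gained `xml.etree.ElementTree.indent` in Python 3.9. The following is a rough sketch of the same loop adapted to ElementTree, not the original answer's code; `build_library` is a name invented for this example, and the inline sample lines stand in for library.txt.

```python
from xml.etree import ElementTree as ET

def build_library(lines):
    """Create one LIBRARY root with a BOOK/CHAPTER/SENT/WORD chain per data line."""
    library = ET.Element('LIBRARY')
    for line in lines:
        values = line.strip().split(',')
        if len(values) != 5:
            continue  # skip the "LIBRARY:" header and blank lines
        book = ET.SubElement(library, 'BOOK')
        book.set('ID', values[0])
        chapter = ET.SubElement(book, 'CHAPTER')
        chapter.set('ID', values[1])
        sent = ET.SubElement(chapter, 'SENT')
        sent.set('ID', values[2])
        word = ET.SubElement(sent, 'WORD')
        word.set('ID', values[3])
        word.text = values[4]
    return library

# Hypothetical sample input standing in for the contents of library.txt.
library = build_library(['LIBRARY:\n', '2,5,2,1,have\n'])
ET.indent(library, space='  ')  # requires Python 3.9 or later
pretty = ET.tostring(library, encoding='unicode')
print(pretty)
```

On older interpreters `ET.indent` is not available, so lxml's `pretty_print=True` remains the simpler route there.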

2013-07-22 22:02:20 mzjn

I upvoted, but since SubElement lets you set attributes directly, as in book = etree.SubElement(library, 'BOOK', ID=values[0]), the set() calls can be eliminated. – tdelaney
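The comment's suggestion might look like the sketch below for a single input line. It uses the standard library's `SubElement`, which accepts attributes as keyword arguments in the same way as lxml's; the sample line is taken from the question's input.

```python
from xml.etree import ElementTree as ET

library = ET.Element('LIBRARY')
values = '7,3,1,1,food'.split(',')

# Attributes are passed as keyword arguments, so no separate set() calls are needed.
book = ET.SubElement(library, 'BOOK', ID=values[0])
chapter = ET.SubElement(book, 'CHAPTER', ID=values[1])
sent = ET.SubElement(chapter, 'SENT', ID=values[2])
word = ET.SubElement(sent, 'WORD', ID=values[3])
word.text = values[4]

result = ET.tostring(library, encoding='unicode')
```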
