2011-06-14

IOError with the lxml etree parse function

I have logic like this:

import os
import lxml.etree

for root, dirs, files in os.walk(os.getcwd()):
    if "info.xml" in files:
        tree = lxml.etree.parse(os.path.join(root, "info.xml"))
        tag = tree.xpath("/info/tagname")[0].text

When parsing an info.xml that sits very deep below the current path, I get this error:

Traceback (most recent call last): 
    File "/home/work/mergefile.py", line 365, in <module> 
    File "/home/work/mergefile.py", line 344, in merge_ejb_files 
    File "/home/work/mergefile.py", line 63, in __init__ 
    File "/home/work/mergefile.py", line 78, in _parse_info2doc 
    File "lxml.etree.pyx", line 2698, in lxml.etree.parse (src/lxml/lxml.etree.c:49590) 
    File "parser.pxi", line 1491, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71205) 
    File "parser.pxi", line 1520, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:71488) 
    File "parser.pxi", line 1420, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:70583) 
    File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:67736) 
    File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63820) 
    File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64741) 
    File "parser.pxi", line 563, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64056) 
IOError: Error reading file '/home/work/ci/case/dc_daily/dc/213577/223922/223958/792536/info.xml': failed to load external entity "/home/work/ci/case/dc_daily/dc/213577/223922/223958/792536/info.xml" 

But the file "/home/work/ci/case/dc_daily/dc/213577/223922/223958/792536/info.xml" exists, and I can parse it with lxml in IPython.

Do you know what the problem is? If you do, please help me. Thanks!

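IIRC this error means lxml could not load an external entity referenced in the file. That could be a DOCTYPE, a schema, or an external entity declaration (`&...;` references and so on). You can load the document without validating it against its schema, which in turn skips loading external entities. The `parse` function should take some argument for this. Sorry, I'm a bit busy right now, so you'll have to look it up yourself :) – 2011-06-14 11:18:32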
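A minimal sketch of what this comment describes, assuming the relevant switches are the `lxml.etree.XMLParser` keywords `load_dtd`, `no_network` and `resolve_entities` (these exist in lxml; whether they matter for this error is an assumption). The helper `read_tagname` is a hypothetical name. Note that the path in the error message is `info.xml` itself: libxml2 also says "failed to load external entity" when it cannot open the main document, so these options may not cure this particular failure.

```python
from io import BytesIO

import lxml.etree as etree

# Parser that never fetches a DTD, never goes to the network and does not
# expand entity references. All three are XMLParser keyword arguments.
NO_EXTERNAL_PARSER = etree.XMLParser(
    load_dtd=False,
    no_network=True,
    resolve_entities=False,
)


def read_tagname(source):
    """Hypothetical helper: parse `source` (a path or a binary file object)
    and return the text of /info/tagname."""
    tree = etree.parse(source, parser=NO_EXTERNAL_PARSER)
    return tree.xpath("/info/tagname")[0].text


if __name__ == "__main__":
    # The DOCTYPE points at a DTD that does not exist; with load_dtd=False
    # the parser does not try to open it.
    doc = (
        b'<?xml version="1.0"?>\n'
        b'<!DOCTYPE info SYSTEM "does-not-exist.dtd">\n'
        b"<info><tagname>demo</tagname></info>"
    )
    print(read_tagname(BytesIO(doc)))  # demo
```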

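Thanks for the comment. While debugging this today, I opened the XML file myself first and passed the file object to lxml.etree.parse; the open() call itself raised IOError: Too many open files for '/home/work/ci/case/dc_daily/dc/213577/223922/223958/792536/info.xml'. So this isn't really an lxml problem: Linux here limits a single process to 1024 open files. I'm trying to use subprocesses – Rachel 2011-06-15 06:39:43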
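To confirm this diagnosis on Linux, a sketch using only the standard library: `resource.getrlimit(resource.RLIMIT_NOFILE)` returns the per-process (soft, hard) descriptor limit, and `/proc/self/fd` lists the descriptors currently open (Linux-specific). The hypothetical `demonstrate_emfile` temporarily lowers the soft limit and leaks file objects on purpose until `open()` fails with `errno.EMFILE` ("Too many open files"), then closes everything and restores the limit.

```python
import errno
import os
import resource


def open_fd_count():
    """Number of descriptors this process has open (Linux: /proc/self/fd)."""
    return len(os.listdir("/proc/self/fd"))


def demonstrate_emfile(extra=16):
    """Leak file objects under a lowered soft limit until open() fails.

    Returns (errno of the failure, number of files opened before it).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    leaked = []
    # Lower only the soft limit, leaving room for a few more descriptors.
    resource.setrlimit(resource.RLIMIT_NOFILE, (open_fd_count() + extra, hard))
    try:
        while True:
            leaked.append(open(os.devnull))  # never closed: simulates a leak
    except OSError as exc:
        return exc.errno, len(leaked)
    finally:
        # Undo the experiment: release every descriptor, restore the limit.
        for f in leaked:
            f.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    print("soft/hard limit:", resource.getrlimit(resource.RLIMIT_NOFILE))
    code, opened = demonstrate_emfile()
    print(errno.errorcode[code], "after", opened, "extra files")
```

Running the real script with `ulimit -n` raised would only postpone the failure; the leaked handles are the actual problem.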

Why not open the file, parse it, and then close it? That way only one file is open at a time. – 2011-08-06 04:22:20
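A quick way to check this suggestion, assuming Linux (`/proc/self/fd`) and a throwaway `info.xml` written to a temporary directory: parse the same file more times than the 1024 limit through a `with` block and confirm the descriptor count does not grow. `count_open_fds` and `parse_many` are hypothetical names.

```python
import os
import tempfile

import lxml.etree as etree


def count_open_fds():
    """Descriptors currently open in this process (Linux-specific)."""
    return len(os.listdir("/proc/self/fd"))


def parse_many(path, times):
    """Open, parse and close `path` repeatedly; return the last tagname."""
    tag = None
    for _ in range(times):
        with open(path, "rb") as fh:  # closed as soon as the block exits
            tree = etree.parse(fh)
        tag = tree.xpath("/info/tagname")[0].text
    return tag


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "info.xml")
        with open(path, "wb") as out:
            out.write(b"<info><tagname>demo</tagname></info>")
        before = count_open_fds()
        print(parse_many(path, 2000))     # demo
        print(count_open_fds() - before)  # 0
```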

Answer

Here is my solution, following my comment above. I open each file for reading and close it right afterwards, so I never hit the 1024-file limit.

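import os
import lxml.etree as etree

for root, dirs, files in os.walk(os.getcwd()):
    if "info.xml" in files:
        # Binary mode lets lxml handle the encoding declaration itself;
        # the with-block closes the file as soon as parsing is done.
        with open(os.path.join(root, "info.xml"), "rb") as processfile:
            xml = etree.parse(processfile)
        tag = xml.xpath("/info/tagname")[0].text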
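For reference, the same loop packaged as a function that returns a dict mapping each directory to its tagname. `collect_tagnames` is a hypothetical name, and reporting and skipping (rather than aborting on) a file that fails to open or parse is a design choice of this sketch, not part of the answer.

```python
import os
import tempfile

import lxml.etree as etree


def collect_tagnames(top):
    """Walk `top` and map each directory containing info.xml to its
    /info/tagname text. Files that fail to open or parse are skipped."""
    tags = {}
    for root, dirs, files in os.walk(top):
        if "info.xml" not in files:
            continue
        path = os.path.join(root, "info.xml")
        try:
            with open(path, "rb") as fh:  # at most one descriptor at a time
                tree = etree.parse(fh)
        except (OSError, etree.XMLSyntaxError) as exc:
            print("skipping %s: %s" % (path, exc))
            continue
        nodes = tree.xpath("/info/tagname")
        if nodes:  # ignore files without the expected element
            tags[root] = nodes[0].text
    return tags


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        deep = os.path.join(tmp, "a", "b", "c")
        os.makedirs(deep)
        with open(os.path.join(deep, "info.xml"), "wb") as out:
            out.write(b"<info><tagname>deep</tagname></info>")
        print(collect_tagnames(tmp))
```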