2013-02-21 80 views
1

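docx4j cannot read a file saved by POI: which one is at fault?

After processing some text in a docx file with POI (using XPath) and saving it, I pass the ByteArrayOutputStream to a new ByteArrayInputStream and feed that to docx4j: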

wordMLPackage = WordprocessingMLPackage.load(
    bis 
); 

With three out of four of my templates, this throws an exception:

org.docx4j.openpackaging.exceptions.InvalidFormatException: Unexpected package (docx4j supports docx/docxm and pptx only 
    at org.docx4j.openpackaging.contenttype.ContentTypeManager.createPackage(ContentTypeManager.java:834) 

The docx4j code that throws it looks like this:

/* Return a package of the appropriate type. Used when loading an existing 
* Package, with an already populated [Content_Types].xml. When 
* creating a new Package, start with the new WordprocessingMLPackage constructor. */ 
public OpcPackage createPackage() throws InvalidFormatException { 

    /* 
    * How do we know what type of Package this is? 
    * 
    * In principle, either: 
    * 
    * 1. We were told its file extension or mime type in the 
    * constructor/method parameters, or 
    * 
    * 2. Because [Content_Types].xml contains an override for PartName 
    * /document.xml of content type 
    * application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml 
    * 
    * The latter approach is more reliable, so .. 
    * 
    */ 
    OpcPackage p; 

    if (getPartNameOverridenByContentType(ContentTypes.WORDPROCESSINGML_DOCUMENT) != null 
      || getPartNameOverridenByContentType(ContentTypes.WORDPROCESSINGML_DOCUMENT_MACROENABLED) != null 
      || getPartNameOverridenByContentType(ContentTypes.WORDPROCESSINGML_TEMPLATE) != null 
      || getPartNameOverridenByContentType(ContentTypes.WORDPROCESSINGML_TEMPLATE_MACROENABLED) != null) { 
     log.info("Detected WordProcessingML package "); 
     p = new WordprocessingMLPackage(this); 
     return p; 
    } else if (getPartNameOverridenByContentType(ContentTypes.PRESENTATIONML_MAIN) != null 
      || getPartNameOverridenByContentType(ContentTypes.PRESENTATIONML_TEMPLATE) != null 
      || getPartNameOverridenByContentType(ContentTypes.PRESENTATIONML_SLIDESHOW) != null) { 
     log.info("Detected PresentationMLPackage package "); 
     p = new PresentationMLPackage(this); 
     return p; 
    } else if (getPartNameOverridenByContentType(ContentTypes.SPREADSHEETML_WORKBOOK) != null 
      || getPartNameOverridenByContentType(ContentTypes.SPREADSHEETML_WORKBOOK_MACROENABLED) != null 
      || getPartNameOverridenByContentType(ContentTypes.SPREADSHEETML_TEMPLATE) != null 
      || getPartNameOverridenByContentType(ContentTypes.SPREADSHEETML_TEMPLATE_MACROENABLED) != null) { 
     // "xlam", "xlsb" ? 
     log.info("Detected SpreadhseetMLPackage package "); 
     p = new SpreadsheetMLPackage(this); 
     return p; 

    } else if (getPartNameOverridenByContentType(ContentTypes.DRAWINGML_DIAGRAM_LAYOUT) != null) { 
     log.info("Detected Glox file "); 
     p = new GloxPackage(this); 
     return p; 
    } else { 
     throw new InvalidFormatException("Unexpected package (docx4j supports docx/docxm and pptx only"); 
     //return new Package(this); 
    } 
} 

It seems to be failing to match a specific content type Override. My starting docx template has a [Content_Types].xml that contains:

<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"> 
    <Override PartName="/_rels/.rels"  ContentType="application/vnd.openxmlformats-package.relationships+xml" /> 
    <Override PartName="/word/fontTable.xml"  ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml" /> 
    <Override PartName="/word/_rels/document.xml.rels"  ContentType="application/vnd.openxmlformats-package.relationships+xml" /> 
    <Override PartName="/word/media/image1.wmf"   ContentType="image/x-wmf" /> 
    <Override PartName="/word/comments.xml"   ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml" /> 
    <Override PartName="/word/numbering.xml"  ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml" /> 
    <Override PartName="/word/footer1.xml"  ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml" /> 
    <Override PartName="/word/document.xml"   ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml" /> 
    <Override PartName="/word/styles.xml"  ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml" /> 
    <Override PartName="/docProps/app.xml"  ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml" /> 
    <Override PartName="/docProps/core.xml"   ContentType="application/vnd.openxmlformats-package.core-properties+xml" /> 
</Types> 

After processing with POI, [Content_Types].xml looks like this:

<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"> 
    <Default Extension="xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/> 
    <Override PartName="/_rels/.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/> 
    <Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/> 
    <Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/> 
    <Override PartName="/word/_rels/document.xml.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/> 
    <Override PartName="/word/comments.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml"/> 
    <Override PartName="/word/fontTable.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml"/> 
    <Override PartName="/word/footer1.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml"/> 
    <Override PartName="/word/media/image1.wmf" ContentType="image/x-wmf"/> 
    <Override PartName="/word/numbering.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml"/> 
    <Override PartName="/word/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml"/> 
</Types> 

Note that the Override with PartName="/word/document.xml" is missing!

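Is a file acceptable if it has no content type Override for word/document.xml? It opens in LibreOffice without complaint. Is docx4j relying on a content type Override that may legitimately be absent, or is POI writing the Override tags incorrectly for some of my files (3 out of 4)?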
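To compare what each library wrote, it helps to pull [Content_Types].xml straight out of the saved bytes. The following is a minimal sketch using only the JDK; the class name `ContentTypesDump` and method `readContentTypes` are hypothetical, and it assumes the entry is UTF-8 encoded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ContentTypesDump {

    /**
     * Returns the text of [Content_Types].xml found in the given docx (zip) bytes,
     * or null if the archive has no such entry. Assumes the entry is UTF-8.
     */
    static String readContentTypes(byte[] docxBytes) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docxBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("[Content_Types].xml".equalsIgnoreCase(entry.getName())) {
                    // Copy the current entry's bytes; read() returns -1 at the end of the entry.
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[4096];
                    int n;
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return out.toString("UTF-8");
                }
            }
        }
        return null;
    }
}
```

Calling `ContentTypesDump.readContentTypes(bos.toByteArray())` on the ByteArrayOutputStream that POI wrote to, just before handing the bytes to docx4j, shows exactly which Default and Override elements are present.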

+1

I think this is a bug in docx4j: POI is setting a Default with the correct type, which docx4j appears to be ignoring. – Gagravarr 2013-02-21 16:51:22

+0

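I agree. I have opened issue #46 on the GitHub project with some code ideas for fixing it. I would still like to know what the spec says about the Override tag. – chugadie 2013-02-21 18:13:57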

Answer

2

Disclosure: I lead the docx4j project.

What POI is doing appears to be legitimate per the spec, but it is suboptimal.

Per ECMA-376 Part 2, "Getting the Content Type of a Part", docx4j ought to find the content type of the docx when it is specified the way POI does it.
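As a rough illustration of that lookup (not docx4j's actual code), the sketch below checks for a matching Override first and only then falls back to a Default for the part's extension. The class name `ContentTypeLookup` and method `contentTypeOf` are hypothetical, and `equalsIgnoreCase` is used here as an approximation of the case-insensitive ASCII comparison the spec describes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ContentTypeLookup {

    static final String NS = "http://schemas.openxmlformats.org/package/2006/content-types";

    /** Returns the content type for partName, or null if no Override or Default applies. */
    static String contentTypeOf(String contentTypesXml, String partName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(contentTypesXml.getBytes(StandardCharsets.UTF_8)));

        // Step 1: an Override whose PartName matches the part takes precedence.
        NodeList overrides = doc.getElementsByTagNameNS(NS, "Override");
        for (int i = 0; i < overrides.getLength(); i++) {
            Element override = (Element) overrides.item(i);
            if (override.getAttribute("PartName").equalsIgnoreCase(partName)) {
                return override.getAttribute("ContentType");
            }
        }

        // Step 2: otherwise use the Default registered for the part's extension, if any.
        int dot = partName.lastIndexOf('.');
        if (dot < 0 || dot < partName.lastIndexOf('/')) {
            return null; // the last path segment has no extension
        }
        String extension = partName.substring(dot + 1);
        NodeList defaults = doc.getElementsByTagNameNS(NS, "Default");
        for (int i = 0; i < defaults.getLength(); i++) {
            Element def = (Element) defaults.item(i);
            if (def.getAttribute("Extension").equalsIgnoreCase(extension)) {
                return def.getAttribute("ContentType");
            }
        }
        return null;
    }
}
```

With the POI-written [Content_Types].xml from the question, looking up /word/document.xml finds no Override and falls through to the xml Default, which carries the main document content type.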

In Part 1, the WordprocessingML chapter says, in its "Package Structure" section:

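First, the content types for relationship parts and the main document part (the only required part) must be defined (physically located at /[Content_Types].xml in the package):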

<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
    <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
    <Override PartName="/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>

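My reading of this is that you must define the content type of the main document part (which POI does), and that using an Override to do so is only a suggestion.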

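When most of the parts are .xml and each needs an Override to specify something different, it makes little sense to use an xml Default for a type that matches only one part (or perhaps two or three). I wonder why POI does it this way, since it differs both from the suggestion in the spec and from what Word emits.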

That said, https://github.com/plutext/docx4j/commit/1c1190fc3a2fc6e191c825a0e30fde2654cc997c should fix the problem.
