2012-10-11 128 views

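Processing docx files with Apache POI

I am trying to retrieve a .docx file from a database and process it by inspecting its contents. I think my code retrieves the file I want, but it seems I don't fully understand Apache POI: the stack trace says I am using the wrong part of POI. Any ideas?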

Here is how I load the file:

public void loadFile(String FileName) 
{ 
    InputStream is = null; 
    try 
    { 
     //Connecting to MYSQL Database 
     Class.forName(driver).newInstance(); 
     con = DriverManager.getConnection(url+dbName,userName,password); 

     Statement stmt = (Statement) con.createStatement(); 
     ResultSet rs = stmt.executeQuery("SELECT FILE FROM doccompfiles WHERE FileName = '"+ FileName +"'"); 

     while(rs.next()) 
     { 
      is = rs.getBinaryStream("FILE"); 
     } 

     HWPFDocument doc = new HWPFDocument(is); 
     WordExtractor we = new WordExtractor(doc); 

     String[] paragraphs = we.getParagraphText(); 
     JOptionPane.showMessageDialog(null, "Number of Paragraphs" + paragraphs.length); 
     con.close(); 
    } 
    catch(Exception ex) 
    { 
     ex.printStackTrace(); 
    } 
} 

Stack trace:

org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF) 
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:131) 
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104) 
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138) 
at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:106) 
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:174) 
at documentComparisor.Database.loadFile(Database.java:156) 
at documentComparisor.Home$5.actionPerformed(Home.java:195) 
at javax.swing.AbstractButton.fireActionPerformed(Unknown Source) 
at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source) 
at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source) 
at javax.swing.DefaultButtonModel.setPressed(Unknown Source) 
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source) 
at java.awt.Component.processMouseEvent(Unknown Source) 
at javax.swing.JComponent.processMouseEvent(Unknown Source) 
at java.awt.Component.processEvent(Unknown Source) 
at java.awt.Container.processEvent(Unknown Source) 
at java.awt.Component.dispatchEventImpl(Unknown Source) 
at java.awt.Container.dispatchEventImpl(Unknown Source) 
at java.awt.Component.dispatchEvent(Unknown Source) 
at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source) 
at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source) 
at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source) 
at java.awt.Container.dispatchEventImpl(Unknown Source) 
at java.awt.Window.dispatchEventImpl(Unknown Source) 
at java.awt.Component.dispatchEvent(Unknown Source) 
at java.awt.EventQueue.dispatchEventImpl(Unknown Source) 
at java.awt.EventQueue.access$000(Unknown Source) 
at java.awt.EventQueue$3.run(Unknown Source) 
at java.awt.EventQueue$3.run(Unknown Source) 
at java.security.AccessController.doPrivileged(Native Method) 
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source) 
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source) 
at java.awt.EventQueue$4.run(Unknown Source) 
at java.awt.EventQueue$4.run(Unknown Source) 
at java.security.AccessController.doPrivileged(Native Method) 
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source) 
at java.awt.EventQueue.dispatchEvent(Unknown Source) 
at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source) 
at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source) 
at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source) 
at java.awt.EventDispatchThread.pumpEvents(Unknown Source) 
at java.awt.EventDispatchThread.pumpEvents(Unknown Source) 
at java.awt.EventDispatchThread.run(Unknown Source) 

That is the most helpful exception I have ever seen. –

Answer


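As you probably know by now, MS Office documents come in two different formats: the older binary format built on the OLE2 compound-document container (e.g. ".doc" or ".xls"), and the XML-based format used by Office 2007 and later (e.g. ".docx" or ".xlsx").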
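
The two families can be told apart by their first bytes: OLE2 files start with the signature D0 CF 11 E0 A1 B1 1A E1, while .docx/.xlsx files are ZIP packages that start with "PK\3\4". The sketch below uses only the JDK; the class and method names are hypothetical and not part of POI. A ZIP signature only says the data is a ZIP archive, not that it is a valid Office document. Recent POI versions also ship their own detection class (FileMagic), which is the better choice when POI is already on the classpath.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper (not part of POI): guesses the container format
// from the leading bytes of a file.
final class OfficeFormatSniffer {

    enum Kind { OLE2, ZIP, UNKNOWN }

    // OLE2 compound-document signature used by .doc, .xls and .ppt files
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local-file-header signature "PK\3\4"; .docx/.xlsx/.pptx are ZIP packages
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    private OfficeFormatSniffer() {}

    static Kind detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Kind.ZIP; // any ZIP, not necessarily OOXML
        return Kind.UNKNOWN;
    }

    // Peeks at the first bytes, then rewinds so the same stream can still be
    // handed to HWPFDocument or XWPFDocument afterwards.
    static Kind detect(BufferedInputStream in) throws IOException {
        in.mark(OLE2_MAGIC.length);
        byte[] header = in.readNBytes(OLE2_MAGIC.length);
        in.reset();
        return detect(header);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }
}
```

In the question's loadFile, the stream from rs.getBinaryStream("FILE") could be wrapped in a BufferedInputStream and checked this way before deciding which POI class to construct.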

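Apache POI has separate components for the two formats. The key classes for the old binary formats usually have names starting with "H" (e.g. HWPF, HSSF), while the classes for the XML-based formats start with "X" (e.g. XWPF, XSSF).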
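
The naming convention can be summarised as a lookup table. This is a plain-Java illustration with a hypothetical class and method; the extension is only a hint, since a file's name need not match its actual content, so checking the leading bytes is more reliable.

```java
import java.util.Locale;
import java.util.Map;

// Illustrative only: maps a file extension to the POI component that
// normally handles it. H* = legacy binary (OLE2), X* = XML-based (OOXML).
final class PoiComponents {

    private static final Map<String, String> BY_EXTENSION = Map.of(
        "doc", "HWPF", "docx", "XWPF",   // Word
        "xls", "HSSF", "xlsx", "XSSF",   // Excel
        "ppt", "HSLF", "pptx", "XSLF");  // PowerPoint

    private PoiComponents() {}

    static String componentFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, "unknown");
    }
}
```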

So, to process the new format in your example, use XWPFDocument instead of HWPFDocument:

XWPFDocument doc = new XWPFDocument(is); 
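
The rest of the question's method would change too: WordExtractor (and its getParagraphText()) only works with HWPF documents, so with XWPF the paragraphs come from doc.getParagraphs() (or the text from XWPFWordExtractor). To make the "XML-based" part concrete, the sketch below counts paragraphs in a .docx using only the JDK, by opening the ZIP package and counting <w:p> elements in word/document.xml. It illustrates the format and is not a replacement for XWPF: the class name is hypothetical, it assumes the conventional word/document.xml part name, and it counts every w:p in that part (including paragraphs inside tables), so the number may differ from getParagraphs().size().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: counts <w:p> elements in a .docx without POI,
// to show that the new format is a ZIP package of XML parts.
final class DocxParagraphCounter {

    // WordprocessingML namespace bound to the "w:" prefix in document.xml
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private DocxParagraphCounter() {}

    static int countParagraphs(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Assumes the conventional name of the main document part
                if (!entry.getName().equals("word/document.xml")) continue;

                byte[] xmlBytes = zip.readAllBytes(); // reads only the current entry
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
                Document xml = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xmlBytes));
                // Counts every paragraph element in the part, including those in tables
                return xml.getElementsByTagNameNS(W_NS, "p").getLength();
            }
        }
        throw new IOException("word/document.xml not found; not a .docx package?");
    }
}
```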

Thanks for the detailed comparison of the two. I finally understand the difference between them. – ljpv14


I'm glad it helped. –


Is there a way in Apache POI to convert HWPF to XWPF? –
