2012-07-05 53 views
2

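How to read doc and docx in Java

First of all, you should know that I have already looked through many questions, but none of them helped me. I want to be able to read doc and docx documents (by "read" I mean the simplest thing: just reading the text). I saw some posts about POI and scratchpad, but I could not get it to work, and most of the time Eclipse could not even build my project...

Can someone give me a code sample for doc and docx, and the names (or links) of all the jars I need?

Thanks!

Basically, this is the code: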

try {
    if (getFileExtention(path).equals("docx")) {
        FileInputStream fis = new FileInputStream(path);
        XWPFWordExtractor oleTextExtractor =
                new XWPFWordExtractor(new XWPFDocument(fis));
        return oleTextExtractor.getText();
    } else if (getFileExtention(path).equals("doc")) {
        FileInputStream fis = new FileInputStream(path);
        WordExtractor we = new WordExtractor(fis);
        return we.getText();
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

return "";

I have the following jars:

dom4j-1.6.1.jar

poi-3.8-20120326.jar

poi-ooxml-3.8-20120326.jar

poi-scratchpad-3.8-20120326.jar

xmlbeans-xmlpublic-2.4.0.jar

I have the following problems:

At build time:

> [2012-07-05 14:12:53 - iCards] Dx warning: Ignoring InnerClasses 
> attribute for an anonymous inner class 
> (org.dom4j.xpath.DefaultXPath$1) that doesn't come with an associated 
> EnclosingMethod attribute. This class was probably produced by a 
> compiler that did not target the modern .class file format. The 
> recommended solution is to recompile the class from source, using an 
> up-to-date compiler and without specifying any "-target" type options. 
> The consequence of ignoring this warning is that reflective operations 
> on this class will incorrectly indicate that it is *not* an inner 
> class. 

That warning appears many times, followed by another error (when trying to read a DOCX):

> 07-05 14:17:13.245: W/System.err(4339): java.io.IOException: read failed: EBADF (Bad file number)
> 07-05 14:17:13.255: W/System.err(4339):  at libcore.io.IoBridge.read(IoBridge.java:432)
> 07-05 14:17:13.260: W/System.err(4339):  at java.io.FileInputStream.read(FileInputStream.java:179)
> 07-05 14:17:13.265: W/System.err(4339):  at java.io.PushbackInputStream.read(PushbackInputStream.java:196)
> 07-05 14:17:13.270: W/System.err(4339):  at libcore.io.Streams.readFully(Streams.java:81)
> 07-05 14:17:13.275: W/System.err(4339):  at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:230)
> 07-05 14:17:13.280: W/System.err(4339):  at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:51)
> 07-05 14:17:13.285: W/System.err(4339):  at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:83)
> 07-05 14:17:13.290: W/System.err(4339):  at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:228)
> 07-05 14:17:13.295: W/System.err(4339):  at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
> 07-05 14:17:13.300: W/System.err(4339):  at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:120)
> 07-05 14:17:13.305: W/System.err(4339):  at com.ronEven.iCards.AddRemove.loadFile(AddRemove.java:504)
> 07-05 14:17:13.310: W/System.err(4339):  at com.ronEven.iCards.AddRemove.showDoc(AddRemove.java:495)
> 07-05 14:17:13.315: W/System.err(4339):  at com.ronEven.iCards.AddRemove.setFilePath(AddRemove.java:492)
> 07-05 14:17:13.320: W/System.err(4339):  at com.ronEven.iCards.FileDialog$1.onClick(FileDialog.java:177)
> 07-05 14:17:13.325: W/System.err(4339):  at android.view.View.performClick(View.java:3591)
> 07-05 14:17:13.330: W/System.err(4339):  at android.view.View$PerformClick.run(View.java:14263)
> 07-05 14:17:13.335: W/System.err(4339):  at android.os.Handler.handleCallback(Handler.java:605)
> 07-05 14:17:13.340: W/System.err(4339):  at android.os.Handler.dispatchMessage(Handler.java:92)
> 07-05 14:17:13.345: W/System.err(4339):  at android.os.Looper.loop(Looper.java:137)
> 07-05 14:17:13.345: W/System.err(4339):  at android.app.ActivityThread.main(ActivityThread.java:4507)
> 07-05 14:17:13.345: W/System.err(4339):  at java.lang.reflect.Method.invokeNative(Native Method)
> 07-05 14:17:13.350: W/System.err(4339):  at java.lang.reflect.Method.invoke(Method.java:511)
> 07-05 14:17:13.350: W/System.err(4339):  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:790)
> 07-05 14:17:13.350: W/System.err(4339):  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:557)
> 07-05 14:17:13.350: W/System.err(4339):  at dalvik.system.NativeStart.main(Native Method)
> 07-05 14:17:13.355: W/System.err(4339): Caused by: libcore.io.ErrnoException: read failed: EBADF (Bad file number)
> 07-05 14:17:13.360: W/System.err(4339):  at libcore.io.Posix.readBytes(Native Method)
> 07-05 14:17:13.360: W/System.err(4339):  at libcore.io.Posix.read(Posix.java:118)
> 07-05 14:17:13.360: W/System.err(4339):  at libcore.io.BlockGuardOs.read(BlockGuardOs.java:149)
> 07-05 14:17:13.360: W/System.err(4339):  at libcore.io.IoBridge.read(IoBridge.java:422)
> 07-05 14:17:13.365: W/System.err(4339):  ... 24 more

And the last one, when trying to read a DOC:

07-05 14:17:37.015: W/System.err(4339): java.io.IOException: read failed: EBADF (Bad file number) 
07-05 14:17:37.020: W/System.err(4339):  at libcore.io.IoBridge.read(IoBridge.java:432) 
07-05 14:17:37.025: W/System.err(4339):  at java.io.FileInputStream.read(FileInputStream.java:179) 
07-05 14:17:37.055: W/System.err(4339):  at java.io.PushbackInputStream.read(PushbackInputStream.java:196) 
07-05 14:17:37.055: W/System.err(4339):  at java.io.InputStream.read(InputStream.java:163) 
07-05 14:17:37.060: W/System.err(4339):  at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:95) 
07-05 14:17:37.065: W/System.err(4339):  at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:53) 
07-05 14:17:37.070: W/System.err(4339):  at com.ronEven.iCards.AddRemove.loadFile(AddRemove.java:509) 
07-05 14:17:37.075: W/System.err(4339):  at com.ronEven.iCards.AddRemove.showDoc(AddRemove.java:495) 
07-05 14:17:37.085: W/System.err(4339):  at com.ronEven.iCards.AddRemove.setFilePath(AddRemove.java:492) 
07-05 14:17:37.090: W/System.err(4339):  at com.ronEven.iCards.FileDialog$1.onClick(FileDialog.java:177) 
07-05 14:17:37.095: W/System.err(4339):  at android.view.View.performClick(View.java:3591) 
07-05 14:17:37.100: W/System.err(4339):  at android.view.View$PerformClick.run(View.java:14263) 
07-05 14:17:37.105: W/System.err(4339):  at android.os.Handler.handleCallback(Handler.java:605) 
07-05 14:17:37.110: W/System.err(4339):  at android.os.Handler.dispatchMessage(Handler.java:92) 
07-05 14:17:37.115: W/System.err(4339):  at android.os.Looper.loop(Looper.java:137) 
07-05 14:17:37.120: W/System.err(4339):  at android.app.ActivityThread.main(ActivityThread.java:4507) 
07-05 14:17:37.120: W/System.err(4339):  at java.lang.reflect.Method.invokeNative(Native Method) 
07-05 14:17:37.125: W/System.err(4339):  at java.lang.reflect.Method.invoke(Method.java:511) 
07-05 14:17:37.125: W/System.err(4339):  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:790) 
07-05 14:17:37.130: W/System.err(4339):  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:557) 
07-05 14:17:37.130: W/System.err(4339):  at dalvik.system.NativeStart.main(Native Method) 
07-05 14:17:37.130: W/System.err(4339): Caused by: libcore.io.ErrnoException: read failed: EBADF (Bad file number) 
07-05 14:17:37.150: W/System.err(4339):  at libcore.io.Posix.readBytes(Native Method) 
07-05 14:17:37.160: W/System.err(4339):  at libcore.io.Posix.read(Posix.java:118) 
07-05 14:17:37.160: W/System.err(4339):  at libcore.io.BlockGuardOs.read(BlockGuardOs.java:149) 
07-05 14:17:37.160: W/System.err(4339):  at libcore.io.IoBridge.read(IoBridge.java:422) 
07-05 14:17:37.165: W/System.err(4339):  ... 20 more 
+1

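Can you show us what you have so far? – Keppil 2012-07-05 11:03:47

+0

@Ron: I'm pretty sure the POI documentation contains everything you are asking for. Unless you have a *specific* problem and you *show* what the problem is (and how you tried to solve it), this question cannot be answered without duplicating the documentation/tutorials you have already read. – 2012-07-05 11:06:17

+0

POI is a very mature library. If you cannot build or run it, you should ask a more specific question about POI and/or Eclipse. – 2012-07-05 11:06:41

Answers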

3

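Tika supports the Microsoft Office formats as well as many others. It gives you a common interface to all formats and hides the complexity of maintaining and learning how to use a large number of different libraries. It is as simple as calling this function. You can also use the OfficeParser or the OOXMLParser directly.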

+0

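Tika can parse doc but not docx, so it is no good for me... – 2012-07-05 11:29:44

+0

+1: nice links – 2012-07-05 12:15:28

+1

Not true - Tika parses .docx files just fine! See the [supported formats page](http://tika.apache.org/1.1/formats.html#Microsoft_Office_document_formats) for details of what is covered, but I can assure you that .docx is one of the supported formats. – Gagravarr 2012-07-05 12:52:48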

0

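There are also very powerful applications such as the LibreOffice SDK (or OpenOffice 3), with which you can read and manage documents (such as docx) and save them in .txt format.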

0
  • To read DOCX documents we can use XWPFWordExtractor and XWPFDocument
  • To read DOC documents we can use WordExtractor and HWPFDocument

Your code for DOCX documents is right:

XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));

But HWPFDocument is missing from your code for DOC documents. Just change this line:

WordExtractor we = new WordExtractor(fis);

to this:

WordExtractor we = new WordExtractor(new HWPFDocument(fis));
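
Since the two formats need different readers, it may help to decide between them from the file content rather than only from the extension. The following is a minimal sketch using only the JDK, not part of the original code: the class name WordFormatSniffer and its detect method are hypothetical. It assumes the usual file signatures, where a legacy .doc is an OLE2 compound file and a .docx is a ZIP package. A match only tells you the container type (other Office files share these signatures), so treat the result as a hint.

```java
import java.util.Arrays;

/** Hypothetical helper (not from POI): guesses the Word container format from leading bytes. */
final class WordFormatSniffer {

    enum WordFormat { DOC, DOCX, UNKNOWN }

    // OLE2 compound file signature, used by legacy .doc files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\3\4"; a .docx is a ZIP package.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /**
     * Classifies the first bytes of a file.
     * DOC  would go to WordExtractor / HWPFDocument,
     * DOCX would go to XWPFWordExtractor / XWPFDocument.
     */
    static WordFormat detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return WordFormat.DOC;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return WordFormat.DOCX;
        }
        return WordFormat.UNKNOWN;
    }

    // True when data is at least as long as prefix and begins with it.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

In use, you would read the first eight bytes of the file into an array, call detect, and then open a fresh stream for the chosen extractor so that the header bytes are not lost.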

As for the jar files, only poi-ooxml-schemas-3.8-20120326.jar seems to be missing from your build path.
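
One way to confirm which jars are actually visible at runtime is to try loading one well-known class from each. The sketch below is an illustration only: ClasspathCheck, findMissing, and poiMarkers are hypothetical names, and the mapping from jar to marker class is my assumption about how POI 3.8 and its dependencies are packaged, so verify it against the jars you have.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical diagnostic: reports which jars appear to be missing from the classpath. */
final class ClasspathCheck {

    /** Returns the labels whose marker class cannot be loaded, in insertion order. */
    static List<String> findMissing(Map<String, String> markerClassByJar) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (Map.Entry<String, String> entry : markerClassByJar.entrySet()) {
            try {
                // Load without initializing, so no static initializers run.
                Class.forName(entry.getValue(), false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }

    /** Assumed jar-to-class mapping; adjust it if your jars are laid out differently. */
    static Map<String, String> poiMarkers() {
        Map<String, String> markers = new LinkedHashMap<>();
        markers.put("poi", "org.apache.poi.poifs.filesystem.POIFSFileSystem");
        markers.put("poi-scratchpad", "org.apache.poi.hwpf.HWPFDocument");
        markers.put("poi-ooxml", "org.apache.poi.xwpf.usermodel.XWPFDocument");
        markers.put("poi-ooxml-schemas",
                "org.openxmlformats.schemas.wordprocessingml.x2006.main.CTDocument1");
        markers.put("xmlbeans", "org.apache.xmlbeans.XmlObject");
        markers.put("dom4j", "org.dom4j.Document");
        return markers;
    }

    public static void main(String[] args) {
        System.out.println("Missing: " + findMissing(poiMarkers()));
    }
}
```

Running main from inside the project prints the labels that could not be resolved; an empty list suggests all the jars listed here are on the classpath.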