2010-08-31

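How can I extract chapter numbers along with the text from a .doc file?

I am using Apache POI HWPF to extract text from a .doc file, and I found that the extracted text does not contain the chapter numbers. Can POI extract the chapter numbers together with the text?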

import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

public void readDocFile() {
    String path = "C:\\Documents and Settings\\Administrator\\Desktop\\Topo6.doc";
    // try-with-resources closes the stream even if parsing fails.
    try (FileInputStream fis = new FileInputStream(path)) {
        // HWPFDocument parses the binary .doc file read from the stream.
        HWPFDocument doc = new HWPFDocument(fis);
        WordExtractor docExtractor = new WordExtractor(doc);
        // getText() returns the whole document text as a single String.
        // It runs inside the try block, so a failed load can no longer
        // cause a NullPointerException on docExtractor.
        String text = docExtractor.getText();
        System.out.println(text);
    } catch (Exception exep) {
        System.out.println(exep.getMessage());
    }
}

Answer

OK, I figured it out.

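The chapter numbers in a .doc file produced by Microsoft Word are generated dynamically rather than stored in the text, so I have to get the level of each paragraph and compute the chapter numbers myself.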
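
The counting step can be sketched roughly as follows. This is only a minimal illustration, not the code actually used: the class `ChapterNumbering` and its `number` method are hypothetical names, and the input is assumed to be the 0-based heading level of each numbered paragraph, already read from the document (HWPF's `Paragraph` exposes list-level and outline-level properties; which one applies depends on how the headings are numbered). Body paragraphs are assumed to be filtered out beforehand, and a skipped level shows up as 0 in this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class ChapterNumbering {

    /**
     * Hypothetical helper: turns a sequence of 0-based heading levels
     * into dotted chapter numbers such as "1", "1.1", "1.2", "2".
     */
    public static List<String> number(int[] levels) {
        List<Integer> counters = new ArrayList<>(); // one counter per active level
        List<String> result = new ArrayList<>();
        for (int level : levels) {
            // A heading at this level ends all deeper sub-numbering.
            while (counters.size() > level + 1) {
                counters.remove(counters.size() - 1);
            }
            // Open counters for levels not seen yet (skipped levels stay at 0).
            while (counters.size() < level + 1) {
                counters.add(0);
            }
            // Advance the counter of the current level.
            counters.set(level, counters.get(level) + 1);

            // Join the active counters with dots.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < counters.size(); i++) {
                if (i > 0) {
                    sb.append('.');
                }
                sb.append(counters.get(i));
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Levels of six consecutive headings in a made-up document.
        int[] levels = {0, 1, 1, 0, 1, 2};
        for (String n : number(levels)) {
            System.out.println(n);
        }
    }
}
```

Each computed number can then be prepended to the text of the corresponding paragraph when printing the extracted document.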
