2016-06-28 38 views

How do I extract numbering and text from a .docx file using Java and the Apache POI XWPF library?

I am using the following code:

public static void readDocxFile() {

    try {
        File file = new File("C:\\test.docx");
        FileInputStream fis = new FileInputStream(file.getAbsolutePath());
        XWPFDocument document = new XWPFDocument(fis);
        List<XWPFParagraph> paragraphs = document.getParagraphs();

        for (XWPFParagraph para : paragraphs) {
            System.out.println(para.getText());

            fis.close();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

My code extracts only the text, as shown below:

CLIENT SERVICE SATISFACTION 
Client Feedback System 
Interlibrary Loans 
Shelf Tidiness 
Three Day Loans 
Materials Availability Survey 
Online help service 

I need to extract the section numbers (numbering) together with the text, like this:

1 CLIENT SERVICE SATISFACTION 
1.1 Client Feedback System 
1.1.1 Interlibrary Loans 
1.1.2 Shelf Tidiness 
1.1.3 Three Day Loans 
1.2 Materials Availability Survey 
1.3 Online help service 

It only extracts 'CLIENT SERVICE SATISFACTION, Client Feedback System, Interlibrary Loans'.


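That seems like useful information to include in the question. Also, why do you close 'fis' while iterating over 'paragraphs'? It probably will not cause an error, but it is redundant. You should close 'fis' outside the loop.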


Yes, I corrected it, but I still cannot extract all the values.

Answer


To get the text of a .docx file, you need to use the XWPFParagraph methods (from the POI-OOXML API). To get the paragraph's numbering, try the following code:

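// currentPara_Line is an XWPFParagraph; getPPr() and getNumPr() return null for unnumbered paragraphs, so check before chaining.
BigInteger currentParagraphNumberingID = currentPara_Line.getCTP().getPPr().getNumPr().getNumId().getVal();
// Resolve the numbering instance (numId) to its abstract numbering definition.
BigInteger currentParagraphAbstractNumID2 = currentPara_Line.getDocument().getNumbering().getAbstractNumID(currentParagraphNumberingID);
XWPFAbstractNum currentParagraphAbstractNum = currentPara_Line.getDocument().getNumbering().getAbstractNum(currentParagraphAbstractNumID2);
// The underlying CTAbstractNum holds the per-level formatting (start value, number format, level text).
CTAbstractNum currentParagraphAbstractNumFormatting = currentParagraphAbstractNum.getCTAbstractNum();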
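
The snippet above only reaches the numbering definition. Word does not store the rendered label ("1.1.2") in the file, so the label has to be counted while walking the paragraphs. Below is a minimal sketch of that counting step. OutlineCounter is a hypothetical helper, not part of POI. It assumes plain decimal numbering that starts at 1 and is joined with dots, and that numbering is set directly on each paragraph rather than inherited from a style. It does not read the number format, start value, or level text from CTAbstractNum.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of POI): counts list items and renders labels such as "1.1.2". */
final class OutlineCounter {
    private static final int MAX_LEVELS = 9; // WordprocessingML list levels are 0..8

    // One counter array per numbering instance (numId), indexed by list level (ilvl).
    private final Map<BigInteger, int[]> counters = new HashMap<>();

    /** Advances the counter of the given list at the given level and returns the dotted label. */
    String next(BigInteger numId, int level) {
        if (level < 0 || level >= MAX_LEVELS) {
            throw new IllegalArgumentException("level out of range: " + level);
        }
        int[] c = counters.computeIfAbsent(numId, k -> new int[MAX_LEVELS]);
        c[level]++;
        // A new item at this level restarts every deeper level.
        for (int i = level + 1; i < MAX_LEVELS; i++) {
            c[i] = 0;
        }
        StringBuilder label = new StringBuilder();
        for (int i = 0; i <= level; i++) {
            if (i > 0) {
                label.append('.');
            }
            // Assumption: a parent level that was never used is shown as 1.
            label.append(Math.max(c[i], 1));
        }
        return label.toString();
    }

    public static void main(String[] args) {
        // Levels of the seven paragraphs in the question, all in one list (numId 1 is made up).
        OutlineCounter counter = new OutlineCounter();
        int[] levels = {0, 1, 2, 2, 2, 1, 1};
        for (int level : levels) {
            System.out.println(counter.next(BigInteger.ONE, level));
        }
        // With POI on the classpath, the inputs would come from each XWPFParagraph:
        //   BigInteger numId = para.getNumID();   // null when the paragraph has no direct numbering
        //   BigInteger ilvl  = para.getNumIlvl(); // treat null as level 0
        //   String label = numId == null ? "" : counter.next(numId, ilvl == null ? 0 : ilvl.intValue()) + " ";
        //   System.out.println(label + para.getText());
    }
}
```

Run on its own, main prints 1, 1.1, 1.1.1, 1.1.2, 1.1.3, 1.2 and 1.3, one per line, which matches the numbering requested in the question. Documents that use letters, Roman numerals, custom level text, or style-based heading numbering would need the format information from the abstract numbering as well.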