2017-09-24 129 views
0

Docx4j: converting HTML to DOCX

I have a new problem: when I convert HTML to DOCX, it throws an exception:

org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 73; The entity "nbsp" was referenced, but not declared.

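As I understand it, this happens because docx4j treats my input as XML. XML has only five predefined entities (lt, gt, amp, apos, quot), so an HTML entity such as nbsp is undefined there. How can I make docx4j convert HTML to DOCX without declaring the nbsp entity in the DOCTYPE?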
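The behaviour described above can be reproduced without docx4j. The following is a minimal sketch using only the JDK's built-in XML parser; the class name `EntityCheck` and the method `isWellFormed` are made up for illustration. It suggests that the error comes from XML well-formedness rules rather than from docx4j itself, and that a numeric character reference (`&#160;`) is accepted where `&nbsp;` is not.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of docx4j.
class EntityCheck {

    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. a reference to an undeclared entity
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>a&nbsp;b</p>"));              // false
        System.out.println(isWellFormed("<p>a&#160;b</p>"));              // true
        System.out.println(isWellFormed("<p>&lt;&gt;&amp;&apos;&quot;</p>")); // true
    }
}
```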

Is docx4j working incorrectly, or is this a limitation of the library?

Here is my code:

package ru.simplexsoftware.constructorOfDocuments.web.rest; 
import org.docx4j.convert.in.xhtml.XHTMLImporterImpl; 
import org.docx4j.openpackaging.exceptions.Docx4JException; 
import org.docx4j.openpackaging.exceptions.InvalidFormatException; 
import org.docx4j.openpackaging.packages.WordprocessingMLPackage; 
import org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart; 
import org.springframework.beans.factory.annotation.Autowired; 
import org.springframework.web.HttpRequestHandler; 
import ru.simplexsoftware.constructorOfDocuments.dao.TemplateDao; 
import javax.servlet.ServletException; 
import javax.servlet.http.HttpServletRequest; 
import javax.servlet.http.HttpServletResponse; 
import javax.xml.bind.JAXBException; 
import java.io.ByteArrayInputStream; 
import java.io.ByteArrayOutputStream; 
import java.io.IOException; 
import java.io.InputStream; 
import java.nio.charset.StandardCharsets; 


public class DocxFileDownloadServlet implements HttpRequestHandler { 

@Autowired 
TemplateDao templateDao; 
@Override 
public void handleRequest(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { 

    String parameter = request.getParameter("documentId"); 

    Long documentId = Long.parseLong(parameter); 

    WordprocessingMLPackage wordMLPackage = null; 
    try { 
     wordMLPackage = WordprocessingMLPackage.createPackage(); 
    } catch (InvalidFormatException e) { 
     e.printStackTrace(); 
    } 

    NumberingDefinitionsPart ndp = null; 
    try { 
     ndp = new NumberingDefinitionsPart(); 
    } catch (InvalidFormatException e) { 
     e.printStackTrace(); 
    } 
    try { 
     wordMLPackage.getMainDocumentPart().addTargetPart(ndp); 
    } catch (InvalidFormatException e) { 
     e.printStackTrace(); 
    } 
    try { 
     ndp.unmarshalDefaultNumbering(); 
    } catch (JAXBException e) { 
     e.printStackTrace(); 
    } 

    XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage); 
    xHTMLImporter.setHyperlinkStyle("Hyperlink"); 

    String htmlString=templateDao.get(documentId).html; 
    htmlString = htmlString.replaceAll("<br>","<br/>"); 
    InputStream stream = new ByteArrayInputStream(htmlString.getBytes(StandardCharsets.UTF_8.name())); 
    // Convert the XHTML, and add it into the empty docx we made 
    try { 
     wordMLPackage.getMainDocumentPart().getContent().addAll(
       xHTMLImporter.convert(htmlString, null)); 
    } catch (Docx4JException e) { 
     e.printStackTrace(); 
    } 

    ByteArrayOutputStream outputStream = new ByteArrayOutputStream(); 

    try { 
     wordMLPackage.save(outputStream); 
    } catch (Docx4JException e) { 
     e.printStackTrace(); 
    } 


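    // A .docx file is a binary ZIP archive: write the raw bytes, because
    // going through toString() would corrupt the file.
    response.setContentType("application/vnd.openxmlformats-officedocument.wordprocessingml.document"); 
    response.getOutputStream().write(outputStream.toByteArray()); 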
    response.flushBuffer(); 

} 
} 
+1

Either declare the entity manually, or pre-process the input through a tidy program. docx4j-ImportXHTML expects well-formed XML input. – JasonPlutext
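As a rough illustration of the "declare the entity manually" option, the sketch below prepends a DOCTYPE with an internal subset that defines `nbsp`, then parses the result with the JDK's standard parser. The class `DeclaredEntityDemo` and its method are hypothetical names. It assumes the input has no XML declaration or DOCTYPE of its own, and whether the parser used inside docx4j-ImportXHTML accepts such a DOCTYPE is an assumption that should be checked against the version in use.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of docx4j.
class DeclaredEntityDemo {

    // Internal DTD subset that maps nbsp to U+00A0 (no-break space).
    static final String DOCTYPE = "<!DOCTYPE html [ <!ENTITY nbsp \"&#160;\"> ]>\n";

    /** Parses the XHTML with the declaration prepended and returns its text content. */
    static String textOf(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(DOCTYPE + xhtml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = textOf("<html><body><p>a&nbsp;b</p></body></html>");
        // The entity has been expanded to the single character U+00A0.
        System.out.println((int) text.charAt(1)); // 160
    }
}
```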

+0

@JasonPlutext Is there a way to declare all entities at once? I just don't want to declare every HTML entity manually.
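One way to avoid declaring entities at all is to rewrite named entities as numeric character references before handing the string to the importer, since numeric references need no declaration in XML. The sketch below assumes Java 9 or later; the class `NamedEntityRewriter` and its lookup table are illustrative only, and the table covers just a handful of entities. A tidy tool, as suggested in the comment above, would cover the full HTML set.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of docx4j.
class NamedEntityRewriter {

    // Illustrative subset of HTML named entities and their Unicode code points.
    private static final Map<String, Integer> CODE_POINTS = Map.of(
            "nbsp", 160, "copy", 169, "laquo", 171, "reg", 174, "raquo", 187,
            "ndash", 8211, "mdash", 8212, "hellip", 8230);

    private static final Pattern NAMED_ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    /** Replaces known named entities with numeric references; leaves all others untouched. */
    static String toNumericReferences(String html) {
        Matcher matcher = NAMED_ENTITY.matcher(html);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            Integer codePoint = CODE_POINTS.get(matcher.group(1));
            // Unknown names (including XML's own amp, lt, gt, apos, quot) are kept as-is.
            String replacement = (codePoint == null) ? matcher.group() : "&#" + codePoint + ";";
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumericReferences("a&nbsp;b &amp; c&mdash;d"));
        // prints: a&#160;b &amp; c&#8212;d
    }
}
```

In the question's code this would be called on `htmlString` just before `xHTMLImporter.convert(...)`, next to the existing `<br>` replacement.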

Answer

0

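You can try using AltChunkType to insert the HTML string into the DOCX as an altChunk. Note that the chunk is only converted when Word opens the document: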

wordMLPackage.getMainDocumentPart().addAltChunk(AltChunkType.Xhtml, htmlString.getBytes(StandardCharsets.UTF_8));