Generating a PDF from HTML with non-Latin characters using ITextRenderer does not work

This is the second day I have spent investigating this with no results. At least I can now ask something very specific.

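I want to write valid HTML code that contains some non-Latin characters into a PDF file using iText, more specifically using ITextRenderer from Flying Saucer.

My short example starts by initializing a String variable doc with this value: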

String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">" 
      + "<body>Some greek characters: Καλημέρα Some greek characters" 
      + "</body></html>";

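Here is some code I use for debugging. I save this string to an HTML file and then open it in a browser, just to double-check that the HTML content is valid and that I can still read the Greek characters: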

//write for debugging purposes in an html file 
File newTextFile = new File("C:/work/test.html"); 
FileWriter fw = new FileWriter(newTextFile); 
fw.write(doc); 
fw.close();

The next step is to write this value to a PDF file. This is my code:

ITextRenderer renderer = new ITextRenderer(); 
    //add some fonts - if paths are not right, an exception will be thrown 
    renderer.getFontResolver().addFont("c:/work/fonts/TIMES.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED); 
    renderer.getFontResolver().addFont("c:/work/fonts/TIMESBD.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED); 
    renderer.getFontResolver().addFont("c:/work/fonts/TIMESBI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED); 
    renderer.getFontResolver().addFont("c:/work/fonts/TIMESI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED); 


    final DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory 
      .newInstance(); 
    documentBuilderFactory.setValidating(false); 
    DocumentBuilder builder = documentBuilderFactory.newDocumentBuilder(); 
    builder.setEntityResolver(FSEntityResolver.instance()); 
    org.w3c.dom.Document document = builder.parse(new ByteArrayInputStream(
      doc.toString().getBytes("UTF-8"))); 

    renderer.setDocument(document, null); 
    renderer.layout(); 
    renderer.createPDF(os);

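The end result of my code is:

In the HTML file I get: Some greek characters: Καλημέρα Some greek characters (expected)

In the PDF file I get: Some greek characters: Some greek characters (unexpected: the Greek characters are dropped!)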

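Dependencies:

java version "1.6.0_27"
iText-2.0.8.jar
de.huxhorn.lilith.3rdparty.flyingsaucer.core-renderer-8Pre2.jar

I have also experimented with several other fonts, but I suspect my problem has nothing to do with using the wrong font. Any help is more than welcome.

Thanks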


2012-04-20 alexandros

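Let iText read, from the header of the HTML content, that the content is UTF-8.
Add a content-type meta tag with the utf-8 charset to the HTML code, then run iText to generate the PDF and check the result.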

<?xml version="1.0" encoding="UTF-8"?> 
<html xmlns="http://www.w3.org/1999/xhtml" lang="en"> 
<head> 
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 
</head> 
<body> 
    Some greek characters: Καλημέρα Some greek characters 
</body> 
</html>

Update:
If the above does not work, refer to the section ENCODING VERSUS THE DEFAULT CHARSET USED BY THE JVM in the document published at http://www.manning.com/lowagie2/iText2E_MEAP_CH02.pdf
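The difference between an explicitly named encoding and whatever charset the JVM defaults to can be illustrated with a small standard-library sketch. The class name CharsetDemo and its helper methods are hypothetical and do not come from iText or Flying Saucer; the Greek word is written with Unicode escapes so that the source file's own encoding cannot affect the result.

```java
import java.nio.charset.Charset;

public class CharsetDemo {
    /** Encodes text with an explicitly named charset, independent of file.encoding. */
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    /** Returns true if text is unchanged after encoding and decoding with charset. */
    static boolean survivesRoundTrip(String text, Charset charset) {
        return new String(encode(text, charset), charset).equals(text);
    }

    public static void main(String[] args) {
        // "Καλημέρα" written as Unicode escapes
        String greek = "\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1";

        Charset utf8 = Charset.forName("UTF-8");
        Charset latin1 = Charset.forName("ISO-8859-1");

        // The default depends on the platform and on -Dfile.encoding
        System.out.println("default charset: " + Charset.defaultCharset());
        // UTF-8 can represent Greek letters, two bytes each here
        System.out.println("UTF-8 bytes: " + encode(greek, utf8).length);
        System.out.println("UTF-8 round trip: " + survivesRoundTrip(greek, utf8));
        // ISO-8859-1 has no Greek letters, so they are replaced during encoding
        System.out.println("ISO-8859-1 round trip: " + survivesRoundTrip(greek, latin1));
    }
}
```

If the round trip through the explicitly named UTF-8 charset succeeds, as it should here, the bytes handed to the XML parser are probably not where the characters get lost, and the font setup is the more likely suspect.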


2012-04-20 19:29:50

Just tried it, no good news :( I get the same result. @Ravinder I think you missed a / in your example :P – alexandros 2012-04-20 20:09:11

@alexandros: Please check the update to my answer. – 2012-04-20 20:30:51

I added this to my test: System.out.println("file.encoding=" + System.getProperty("file.encoding")); It prints: file.encoding=UTF-8. Should that be enough to make sure I have the right encoding? – alexandros 2012-04-20 20:51:11

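I am from the Czech Republic and had the same problem with our national characters! After some searching, I managed to solve it with this solution.

Specifically, you need this (which you already have):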

renderer 
    .getFontResolver() 
    .addFont(fonts.get(i).getFile().getPath(), 
      BaseFont.IDENTITY_H, 
      BaseFont.NOT_EMBEDDED);

and then the important part, in the CSS:

* { 
    font-family: Verdana; 
/* font-family: Times New Roman; - alternative. Without ""! */ 
}

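It seems to me that without that CSS, your registered fonts are not used. When I removed these lines from the CSS, the characters were broken again.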
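As a rough sketch of how the CSS rule can be combined with the question's parsing code, the helper below wraps body text in XHTML that applies one font family to every element and parses it from explicitly UTF-8 encoded bytes. The class XhtmlWithFont and its methods are hypothetical names, only the standard library is used, and the sketch assumes that bodyText is already valid XHTML content and that fontFamily matches the family name of a font registered through addFont.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XhtmlWithFont {
    /** Wraps bodyText in XHTML whose CSS applies fontFamily to every element. */
    static String wrap(String bodyText, String fontFamily) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
             + "<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">"
             + "<head><style type=\"text/css\">"
             + "* { font-family: " + fontFamily + "; }"
             + "</style></head>"
             + "<body>" + bodyText + "</body></html>";
    }

    /** Parses the XHTML from UTF-8 bytes, as the question's code does. */
    static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xhtml.getBytes("UTF-8")));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        // "Καλημέρα" written as Unicode escapes
        String greek = "\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1";
        Document document = parse(wrap("Some greek characters: " + greek, "Verdana"));
        // Confirm that the non-Latin text survived parsing
        System.out.println(document.getElementsByTagName("body").item(0).getTextContent());
        // The parsed document would then be passed to renderer.setDocument(document, null),
        // after the matching font file has been registered with addFont.
    }
}
```

The rendering step is left out on purpose, since it needs the Flying Saucer and iText jars; the sketch only shows where the font-family rule goes and that the Greek text reaches the DOM intact.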

Hope this helps!


2012-07-09 16:20:17 ArcanisCz

Thank you for the correct solution! Specifying the font (in my case it was *DejaVu Serif*) worked! – informatik01 2014-04-28 10:33:31

Add something like this to your HTML:

<?xml version='1.0' encoding='UTF-8'?> 
<!DOCTYPE html> 
<html> 
    <head> 
     <meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/> 
     <style type='text/css'> 
      * { font-family: 'Arial Unicode MS'; } 
     </style> 
    </head> 
    <body> 
     <span>Some text with šđčćž characters</span> 
    </body> 
</html>

and then, in the Java code, register the font with the ITextRenderer's FontResolver:

ITextRenderer renderer = new ITextRenderer(); 
renderer.getFontResolver().addFont("fonts/ARIALUNI.TTF", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);

This works great for Croatian characters.

The jars used for generating the PDF are:

core-renderer.jar 
iText-2.0.8.jar


2013-12-02 10:01:55 C2V3N
