2016-07-14 74 views
0

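Get the colour of a font from PDFBox

I want to get the font colour from PDFBox, but I keep getting an exception. Can anyone help? This is how I am trying to get the colour (page is a PDPage I obtained earlier):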

PDResources resources = page.getResources();
Iterable<COSName> fontNames = resources.getFontNames();
for (COSName fontName : fontNames)
    System.out.println("name: " + resources.getFont(fontName).getName() +
        " colour: " + resources.getColorSpace(fontName).getName());

This throws the following exception:

org.apache.pdfbox.pdmodel.MissingResourceException: Missing color space: F1 

Can anyone tell me how to correctly get the colour of the fonts obtained this way?

+0

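Fonts don't have a colour. Text can be drawn with a stroking colour, a non-stroking colour, both, or even more. To see what I mean, open the PDF file attached to https://issues.apache.org/jira/browse/PDFBOX-678 in Adobe Reader (not Firefox). You can even clip text out of an image, or use a shading so that a single glyph has several colours. Do you know in advance that your PDF files won't use any of these "interesting" modes? –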

+0

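@TilmanHausherr I see what you mean. Yes, I am confident that my PDF files won't contain such edge cases. In that case, is it possible to get any colour information out of them? – kabeersvohra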

+0

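I'm not sure what a stroking colour is. Is that what I need? Even in such an odd case, an algorithm that outputs just one of the colours used for the text would be enough for my use case – kabeersvohra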

Answer

1

Try PrintTextColors:

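import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColor;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColorN;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColorSpace;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingDeviceCMYKColor;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingDeviceGrayColor;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingDeviceRGBColor;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingColor;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingColorN;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingColorSpace;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingDeviceCMYKColor;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingDeviceGrayColor;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingDeviceRGBColor;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.graphics.color.PDColor;
import org.apache.pdfbox.pdmodel.graphics.state.RenderingMode;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

/**
 * This is an example on how to get the colors of text. Note that this will not tell the background,
 * and will only work properly if the text is not overwritten later, and only if the text rendering
 * modes are 0, 1 or 2. In the PDF 32000 specification, please read 9.3.6 "Text Rendering Mode" to
 * know more. Mode 0 (FILL) is the default. Mode 1 (STROKE) will make glyphs look "hollow". Mode 2
 * (FILL_STROKE) will make glyphs look "fat".
 *
 * @author Ben Litchfield
 * @author Tilman Hausherr
 */
public class PrintTextColors extends PDFTextStripper
{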
    /**
     * Instantiate a new PDFTextStripper object.
     *
     * @throws IOException If there is an error loading the properties.
     */
    public PrintTextColors() throws IOException
    {
        // Register the color operators so the graphics state tracks color changes.
        addOperator(new SetStrokingColorSpace());
        addOperator(new SetNonStrokingColorSpace());
        addOperator(new SetStrokingDeviceCMYKColor());
        addOperator(new SetNonStrokingDeviceCMYKColor());
        addOperator(new SetNonStrokingDeviceRGBColor());
        addOperator(new SetStrokingDeviceRGBColor());
        addOperator(new SetNonStrokingDeviceGrayColor());
        addOperator(new SetStrokingDeviceGrayColor());
        addOperator(new SetStrokingColor());
        addOperator(new SetStrokingColorN());
        addOperator(new SetNonStrokingColor());
        addOperator(new SetNonStrokingColorN());
    }

    /**
     * This will print the document's data.
     *
     * @param args The command line arguments.
     *
     * @throws IOException If there is an error parsing the document.
     */
    public static void main(String[] args) throws IOException
    {
        if (args.length != 1)
        {
            usage();
        }
        else
        {
            PDDocument document = null;
            try
            {
                document = PDDocument.load(new File(args[0]));

                PDFTextStripper stripper = new PrintTextColors();
                stripper.setSortByPosition(true);
                stripper.setStartPage(0);
                stripper.setEndPage(document.getNumberOfPages());

                // The extracted text is discarded; the colors are printed in processTextPosition.
                Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
                stripper.writeText(document, dummy);
            }
            finally
            {
                if (document != null)
                {
                    document.close();
                }
            }
        }
    }

    @Override
    protected void processTextPosition(TextPosition text)
    {
        super.processTextPosition(text);

        // Read the colors and rendering mode in effect when this glyph is shown.
        PDColor strokingColor = getGraphicsState().getStrokingColor();
        PDColor nonStrokingColor = getGraphicsState().getNonStrokingColor();
        String unicode = text.getUnicode();
        RenderingMode renderingMode = getGraphicsState().getTextState().getRenderingMode();
        System.out.println("Unicode:            " + unicode);
        System.out.println("Rendering mode:     " + renderingMode);
        System.out.println("Stroking color:     " + strokingColor);
        System.out.println("Non-Stroking color: " + nonStrokingColor);
        System.out.println();

        // See the PrintTextLocations for more attributes
    }

    /**
     * This will print the usage for this document.
     */
    private static void usage()
    {
        System.err.println("Usage: java " + PrintTextColors.class.getName() + " <input-pdf>");
    }
}
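
PrintTextColors prints one entry per glyph, so if you only want "the colour of this piece of text" you will probably want to pick one colour per glyph and merge neighbours. Below is a minimal sketch of that post-processing step in plain Java, with no PDFBox dependency. Everything in it (TextColorRuns, Mode, Glyph, ColorRun, visibleColor, group) is a hypothetical name made up for illustration, and colours are plain strings rather than PDColor. It assumes only rendering modes 0, 1 and 2 occur, and it treats the fill colour as "the" colour for FILL_STROKE, which is a simplification rather than anything PDFBox prescribes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox.
class TextColorRuns {
    // Mirrors the three rendering modes named in the PrintTextColors Javadoc.
    enum Mode { FILL, STROKE, FILL_STROKE }

    // One glyph as PrintTextColors would report it (colours simplified to strings).
    static final class Glyph {
        final String unicode;
        final Mode mode;
        final String strokingColor;
        final String nonStrokingColor;

        Glyph(String unicode, Mode mode, String strokingColor, String nonStrokingColor) {
            this.unicode = unicode;
            this.mode = mode;
            this.strokingColor = strokingColor;
            this.nonStrokingColor = nonStrokingColor;
        }
    }

    // A run of consecutive glyphs that share one visible colour.
    static final class ColorRun {
        final String text;
        final String color;

        ColorRun(String text, String color) {
            this.text = text;
            this.color = color;
        }
    }

    // STROKE draws only the outline, so the stroking colour is what you see.
    // For FILL and FILL_STROKE this sketch reports the non-stroking (fill) colour.
    static String visibleColor(Glyph g) {
        return g.mode == Mode.STROKE ? g.strokingColor : g.nonStrokingColor;
    }

    // Merge consecutive glyphs with the same visible colour into runs.
    static List<ColorRun> group(List<Glyph> glyphs) {
        List<ColorRun> runs = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        String current = null;
        for (Glyph g : glyphs) {
            String color = visibleColor(g);
            if (current != null && !current.equals(color)) {
                // Colour changed: close the previous run and start a new one.
                runs.add(new ColorRun(text.toString(), current));
                text.setLength(0);
            }
            current = color;
            text.append(g.unicode);
        }
        if (current != null) {
            runs.add(new ColorRun(text.toString(), current));
        }
        return runs;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = Arrays.asList(
            new Glyph("H", Mode.FILL, "black", "red"),
            new Glyph("i", Mode.FILL, "black", "red"),
            new Glyph("!", Mode.STROKE, "blue", "red"));
        for (ColorRun run : group(glyphs)) {
            System.out.println(run.text + " -> " + run.color);
        }
    }
}
```

To use this with the class above, you would collect a Glyph in processTextPosition instead of printing, then call group once the page has been processed. How you turn a PDColor into a comparable value is left open here.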
+0

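OK, I am implementing it. When I set the start page to 0 and the end page to the number of pages, I got an IOException (the document has one page). When I then set both the start page and the end page to 1, both getText and writeText returned an empty string – kabeersvohra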

+0

Maybe that PDF has no text to extract. That would prevent text extraction. Try the code on another PDF. –

+0

Yeah, I have just tested it on several different PDF documents, all of which contain text. Would you like me to send them to you? – kabeersvohra