2012-09-26

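JPedal - Highlight a word at a given location in a PDF

I want to implement a feature that lets the user double-click to highlight a word in a PDF document, using the JPedal library. This would be trivial if I could get a word's bounding rectangle and check whether the MouseEvent location falls inside it. The following snippet shows how to highlight a region: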

private void highlightText() { 
    Rectangle highlightRectangle = new Rectangle(firstPoint.x, firstPoint.y, 
      secondPoint.x - firstPoint.x, secondPoint.y - firstPoint.y); 
    pdfDecoder.getTextLines().addHighlights(new Rectangle[]{highlightRectangle}, false, currentPage); 
    pdfDecoder.repaint(); 
} 

However, the only examples I can find in the documentation are for plain-text extraction.

Answer

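After looking at Mark's example, I managed to get it working. There are a few quirks, so I'll explain how it works in case it helps someone else. The key method is extractTextAsWordlist, which, given a region to extract from, returns a List<String> of the form {word1, w1_x1, w1_y1, w1_x2, w1_y2, word2, w2_x1, ...}. The steps are listed below.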

First, you need to transform the MouseEvent's component/screen coordinates into PDF page coordinates, correcting for scaling:

/** 
* Transforms Component coordinates to page coordinates, correcting for 
* scaling and panning. 
* 
* @param x Component x-coordinate 
* @param y Component y-coordinate 
* @return Point on the PDF page 
*/ 
private Point getPageCoordinates(int x, int y) { 
    float scaling = pdfDecoder.getScaling(); 
    int x_offset = ((pdfDecoder.getWidth() - pdfDecoder.getPDFWidth())/2); 
    int y_offset = pdfDecoder.getPDFHeight(); 
    int correctedX = (int)((x - x_offset + viewportOffset.x)/scaling); 
    int correctedY = (int)((y_offset - (y + viewportOffset.y))/scaling); 
    return new Point(correctedX, correctedY); 
} 
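
To make the arithmetic concrete, here is a minimal standalone sketch of the same transform with the decoder values passed in as plain parameters. The class name PageCoordinates and its parameter names are hypothetical; they stand in for the values read from pdfDecoder and viewportOffset above. It assumes, as the method above does, that the page is horizontally centered in the component and that the PDF height is given in scaled (screen) pixels.

```java
import java.awt.Point;

/** Hypothetical helper: mirrors getPageCoordinates without needing a PdfDecoder. */
final class PageCoordinates {
    private PageCoordinates() {}

    static Point toPage(int x, int y, float scaling,
                        int componentWidth, int pdfWidth, int pdfHeight,
                        Point viewportOffset) {
        // Horizontal margin when the page is centered in the component.
        int xOffset = (componentWidth - pdfWidth) / 2;
        int pageX = (int) ((x - xOffset + viewportOffset.x) / scaling);
        // Screen y grows downwards, PDF y grows upwards, so flip around the page height.
        int pageY = (int) ((pdfHeight - (y + viewportOffset.y)) / scaling);
        return new Point(pageX, pageY);
    }

    public static void main(String[] args) {
        // Made-up numbers: 2x zoom, page 1200x1600 px inside a 1300 px wide component,
        // viewport scrolled down by 100 px.
        Point p = toPage(350, 200, 2.0f, 1300, 1200, 1600, new Point(0, 100));
        System.out.println(p.x + ", " + p.y); // prints "150, 650"
    }
}
```

The y-flip is the part that is easy to get wrong: a click near the top of the screen maps to a large page y value.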

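Next, create a box to scan for text. I chose to make it the full width of the page and +/- 20 page units vertically (a fairly arbitrary number), centered on the MouseEvent: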

/** 
* Scans for all the words located within a box the width of the page and 
* 40 points high, centered at the supplied point. 
* 
* @param p Point to centre the scan box around 
* @return A List of words within the scan box 
* @throws PdfException if the text extraction fails 
*/ 
private List<String> scanForWords(Point p) throws PdfException { 
    List<String> result = Collections.emptyList();
    if (pdfDecoder.getlastPageDecoded() > 0) {
        PdfGroupingAlgorithms currentGrouping = pdfDecoder.getGroupingObject();
        PdfPageData currentPageData = pdfDecoder.getPdfPageData();
        int x1 = currentPageData.getMediaBoxX(currentPage);
        int x2 = currentPageData.getMediaBoxWidth(currentPage) + x1;
        int y1 = p.y + 20;
        int y2 = p.y - 20;
        result = currentGrouping.extractTextAsWordlist(x1, y1, x2, y2, currentPage, true, "");
    }
    return result;
} 
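
The box arithmetic can be sketched on its own, without JPedal. The class ScanBox below is hypothetical, and clamping the band to the media box is my addition rather than something the method above does. It returns the corners in the same (x1, y1, x2, y2) order used above, with y1 as the upper edge.

```java
/** Hypothetical helper: computes a page-wide scan band around a clicked y position. */
final class ScanBox {
    private ScanBox() {}

    /** Returns {x1, y1, x2, y2} in page units, with y1 the upper edge and y2 the lower edge. */
    static int[] around(int clickY, int halfHeight,
                        int mediaBoxX, int mediaBoxY,
                        int mediaBoxWidth, int mediaBoxHeight) {
        int x1 = mediaBoxX;
        int x2 = mediaBoxX + mediaBoxWidth;
        // Clamp so the band never extends past the top or bottom of the page (an assumption).
        int top = Math.min(clickY + halfHeight, mediaBoxY + mediaBoxHeight);
        int bottom = Math.max(clickY - halfHeight, mediaBoxY);
        return new int[] {x1, top, x2, bottom};
    }

    public static void main(String[] args) {
        // US Letter page (612 x 792 points), click at y = 650.
        int[] box = around(650, 20, 0, 0, 612, 792);
        System.out.println(java.util.Arrays.toString(box)); // prints "[0, 670, 612, 630]"
    }
}
```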

Then I parse the result into a sequence of Rectangles as follows:

/** 
* Parse a String sequence of: 
* {word1, w1_x1, w1_y1, w1_x2, w1_y2, word2, w2_x1, ...} 
* 
* Into a sequence of Rectangles. 
* 
* @param wordList Word list sequence to parse 
* @return A List of Rectangles 
*/ 
private List<Rectangle> parseWordBounds(List<String> wordList) { 
    List<Rectangle> wordBounds = new LinkedList<Rectangle>();
    Iterator<String> wordListIterator = wordList.iterator();
    while (wordListIterator.hasNext()) {
        // sequences are: {word, x1, y1, x2, y2}
        wordListIterator.next(); // skip the word
        int x1 = (int) Float.parseFloat(wordListIterator.next());
        int y1 = (int) Float.parseFloat(wordListIterator.next());
        int x2 = (int) Float.parseFloat(wordListIterator.next());
        int y2 = (int) Float.parseFloat(wordListIterator.next());
        wordBounds.add(new Rectangle(x1, y2, x2 - x1, y1 - y2)); // in page, not screen coordinates
    }
    return wordBounds;
} 
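
Note that parseWordBounds assumes the list length is an exact multiple of five and that y1 is the larger y value. A slightly more defensive variant, sketched here without any JPedal dependency, checks the length and normalizes the corners so the Rectangle never ends up with a negative size. The class name WordBoundsParser is hypothetical and the sample coordinates are made up for illustration.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: defensive version of parseWordBounds. */
final class WordBoundsParser {
    private WordBoundsParser() {}

    static List<Rectangle> parse(List<String> wordList) {
        if (wordList.size() % 5 != 0) {
            throw new IllegalArgumentException(
                    "Expected groups of {word, x1, y1, x2, y2}, got " + wordList.size() + " items");
        }
        List<Rectangle> bounds = new ArrayList<>(wordList.size() / 5);
        for (int i = 0; i < wordList.size(); i += 5) {
            // Index i holds the word itself; the four coordinates follow it.
            int x1 = (int) Float.parseFloat(wordList.get(i + 1));
            int y1 = (int) Float.parseFloat(wordList.get(i + 2));
            int x2 = (int) Float.parseFloat(wordList.get(i + 3));
            int y2 = (int) Float.parseFloat(wordList.get(i + 4));
            // Normalize so width and height are never negative, whichever corner comes first.
            bounds.add(new Rectangle(Math.min(x1, x2), Math.min(y1, y2),
                    Math.abs(x2 - x1), Math.abs(y1 - y2)));
        }
        return bounds;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Hello", "72.0", "700.5", "101.3", "688.0",
                "world", "105", "688", "130", "700"); // second word has its y values swapped
        for (Rectangle r : parse(sample)) {
            System.out.println(r.x + " " + r.y + " " + r.width + " " + r.height);
        }
        // prints "72 688 29 12" and then "105 688 25 12"
    }
}
```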

Then determine which Rectangle the MouseEvent fell within:

/** 
* Finds the bounding Rectangle of a word located at a Point. 
* 
* @param p Point to find word bounds 
* @param wordBounds List of word boundaries to search 
* @return A Rectangle that bounds a word and contains a point, or null if 
*   there is no word located at the point 
*/ 
private Rectangle findWordBoundsAtPoint(Point p, List<Rectangle> wordBounds) { 
    Rectangle result = null;
    for (Rectangle wordBound : wordBounds) {
        if (wordBound.contains(p)) {
            result = wordBound;
            break;
        }
    }
    return result;
} 
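
One detail of this hit test is worth knowing: java.awt.Rectangle.contains treats the rectangle as half-open, so a point lying exactly on the x + width or y + height edge is not inside, and a rectangle with a negative width or height contains nothing at all. The small sketch below shows both cases. The class name HitTest and the coordinates are made up for illustration.

```java
import java.awt.Point;
import java.awt.Rectangle;
import java.util.Arrays;
import java.util.List;

/** Hypothetical demo of how Rectangle.contains behaves at the edges. */
final class HitTest {
    private HitTest() {}

    /** Returns the first rectangle containing p, or null if none does. */
    static Rectangle firstContaining(Point p, List<Rectangle> bounds) {
        for (Rectangle r : bounds) {
            if (r.contains(p)) {
                return r;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Rectangle> words = Arrays.asList(
                new Rectangle(72, 688, 29, 12),   // covers x in [72, 101), y in [688, 700)
                new Rectangle(105, 688, 25, 12)); // covers x in [105, 130), y in [688, 700)

        // Inside the second word.
        System.out.println(firstContaining(new Point(110, 690), words) == words.get(1)); // prints "true"
        // x == 72 + 29 lies on the excluded edge of the first word, and in the gap before the second.
        System.out.println(firstContaining(new Point(101, 690), words)); // prints "null"
        // A rectangle built with a negative height never contains anything.
        System.out.println(new Rectangle(72, 700, 29, -12).contains(80, 695)); // prints "false"
    }
}
```

This is why the sign of y1 - y2 in parseWordBounds matters: if the two y values ever arrive in the other order, every hit test against that word silently fails.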

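For some reason, simply passing this Rectangle to the highlight method did not work. After some tinkering, I found that shrinking the Rectangle by one point on each side fixed the problem: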

/** 
* Contracts a Rectangle to enable it to be highlighted. 
* 
* @return A contracted Highlight Rectangle 
*/ 
private Rectangle contractHighlight(Rectangle highlight) {
    int x = highlight.x + 1;
    int y = highlight.y + 1;
    int width = highlight.width - 2;
    int height = highlight.height - 2;
    return new Rectangle(x, y, width, height); 
} 
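
Be aware that for a very small word box (2 units or less in either direction) this contraction produces an empty or negative-sized Rectangle. A guarded variant might look like the sketch below. The class name HighlightInsets is hypothetical, and returning the box unchanged when it is too small is just one possible choice; I have not checked how JPedal treats such a box.

```java
import java.awt.Rectangle;

/** Hypothetical helper: shrinks a rectangle by a fixed inset without letting it collapse. */
final class HighlightInsets {
    private HighlightInsets() {}

    static Rectangle contract(Rectangle r, int inset) {
        int width = r.width - 2 * inset;
        int height = r.height - 2 * inset;
        if (width <= 0 || height <= 0) {
            // Too small to shrink: return an unchanged copy (an assumption, see above).
            return new Rectangle(r);
        }
        return new Rectangle(r.x + inset, r.y + inset, width, height);
    }

    public static void main(String[] args) {
        Rectangle shrunk = contract(new Rectangle(72, 688, 29, 12), 1);
        System.out.println(shrunk.x + " " + shrunk.y + " " + shrunk.width + " " + shrunk.height);
        // prints "73 689 27 10"
    }
}
```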

Then I simply pass it to this method to add the highlight:

/** 
* Highlights text on the document 
*/ 
private void highlightText(Rectangle highlightRectangle) { 
    pdfDecoder.getTextLines().addHighlights(new Rectangle[]{highlightRectangle}, false, currentPage); 
    pdfDecoder.repaint(); 
} 

Finally, all of the calls above are wrapped up in this convenience method:

/** 
* Highlights the word at the given point. 
* 
* @param p Point where word is located 
*/ 
private void highlightWordAtPoint(Point p) { 
    try {
        Rectangle wordBounds = findWordBoundsAtPoint(p, parseWordBounds(scanForWords(p)));
        if (wordBounds != null) {
            highlightText(contractHighlight(wordBounds));
        }
    } catch (PdfException e) {
        // Text extraction failed: report it and leave the page unhighlighted.
        e.printStackTrace();
    }
}
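
The code above does not show where highlightWordAtPoint gets called from. One possible wiring for the double-click in the question, offered as an assumption rather than the code I actually used, is a MouseAdapter that reacts when getClickCount() is 2 and hands the click position to a callback. The class name DoubleClickHandler and the Consumer<Point> callback are hypothetical.

```java
import java.awt.Component;
import java.awt.Point;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import java.util.function.Consumer;

/** Hypothetical listener: forwards left-button double clicks to a callback. */
final class DoubleClickHandler extends MouseAdapter {
    private final Consumer<Point> onDoubleClick;

    DoubleClickHandler(Consumer<Point> onDoubleClick) {
        this.onDoubleClick = onDoubleClick;
    }

    @Override
    public void mouseClicked(MouseEvent e) {
        if (e.getClickCount() == 2 && e.getButton() == MouseEvent.BUTTON1) {
            // Component coordinates; convert to page coordinates before looking up the word.
            onDoubleClick.accept(e.getPoint());
        }
    }

    public static void main(String[] args) {
        DoubleClickHandler handler =
                new DoubleClickHandler(p -> System.out.println("double click at " + p.x + ", " + p.y));
        Component source = new Component() {}; // lightweight stand-in for the viewer component
        // Synthetic events: a single click is ignored, a double click is forwarded.
        handler.mouseClicked(new MouseEvent(source, MouseEvent.MOUSE_CLICKED, 0L, 0,
                10, 20, 1, false, MouseEvent.BUTTON1));
        handler.mouseClicked(new MouseEvent(source, MouseEvent.MOUSE_CLICKED, 0L, 0,
                10, 20, 2, false, MouseEvent.BUTTON1));
    }
}
```

In the viewer, the callback would presumably convert the point with getPageCoordinates(p.x, p.y) and pass the result to highlightWordAtPoint, with the handler registered on the decoder component via addMouseListener.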