Creating an index from a PDF

Possible duplicate:
How do I Index PDF files and search for keywords?

Create an index of a PDF.

2011-08-02 Flow Rocks

What do you have working so far? If you are using Python, have a look at the 'collections' module. – TyrantWave

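Oh, look. Lots of people have asked the same question: http://stackoverflow.com/search?q=python+index+pdf. You can also use the "Search" box at the top of the page to see what others have asked, which may help you. –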

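"Which is not what I'm looking for." That gives us nothing to go on. Please define, carefully and completely, how your requirements are actually different. We don't know what is unique or different about what you are doing. It looks exactly the same to us. –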
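To illustrate the 'collections' suggestion in the comments above, here is a minimal sketch of a word-to-page index built with collections.defaultdict. It assumes the text of each page has already been extracted into a list of strings; the function name build_index and the sample pages are hypothetical and not part of the original question.

```python
import re
from collections import defaultdict


def build_index(page_texts):
    """Map each lowercased word to the sorted list of 1-based pages containing it.

    page_texts is assumed to be a list of strings, one string per page.
    """
    index = defaultdict(set)
    for page_number, text in enumerate(page_texts, start=1):
        # Split on non-word characters; a real index may need smarter tokenizing.
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_number)
    return {word: sorted(pages) for word, pages in index.items()}


# Hypothetical sample data standing in for text extracted from a PDF.
pages = ["Python makes indexing easy", "An index maps words to pages", "Python again"]
index = build_index(pages)
print(index["python"])
```

Using a set per word avoids listing a page twice when a word occurs several times on it; the sets are sorted once at the end so the page lists come out in order.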

I think you can use the pyPdf Python library (http://pybrary.net/pyPdf/). This code prints the numbers of the pages that contain the word you are looking for:

from pyPdf import PdfFileReader

# Python 2: file() opens the PDF in binary mode
input = PdfFileReader(file("YourPDFFile.pdf", "rb"))

numberOfPages = input.getNumPages()

# getPage() is zero-based, so start at 0 to include the first page
i = 0
while i < numberOfPages:
    oPage = input.getPage(i)
    text = oPage.extractText()
    if text.find('What are you looking for') != -1:
        print i + 1  # report the 1-based page number
    i += 1

The same, but with Python 3:

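# Note: the original pyPdf targets Python 2; this assumes a Python 3
# compatible build that exposes the same PdfFileReader interface.
from pyPdf import PdfFileReader

input = PdfFileReader(open("YourPDFFile.pdf", "rb"))

numberOfPages = input.getNumPages()

# getPage() is zero-based, so start at 0 to include the first page
i = 0
while i < numberOfPages:
    oPage = input.getPage(i)
    text = oPage.extractText()
    if text.find('What are you looking for') != -1:
        print(i + 1)  # report the 1-based page number
    i += 1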


2011-08-03 12:12:22

I think the main problem is that I used Python 2.7 in this script, and the 'print' construct differs between Python versions: http://diveintopython3.org/porting-code-to-python-3-with-2to3.html –

As a side note, it is more straightforward to do this with a for loop, 'for i in range(1, numberOfPages):', and to simply test 'if "text" in text'. – 2011-08-05 10:18:07

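I haven't used PyPdf, but looking at the documentation, it doesn't look like you can. I don't know much about the PDF standard, but the document itself is probably defined page by page. – 2011-08-05 14:54:12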
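The for-loop suggestion in the comments above can be sketched roughly as follows. This is only an illustration: it assumes the page texts have already been pulled out of the PDF (for example with extractText()) into a list of strings, it uses enumerate instead of range so the first page is not skipped, and the name pages_containing and the sample data are hypothetical.

```python
def pages_containing(page_texts, phrase):
    """Return the 1-based numbers of the pages whose text contains phrase."""
    matches = []
    for page_number, text in enumerate(page_texts, start=1):
        # The 'in' operator replaces the text.find(...) != -1 test.
        if phrase in text:
            matches.append(page_number)
    return matches


# Hypothetical sample data standing in for extracted page text.
sample_pages = ["nothing here", "What are you looking for?", "still nothing"]
print(pages_containing(sample_pages, "What are you looking for"))
```

Keeping the search in a function that takes plain strings means it can be tried without any PDF file, and the extraction step can be swapped out independently.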
