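I can't extract data from an embedded PDF (Ruby)

I'm trying to extract text from a PDF that is embedded in a web page. I tried using the pdf-reader gem, but I get a parse error: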
```
`find_first_xref_offset': PDF does not contain EOF marker (PDF::Reader::MalformedPDFError)
from /opt/boxen/rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/pdf-reader-1.3.3/lib/pdf/reader/xref.rb:99:in `load_offsets'
from /opt/boxen/rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/pdf-reader-1.3.3/lib/pdf/reader/xref.rb:60:in `initialize'
from /opt/boxen/rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/pdf-reader-1.3.3/lib/pdf/reader/object_hash.rb:44:in `new'
from /opt/boxen/rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/pdf-reader-1.3.3/lib/pdf/reader/object_hash.rb:44:in `initialize'
from /opt/boxen/rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/pdf-reader-1.3.3/lib/pdf/reader.rb:117:in `new'
from /opt/boxen/rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/pdf-reader-1.3.3/lib/pdf/reader.rb:117:in `initialize'
from role.rb:5:in `new'
from role.rb:5:in `<main>'
```
Does anyone know how I can fix this? Is there a better gem for this?
Thanks
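I still have the same problem. I tried accessing the file directly via its URL, and I also tried downloading the PDF and reading it locally. [Here is the file](http://www.tesoreria.cl/portal/portlets/imprimirAR/printAR.do?rutrol=32807514010&t=C&formulario=30&folio=3287514413&vcto=2013-11-30) – felipecamposclarke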
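
Since the error says the EOF marker is missing, one thing that may be worth checking before handing the bytes to `PDF::Reader.new` is whether the downloaded content really is a PDF, or something else such as an HTML page wrapping the embedded viewer. The sketch below is only a diagnostic under that assumption: `pdf_markers` is a hypothetical helper (not part of pdf-reader), the 1024-byte search windows are an arbitrary choice, and the sample strings stand in for real downloaded data.

```ruby
# Hypothetical helper, not part of the pdf-reader gem.
# Reports whether a byte string carries the two markers a PDF parser
# typically looks for: a "%PDF-" header near the start and a "%%EOF"
# marker near the end. The 1024-byte windows are an assumption.
def pdf_markers(bytes)
  data  = bytes.b                      # binary copy, so encodings don't interfere
  head  = data.byteslice(0, 1024)
  start = [data.bytesize - 1024, 0].max
  tail  = data.byteslice(start, 1024)

  {
    header: head.include?("%PDF-".b),
    eof:    tail.include?("%%EOF".b)
  }
end

# Stand-in samples; in practice `bytes` would be what was downloaded,
# e.g. File.binread("downloaded.pdf") for a local copy.
sample_pdf  = "%PDF-1.4\n1 0 obj\n<<>>\nendobj\n%%EOF\n"
sample_html = "<html><body><embed src=\"file.pdf\"></body></html>"

[sample_pdf, sample_html].each do |bytes|
  markers = pdf_markers(bytes)
  if markers[:header] && markers[:eof]
    puts "looks like a complete PDF"
  elsif markers[:header]
    puts "has a PDF header but no EOF marker (possibly truncated)"
  else
    puts "does not look like a PDF (first bytes: #{bytes.byteslice(0, 20).inspect})"
  end
end
```

If the header check fails, the server is probably returning something other than the PDF itself, and swapping gems would not help. If only the EOF check fails, the download may be incomplete.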