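There is a very uncommon way to extract the data, but it only works with older versions of Ghostscript such as 8.51 or 8.62. In those older versions, the PDF operators are defined in /lib/pdf_ops.ps. Newer versions have some other features.
A test build of version 8.62 is available here: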
http://sourceforge.net/projects/ghostscript/files/GPL%20Ghostscript/8.62/gs862w32.exe/download
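The text you are after is drawn by the operators defined with /Tj {} def and /TJ {} def. It can be printed by adding dup == at the start of each definition, which writes the operand to standard output before it is shown. (This could be much more sophisticated.) I have also not worried about the font warning messages, but these would be filtered out if the data were written to a file.
Because kerning is applied, some words get split into separate letters: TJ receives an array of string fragments mixed with numeric offsets. Given time, this could also be filtered.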
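The kerning filter mentioned above could be sketched roughly as follows. This is a minimal sketch, not something from pdf_ops.ps: the name printstrings is hypothetical, and the sample array is made up. It prints only the string elements of a TJ-style array and drops the numeric offsets, so the fragments come out joined. Large offsets sometimes stand in for word spaces, which this sketch ignores.

```postscript
% Hypothetical helper (not part of pdf_ops.ps): print only the string
% elements of a TJ operand array, skipping the numeric kerning offsets.
/printstrings {              % array -> -
  {
    dup type /stringtype eq
      { print }              % string element: write it to stdout
      { pop }                % number element: discard the kerning offset
    ifelse
  } forall
  (\n) print                 % end the line after each array
} bind def

% Usage on a made-up array of the kind TJ receives:
[ (Acc) -15 (ount Num) 20 (ber) ] printstrings
% prints: Account Number
```

In a patched TJ definition, dup printstrings would take the place of dup ==, provided the helper is defined before pdf_ops.ps uses it.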
Modified /Tj from pdf_ops.ps:
/Tj { dup == 0 0 moveto show settextposition } bdef
Modified /TJ from pdf_ops.ps:
/TJ { dup ==
  0 0 moveto {
    dup type /stringtype eq {
      show
    } { -1000 div
      currentfont /ScaleMatrix .knownget { 0 get mul } if
      0 Vexch rmoveto
    } ifelse
  } forall settextposition
} bdef
Output:
(Help a neighbor within your county each month by contributing to The Salvation)
(Army's Project SHARE and Georgia Power will match your gift. To help, simply check)
($1, $2, $5, or $10 on the return portion of this bill. Starting next month, your pledge)
(amount will be included on your monthly bill.)
(Our business offices will be closed on December 24 and 25 for Christmas and January)
(1 for New Year's Day. In case of an emergency, please call us at the number on your)
(bill 24 hours a day, 7 days a week.)
(PLEASE KEEP THIS PORTION FOR YOUR RECORDS.)
(PLEASE RETURN THIS PORTION WITH YOUR PAYMENT, MAKING SURE THE RETURN ADDRESS SHOWS IN THE ENVELOPE WINDOW.)
(Account Number)
(Mail To:)
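Isn't PostScript fun?
Hi, have you tried removing the images from a PDF so that it contains only text? I am looking for a way to do that. Do you have a solution using Ghostscript or another CLI tool? Please help. – codin 2013-12-19 09:55:21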