2011-04-14

Is it possible to split a multi-document PDF with PDF::API2? For example, if myfile.pdf contains the following bookmarks:

  • bookmark1
  • bookmark2
  • bookmark3

then it should be split into the following individual PDF files:

  • bookmark1.pdf
  • bookmark2.pdf
  • bookmark3.pdf

I can't find anything about bookmarks in the PDF::API2 documentation. Is that what it calls an outline?

Thanks!


For future reference, Adobe calls bookmarks "outlines" in the PDF specification. – yms 2012-02-03 15:20:55

Answer


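I tried this in Perl for a while, then gave up and shelled out to pdftk, which I still drive from Perl. Below is a sample script for a PDF whose top-level bookmarks have titles like "Chapter 1" and "Appendix 1". You can probably adapt it, but be aware that some parts are specific to my document. I also use a few newer Perl features; if you don't want to require Perl 5.13, those parts are easy to swap out: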

use 5.013;  # 5.12+ also enables strict
use warnings;

use Data::Dumper; 
use File::Basename; 
use File::Spec::Functions; 
use File::Path qw(make_path); 

my $pdftk = 'pdftk';

my $file = $ARGV[0];
die "usage: $0 <FILENAME> [OUTPUT_DIR]\n" unless defined $file;

my $dir = dirname($file) || '.';
my $output_dir = $ARGV[1] || $dir;

# make_path croaks on failure; the -d check catches a plain file in the way
make_path($output_dir, { mode => 0755 }) unless -d $output_dir;
die "Could not create directory $output_dir\n" unless -d $output_dir;


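# Ask pdftk for the document metadata (page count and bookmarks).
# The list form of open bypasses the shell, so unusual filenames are safe.
open my $dump, '-|', $pdftk, $file, 'dump_data', 'output', '-'
    or die "Could not run $pdftk: $!";
my $string = do { local $/; <$dump> };
close $dump or die "$pdftk dump_data failed for $file\n";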

my ($last_page) = $string =~ m/NumberOfPages: (\d+)/
    or die "Could not find NumberOfPages in pdftk output\n";
say "last page is $last_page";

my $regex = qr/ 
    BookmarkTitle:  \s+ (?<title>.*?) \s+ 
    BookmarkLevel:  \s+ (?<level>\d+) \s+ 
    BookmarkPageNumber: \s+ (?<page>\d+) 
    /x; 

# Collect [title, start page] for top-level (level 1) bookmarks only
my @page_numbers;
while($string =~ /$regex/g) { 
    next unless $+{level} == 1; 
    push @page_numbers, [ @+{ qw(title page) } ]; 
    } 

die "No top-level bookmarks found in $file\n" unless @page_numbers;
say "Last index is $#page_numbers";

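# Turn each [title, start] into [number, title, start, end].
# Raw titles look like: Chapter&#160;1.&#160;Introduction
for my $index (0 .. $#page_numbers) {
    my $elem = $page_numbers[$index];
    $elem->[0] =~ s/&#160;/ /g;

    # Strip a leading "Chapter 1." / "Appendix A." and keep the number or letter;
    # anything else (an index, say) gets the placeholder 'XX'
    my $number = 'XX';
    $number = $1
        if $elem->[0] =~ s/(?:Chapter|Appendix)\s+(\d+|[A-Z])?.?\s+// and defined $1;
    unshift @$elem, $number;

    # A section ends one page before the next top-level bookmark starts;
    # the last one runs to the end of the document
    push @$elem, $index < $#page_numbers
        ? $page_numbers[$index + 1][-1] - 1
        : $last_page;
}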

print Dumper(\@page_numbers); 

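# pdftk's cat operation copies a page range into a new file, e.g.
#   pdftk in.pdf cat 5-12 output out.pdf
# (several inputs can be mixed: pdftk A=one.pdf B=two.pdf cat A1-7 B1-5 A8 output combined.pdf)
foreach my $elem (@page_numbers) {
    my ($number, $title, $start, $end) = @$elem;
    # Replace whitespace and path separators so the title is a safe filename
    (my $chapter = $title) =~ s{[\s/\\]+}{_}g;
    my $filename = catfile($output_dir, "$number.$chapter.pdf");
    my @command = ($pdftk, $file, 'cat', "$start-$end", 'output', $filename);
    say "Splitting Chapter $number $title";
    say "Running @command";
    system(@command) == 0
        or warn "pdftk failed for $filename (wait status $?)\n";
}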
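The bookmark-to-page-range step is the part most worth reusing, and the same ranges could just as well drive PDF::API2 instead of pdftk. Here is a minimal sketch of that step as a standalone function. `bookmark_ranges` is a hypothetical helper name, not part of pdftk or any module, and it assumes the `BookmarkTitle`/`BookmarkLevel`/`BookmarkPageNumber`/`NumberOfPages` lines that `pdftk FILE dump_data output -` prints, with bookmarks listed in document order.

```perl
use strict;
use warnings;

# Hypothetical helper: parse the text printed by `pdftk FILE dump_data output -`
# and return one [title, first_page, last_page] entry per top-level bookmark.
# Assumes pdftk lists the bookmarks in document order.
sub bookmark_ranges {
    my ($dump) = @_;

    my ($last_page) = $dump =~ /^NumberOfPages:\s*(\d+)/m
        or die "NumberOfPages not found in dump_data output\n";

    # Collect [title, start page] for level-1 bookmarks only
    my @starts;
    while ($dump =~ /^BookmarkTitle:[ \t]*(.*?)\s*\n
                      BookmarkLevel:\s*(\d+)\s*\n
                      BookmarkPageNumber:\s*(\d+)/xmg) {
        push @starts, [ $1, $3 ] if $2 == 1;
    }

    # A section ends one page before the next one starts;
    # the last section runs to the final page of the document.
    my @ranges;
    for my $i (0 .. $#starts) {
        my ($title, $first) = @{ $starts[$i] };
        my $last = $i < $#starts ? $starts[$i + 1][1] - 1 : $last_page;
        $last = $first if $last < $first;    # two bookmarks on the same page
        push @ranges, [ $title, $first, $last ];
    }
    return @ranges;
}
```

Each returned entry can then be passed to `pdftk FILE cat FIRST-LAST output NAME.pdf` as in the script above, or, if you would rather stay in PDF::API2, to a loop that calls `import_page($source, $n)` on a new `PDF::API2` object for every page in the range before saving it.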