2013-02-19 108 views

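Reading PDF file attachment annotations with iTextSharp

I have the following problem. I have a PDF file with an XML file attached to it as an annotation: not as an embedded file, but as an annotation. I tried to read it with the code from the following link: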

iTextSharp - how to open/read/extract a file attachment?

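It works for embedded files, but not for file attachments stored as annotations.

I googled for extracting annotations from a PDF and found the following link: Reading PDF Annotations with iText

So the annotation type is "File Attachment Annotation".

Could someone show a working example?

Thanks in advance for any help.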

Answer

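As so often with questions about iText and iTextSharp, you should first look at the keyword list on itextpdf.com. There you will find File attachment, extract attachments, which references two Java samples from iText in Action, 2nd Edition.

Similar samples for C# are available among the Webified iTextSharp Examples.

KubrickDvds contains the following method extractAttachments / ExtractAttachments, which extracts file attachment annotations:

Java: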

/**
 * Extracts file attachment annotations from an existing PDF.
 * @param src the path to the existing PDF
 */
public void extractAttachments(String src) throws IOException {
    PdfReader reader = new PdfReader(src);
    try {
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            // /Annots is optional, so pages without annotations are skipped
            PdfArray array = reader.getPageN(i).getAsArray(PdfName.ANNOTS);
            if (array == null) continue;
            for (int j = 0; j < array.size(); j++) {
                PdfDictionary annot = array.getAsDict(j);
                if (annot == null
                        || !PdfName.FILEATTACHMENT.equals(annot.getAsName(PdfName.SUBTYPE))) {
                    continue;
                }
                // /FS is the file specification; its /EF entry holds the embedded streams
                PdfDictionary fs = annot.getAsDict(PdfName.FS);
                PdfDictionary refs = (fs == null) ? null : fs.getAsDict(PdfName.EF);
                if (refs == null) continue;
                for (PdfName name : refs.getKeys()) {
                    // PATH is a format string with one %s placeholder, defined elsewhere in the sample class
                    String target = String.format(PATH, fs.getAsString(name).toString());
                    // try-with-resources closes the file even if the write fails
                    try (FileOutputStream fos = new FileOutputStream(target)) {
                        fos.write(PdfReader.getStreamBytes((PRStream) refs.getAsStream(name)));
                    }
                }
            }
        }
    } finally {
        reader.close();
    }
}

C#:

/**
 * Extracts file attachment annotations from an existing PDF.
 * @param src the bytes of the existing PDF
 * @param zip the ZipFile object to which the extracted attachments are added
 */
public void ExtractAttachments(byte[] src, ZipFile zip) {
    PdfReader reader = new PdfReader(src);
    try {
        for (int i = 1; i <= reader.NumberOfPages; i++) {
            // /Annots is optional, so pages without annotations are skipped
            PdfArray array = reader.GetPageN(i).GetAsArray(PdfName.ANNOTS);
            if (array == null) continue;
            for (int j = 0; j < array.Size; j++) {
                PdfDictionary annot = array.GetAsDict(j);
                if (annot == null
                    || !PdfName.FILEATTACHMENT.Equals(annot.GetAsName(PdfName.SUBTYPE)))
                {
                    continue;
                }
                // /FS is the file specification; its /EF entry holds the embedded streams
                PdfDictionary fs = annot.GetAsDict(PdfName.FS);
                PdfDictionary refs = fs == null ? null : fs.GetAsDict(PdfName.EF);
                if (refs == null) continue;
                foreach (PdfName name in refs.Keys) {
                    zip.AddEntry(
                        fs.GetAsString(name).ToString(),
                        PdfReader.GetStreamBytes((PRStream) refs.GetAsStream(name))
                    );
                }
            }
        }
    } finally {
        reader.Close();
    }
}

KubrickDocumentary contains the following method extractDocLevelAttachments / ExtractDocLevelAttachments, which extracts document-level attachments:

Java:

/**
 * Extracts document-level attachments.
 * @param filename a file from which document-level attachments will be extracted
 * @throws IOException
 */
public void extractDocLevelAttachments(String filename) throws IOException {
    PdfReader reader = new PdfReader(filename);
    try {
        PdfDictionary root = reader.getCatalog();
        PdfDictionary documentnames = root.getAsDict(PdfName.NAMES);
        if (documentnames == null) return; // no name dictionary, so no attachments
        PdfDictionary embeddedfiles = documentnames.getAsDict(PdfName.EMBEDDEDFILES);
        if (embeddedfiles == null) return;
        // Only a flat name tree is handled: the root /Names array alternates
        // name strings and file specifications; trees split into /Kids are not traversed
        PdfArray filespecs = embeddedfiles.getAsArray(PdfName.NAMES);
        if (filespecs == null) return;
        for (int i = 0; i + 1 < filespecs.size(); i += 2) {
            // Entry i is the name (not needed here); entry i + 1 is the file specification
            PdfDictionary filespec = filespecs.getAsDict(i + 1);
            PdfDictionary refs = filespec.getAsDict(PdfName.EF);
            for (PdfName key : refs.getKeys()) {
                // PATH is a format string with one %s placeholder, defined elsewhere in the sample class
                String target = String.format(PATH, filespec.getAsString(key).toString());
                PRStream stream = (PRStream) PdfReader.getPdfObject(refs.getAsIndirectObject(key));
                try (FileOutputStream fos = new FileOutputStream(target)) {
                    fos.write(PdfReader.getStreamBytes(stream));
                }
            }
        }
    } finally {
        reader.close();
    }
}

C#:

/**
 * Extracts document-level attachments.
 * @param pdf the bytes of the PDF from which document-level attachments will be extracted
 * @param zip the ZipFile object to which the extracted attachments are added
 */
public void ExtractDocLevelAttachments(byte[] pdf, ZipFile zip) {
    PdfReader reader = new PdfReader(pdf);
    try {
        PdfDictionary root = reader.Catalog;
        PdfDictionary documentnames = root.GetAsDict(PdfName.NAMES);
        if (documentnames == null) return; // no name dictionary, so no attachments
        PdfDictionary embeddedfiles = documentnames.GetAsDict(PdfName.EMBEDDEDFILES);
        if (embeddedfiles == null) return;
        // Only a flat name tree is handled: the root /Names array alternates
        // name strings and file specifications; trees split into /Kids are not traversed
        PdfArray filespecs = embeddedfiles.GetAsArray(PdfName.NAMES);
        if (filespecs == null) return;
        for (int i = 0; i + 1 < filespecs.Size; i += 2) {
            // Entry i is the name (not needed here); entry i + 1 is the file specification
            PdfDictionary filespec = filespecs.GetAsDict(i + 1);
            PdfDictionary refs = filespec.GetAsDict(PdfName.EF);
            foreach (PdfName key in refs.Keys) {
                PRStream stream = (PRStream) PdfReader.GetPdfObject(
                    refs.GetAsIndirectObject(key)
                );
                zip.AddEntry(
                    filespec.GetAsString(key).ToString(),
                    PdfReader.GetStreamBytes(stream)
                );
            }
        }
    } finally {
        reader.Close();
    }
}

(For some reason the C# examples put the extracted files into a ZIP file, while the Java versions write them to the file system... oh well...)
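
If you would rather have the Java version collect the attachments in a ZIP archive, as the C# samples do, a minimal sketch using only java.util.zip could look like the following. This is not part of the iText samples: the class AttachmentZipper, the method zipAttachments and the map of file names to bytes are hypothetical names standing in for the names and stream bytes that extractAttachments reads from the PDF.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not taken from the iText samples.
public class AttachmentZipper {

    /**
     * Writes each (file name, content) pair as one entry of a ZIP archive.
     * The map is an assumed stand-in for the names and bytes that an
     * extraction method would collect from the PDF.
     */
    public static void zipAttachments(Map<String, byte[]> attachments, OutputStream out)
            throws IOException {
        // Closing the ZipOutputStream finishes the archive and closes 'out'
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> e : attachments.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data; real code would fill the map from the PDF
        Map<String, byte[]> attachments = new LinkedHashMap<>();
        attachments.put("data.xml", "<root/>".getBytes(StandardCharsets.UTF_8));

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        zipAttachments(attachments, buffer);
        System.out.println("ZIP size in bytes: " + buffer.size());
    }
}
```

Collecting the attachments in a map keyed by file name first has a side effect that may be useful here: a name that appears under more than one key of the /EF dictionary ends up in the archive only once, whereas ZipOutputStream rejects a second entry with the same name.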

OK, thanks. It works perfectly. The ExtractAttachments function is what I needed. – 2013-02-19 21:24:54