Reading hyperlinks from a PDF file

I am trying to read a PDF file and get all of the hyperlinks from it. I am using iTextSharp for C# .NET.

PdfReader reader = new PdfReader("test.pdf");   
List<PdfAnnotation.PdfImportedLink> list = reader.GetLinks(36);

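The GetLinks method returns a list with a lot of information about the links, but it does not return the value I want: the hyperlink string. I know for certain that there are hyperlinks on page 36.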

2011-08-05 levi

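PdfReader.GetLinks() is only meant to be used with links internal to the document, not external hyperlinks. Why? I don't know.

The code below is based on code I wrote earlier, but I have limited it to links that are stored in the PDF as a PdfName.URI. A link can also be stored as JavaScript that ultimately does the same thing, and there are probably other types too, but you would need to detect those yourself. I don't believe anything in the spec says that a link actually has to be a URI, it is only implied, so the code below returns a string that you can (probably) convert to a URI on your own.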

//Requires: using System.Collections.Generic; using iTextSharp.text.pdf;
private static List<string> GetPdfLinks(string file, int page)
{
    List<string> Ret = new List<string>();

    //Open our reader
    PdfReader R = new PdfReader(file);
    try
    {
        //Get the current page (page numbers start at 1, not 0)
        PdfDictionary PageDictionary = R.GetPageN(page);
        if (PageDictionary == null)
            return Ret;

        //Get all of the annotations for the current page
        PdfArray Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);

        //Make sure we have something (Size is the number of elements in the array)
        if ((Annots == null) || (Annots.Size == 0))
            return Ret;

        //Loop through each annotation
        foreach (PdfObject A in Annots.ArrayList)
        {
            //Resolve a possible indirect reference to the actual annotation dictionary
            PdfDictionary AnnotationDictionary = PdfReader.GetPdfObject(A) as PdfDictionary;
            if (AnnotationDictionary == null)
                continue;

            //Make sure this annotation is a link
            if (!PdfName.LINK.Equals(AnnotationDictionary.GetAsName(PdfName.SUBTYPE)))
                continue;

            //Get the ACTION for the current annotation, skipping annotations that have none
            PdfDictionary AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A);
            if (AnnotationAction == null)
                continue;

            //Test if it is a URI action (there are tons of other types of actions, some of which might mimic URI, such as JavaScript, but those need to be handled separately)
            if (PdfName.URI.Equals(AnnotationAction.GetAsName(PdfName.S)))
            {
                PdfString Destination = AnnotationAction.GetAsString(PdfName.URI);
                if (Destination != null)
                    Ret.Add(Destination.ToString());
            }
        }
    }
    finally
    {
        //Release the file handle whether or not we found anything
        R.Close();
    }

    return Ret;
}

And call it like this:

string myfile = System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Output.pdf");
List<string> Links = GetPdfLinks(myfile, 1);
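
Since the method returns plain strings, here is a minimal sketch of the "convert it to a URI yourself" step mentioned above. The class name LinkConversion and the method ToAbsoluteUris are hypothetical names made up for this illustration, not part of iTextSharp. It assumes you only care about absolute URIs and silently drops anything that does not parse.

```csharp
using System;
using System.Collections.Generic;

public static class LinkConversion
{
    // Hypothetical helper: keeps only the strings that parse as absolute URIs.
    public static List<Uri> ToAbsoluteUris(IEnumerable<string> rawLinks)
    {
        var result = new List<Uri>();

        // Tolerate a null input so callers do not need their own check
        if (rawLinks == null)
            return result;

        foreach (string raw in rawLinks)
        {
            if (string.IsNullOrWhiteSpace(raw))
                continue;

            // Uri.TryCreate returns false instead of throwing on malformed input
            Uri parsed;
            if (Uri.TryCreate(raw.Trim(), UriKind.Absolute, out parsed))
                result.Add(parsed);
        }

        return result;
    }
}
```

You would then call something like LinkConversion.ToAbsoluteUris(Links). Dropping unparseable entries is a design choice for this sketch; if you need to know which links failed, collect them in a second list instead.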

2011-08-05 16:27:12

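This is great. Thanks a million! – Keplah

Chris: your code above is almost exactly the same as mine, and it seems to work most of the time. The problem I am running into is that when I try to get 'PdfName.ANNOTS' I sometimes get a 'null' value, even though I can clearly see hyperlinks in the document. Any ideas? Thanks. –

The first thing I would tell you to do is open the PDF in Acrobat Pro (if you have it), run Preflight on it, go to Options, then Browse Internal PDF Structure, and see whether there are any Annots in there. The other thing I would tell you is to make sure you are counting page numbers from one rather than zero; I have made that mistake many times. If that doesn't help and the file isn't confidential, you can email it to me; my address is in my profile. –

I noticed that in a PDF, any text that looks like a URL can be presented by the PDF viewer as if it were a link annotation. In Adobe Acrobat there is a page display preference under the General tab called "Create links from URLs" that controls this. I was writing code to remove URL link annotations, only to find that there were none. Acrobat, however, was automatically turning text that looks like a URL into something that looks like a link annotation.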
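
If the links you see are really just URL-looking text with no annotation behind them, one option is to scan the page text instead. The sketch below assumes you have already extracted the text of a page as a string (in iTextSharp 5.x, PdfTextExtractor.GetTextFromPage in iTextSharp.text.pdf.parser is one way to do that). The class name TextUrlFinder and the regular expression are my own assumptions, and the pattern is deliberately simple; it does not reproduce whatever heuristic Acrobat uses.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TextUrlFinder
{
    // Assumed pattern: "http://" or "https://" followed by characters that are
    // not whitespace, angle brackets, or double quotes.
    private static readonly Regex UrlPattern =
        new Regex(@"https?://[^\s<>""]+", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the distinct URL-looking substrings of pageText,
    // in the order they first appear.
    public static List<string> FindUrls(string pageText)
    {
        var found = new List<string>();
        if (string.IsNullOrEmpty(pageText))
            return found;

        foreach (Match m in UrlPattern.Matches(pageText))
        {
            // Strip sentence punctuation that often trails a URL in running text
            string url = m.Value.TrimEnd('.', ',', ';', ':', ')', ']');
            if (!found.Contains(url))
                found.Add(url);
        }

        return found;
    }
}
```

URLs that wrap across lines in the extracted text will be cut off at the line break, so treat the results as candidates rather than a reliable list.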

2012-10-25 17:12:07
