Using iTextSharp to split a PDF into smaller PDFs based on size

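So we have some really inefficient code that splits a PDF into smaller chunks based on a maximum allowed size. Aka. if the max size is 10 MB, an 8 MB file is skipped, while a 16 MB file is split based on the number of pages.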

This is code that I inherited, and I feel there must be a more efficient way to do this, with just one method and fewer instantiated objects.

We use the following code to call the methods:

    List<int> splitPoints = null;
    List<byte[]> documents = null;

    splitPoints = this.GetPDFSplitPoints(currentDocument, maxSize);
    documents = this.SplitPDF(currentDocument, maxSize, splitPoints);

Methods:

private List<int> GetPDFSplitPoints(IClaimDocument currentDocument, int maxSize) 
    { 
     List<int> splitPoints = new List<int>(); 
     PdfReader reader = null; 
     Document document = null; 
     int pagesRemaining = currentDocument.Pages; 

     while (pagesRemaining > 0) 
     { 
      reader = new PdfReader(currentDocument.Data); 
      document = new Document(reader.GetPageSizeWithRotation(1)); 

      using (MemoryStream ms = new MemoryStream()) 
      { 
       PdfCopy copy = new PdfCopy(document, ms); 
       PdfImportedPage page = null; 

       document.Open(); 

       //Add pages until we run out from the original 
       for (int i = 0; i < currentDocument.Pages; i++) 
       { 
        int currentPage = currentDocument.Pages - (pagesRemaining - 1); 

        if (pagesRemaining == 0) 
        { 
         //The whole document has been traversed 
         break; 
        } 

        page = copy.GetImportedPage(reader, currentPage); 
        copy.AddPage(page); 

        //If the current collection of pages exceeds the maximum size, we save off the index and start again 
        if (copy.CurrentDocumentSize > maxSize) 
        { 
         if (i == 0) 
         { 
          //One page is greater than the maximum size 
          throw new Exception("one page is greater than the maximum size and cannot be processed"); 
         } 

         //We have gone one page too far, save this split index 
         splitPoints.Add(currentDocument.Pages - (pagesRemaining - 1)); 
         break; 
        } 
        else 
        { 
         pagesRemaining--; 
        } 
       } 

       page = null; 

       document.Close(); 
       document.Dispose(); 
       copy.Close(); 
       copy.Dispose(); 
       copy = null; 
      } 
     } 

     if (reader != null) 
     { 
      reader.Close(); 
      reader = null; 
     } 

     document = null; 

     return splitPoints; 
    } 

    private List<byte[]> SplitPDF(IClaimDocument currentDocument, int maxSize, List<int> splitPoints) 
    { 
     var documents = new List<byte[]>(); 
     PdfReader reader = null; 
     Document document = null; 
     MemoryStream fs = null; 
     int pagesRemaining = currentDocument.Pages; 

     while (pagesRemaining > 0) 
     { 
      reader = new PdfReader(currentDocument.Data); 
      document = new Document(reader.GetPageSizeWithRotation(1)); 

      fs = new MemoryStream(); 
      PdfCopy copy = new PdfCopy(document, fs); 
      PdfImportedPage page = null; 

      document.Open(); 

      //Add pages until we run out from the original 
      for (int i = 0; i <= currentDocument.Pages; i++) 
      { 
       int currentPage = currentDocument.Pages - (pagesRemaining - 1); 
       if (pagesRemaining == 0) 
       { 
        //We have traversed all pages 
        //The call to copy.Close() MUST come before using fs.ToArray() because copy.Close() finalizes the document 
        fs.Flush(); 
        copy.Close(); 
        documents.Add(fs.ToArray()); 
        document.Close(); 
        fs.Dispose(); 
        break; 
       } 

       page = copy.GetImportedPage(reader, currentPage); 
       copy.AddPage(page); 
       pagesRemaining--; 

       if (splitPoints.Contains(currentPage + 1) == true) 
       { 
        //Need to start a new document 
        //The call to copy.Close() MUST come before using fs.ToArray() because copy.Close() finalizes the document 
        fs.Flush(); 
        copy.Close(); 
        documents.Add(fs.ToArray()); 
        document.Close(); 
        fs.Dispose(); 
        break; 
       } 
      } 

      copy = null; 
      page = null; 

      fs.Dispose(); 
     } 

     if (reader != null) 
     { 
      reader.Close(); 
      reader = null; 
     } 

     if (document != null) 
     { 
      document.Close(); 
      document.Dispose(); 
      document = null; 
     } 

     if (fs != null) 
     { 
      fs.Close(); 
      fs.Dispose(); 
      fs = null; 
     } 

     return documents; 
    }

As far as I can tell, the only code I can find online is in VB and doesn't necessarily address the size issue.

UPDATE:

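We are getting OutOfMemory exceptions, and I believe this is a Large Object Heap issue. So one idea is to reduce the code footprint, which might reduce the number of large objects on the heap.

Basically, this is part of a loop that goes through any number of PDFs, splits them, and stores them in the database. Right now, we have had to change the approach from processing everything in one go (the last run was 97 PDFs of various sizes) to running 5 PDFs through the system every 5 minutes. This is not ideal and will not scale well as we roll the tool out to more clients.

(We are dealing with 50 - 100 MB PDFs, but they could be larger.)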
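One way to keep fewer large buffers alive at the same time is to open a single PdfReader, build one chunk at a time, and hand each chunk to the caller immediately instead of collecting everything in a List<byte[]>. The following is only a sketch under stated assumptions: it targets iTextSharp 5.x, the names PdfChunkWriter, BuildChunk, WriteChunks and the sink callback are hypothetical, and splitPoints is assumed to be ascending and to hold the first page of each new chunk, as in the code above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
public static class PdfChunkWriter
{
    // Copies pages first..last (1-based, inclusive) into a new PDF
    // and returns the finalized bytes.
    public static byte[] BuildChunk(PdfReader reader, int first, int last)
    {
        using (var ms = new MemoryStream())
        {
            var document = new Document(reader.GetPageSizeWithRotation(first));
            var copy = new PdfCopy(document, ms);
            document.Open();
            for (int p = first; p <= last; p++)
            {
                copy.AddPage(copy.GetImportedPage(reader, p));
            }
            // Closing the document finalizes the PDF, so read the bytes afterwards.
            document.Close();
            return ms.ToArray();
        }
    }

    // Builds each chunk in turn and passes it to sink (for example a database
    // insert), so at most one finished chunk is referenced by this code at a time.
    public static void WriteChunks(byte[] source, IList<int> splitPoints, Action<byte[]> sink)
    {
        var reader = new PdfReader(source); // one reader for the whole run
        try
        {
            int first = 1;
            foreach (int split in splitPoints)
            {
                sink(BuildChunk(reader, first, split - 1));
                first = split;
            }
            sink(BuildChunk(reader, first, reader.NumberOfPages));
        }
        finally
        {
            reader.Close();
        }
    }
}
```

A call such as PdfChunkWriter.WriteChunks(currentDocument.Data, splitPoints, bytes => Save(bytes)) would replace the SplitPDF call, where Save is a placeholder for whatever stores a chunk. Each chunk above 85,000 bytes still lands on the Large Object Heap, so this reduces how many such buffers are reachable at once rather than avoiding them.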

Source

2012-01-26 Cyfer13

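IMHO, if it works, leave it alone. I don't think there is a good way to split PDFs by size, because predicting the size of a page is very difficult. A page could be very small because it has 1000 words (relatively small), or a page could be very large because it has a high-resolution image embedded in it. – CodingGorilla

We are getting OutOfMemory exceptions, and I believe this is a Large Object Heap issue. So one idea is to reduce the code footprint, which might reduce the number of large objects on the heap. (We are dealing with 50 - 100 MB PDFs, but they could be larger.) – Cyfer13

If it weren't for the errors, I wouldn't touch working code. – Cyfer13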

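I also inherited this exact code, and it appears to have a major flaw. In the GetPDFSplitPoints method, it checks the total size of the copied pages against maxSize to determine at which page to split the file.
In the SplitPDF method, when it reaches the page where the split occurs, the MemoryStream at that point is indeed below the maximum allowed size, and one more page would put it over the limit. But after document.Close(); is executed, much more is added to the MemoryStream (in one example PDF I worked with, the Length of the MemoryStream went from 9 MB before the Close to 19 MB after it). My understanding is that all the necessary resources for the copied pages are added on Close.
I'm guessing I will have to rewrite this code completely to ensure the maximum size is not exceeded while preserving the integrity of the original pages.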
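A rewrite along those lines could decide the split points from the size of the finished chunk rather than from the running size before Close. The sketch below shows only that decision logic and is not tied to any PDF library. PdfPartition, Partition and the measure delegate are hypothetical names: measure(first, last) is assumed to build pages first..last into a PDF, close it, and return the final byte length. The binary search also assumes that adding pages never makes a chunk smaller.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: picks page ranges whose finalized size fits maxSize.
public static class PdfPartition
{
    // measure(first, last) is assumed to return the byte length of the finished
    // PDF holding pages first..last (1-based, inclusive), taken AFTER Close().
    public static List<Tuple<int, int>> Partition(
        int totalPages, long maxSize, Func<int, int, long> measure)
    {
        var ranges = new List<Tuple<int, int>>();
        int first = 1;
        while (first <= totalPages)
        {
            if (measure(first, first) > maxSize)
            {
                throw new InvalidOperationException(
                    "Page " + first + " alone exceeds the maximum size.");
            }

            // Binary search for the largest last page that still fits.
            // Invariant: pages first..lo are known to fit.
            int lo = first, hi = totalPages;
            while (lo < hi)
            {
                int mid = lo + (hi - lo + 1) / 2;
                if (measure(first, mid) <= maxSize)
                {
                    lo = mid;
                }
                else
                {
                    hi = mid - 1;
                }
            }

            ranges.Add(Tuple.Create(first, lo));
            first = lo + 1;
        }
        return ranges;
    }
}
```

With iTextSharp, measure could copy the pages into a MemoryStream with PdfCopy, call document.Close(), and return the length of the resulting array. Each probe builds a real chunk, so this trades extra CPU work for a size check that reflects what is actually written, and the first probes of the binary search can be much larger than maxSize.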

Source

2012-10-13 20:15:07
