2013-06-28 100 views
2

Batch-converting PDFs to images

I'm working on a solution that converts PDF files to images. I'm using the following example from CodeProject: http://www.codeproject.com/Articles/317700/Convert-a-PDF-into-a-series-of-images-using-Csharp?msg=4134859#xx4134859xx

Now I'm using the code below to try to generate new images from more than 1000 PDF files:

using Cyotek.GhostScript; 
using Cyotek.GhostScript.PdfConversion; 
using System; 
using System.Collections.Generic; 
using System.Drawing; 
using System.IO; 
using System.Linq; 
using System.Text; 
using System.Threading.Tasks; 

namespace RefClass_PDF2Image 
{ 
    class Program 
    { 
     static void Main(string[] args) 
     { 
      string outputPath = Properties.Settings.Default.outputPath; 
      string pdfPath = Properties.Settings.Default.pdfPath; 

      if (!Directory.Exists(outputPath)) 
      { 
       Console.WriteLine("Der angegebene Pfad " + outputPath + " für den Export wurde nicht gefunden. Bitte ändern Sie den Pfad (outputPath) in der App.Config Datei."); 
       return; 
      } 
      else 
      { 
       Console.WriteLine("Output Pfad: " + outputPath + " gefunden."); 
      } 

      if (!Directory.Exists(pdfPath)) 
      { 
       Console.WriteLine("Der angegebene Pfad " + pdfPath + " zu den PDF Zeichnungen wurde nicht gefunden. Bitte ändern Sie den Pfad (pdfPath) in der App.Config Datei."); 
       return; 
      } 
      else 
      { 
       Console.WriteLine("PDF Pfad: " + pdfPath + " gefunden."); 
      } 


      Pdf2ImageSettings settings = GetPDFSettings(); 

      DateTime start = DateTime.Now; 
      TimeSpan span; 

      Console.WriteLine(""); 
      Console.WriteLine("Extraktion der PDF Zeichnungen wird gestartet: " + start.ToShortTimeString()); 
      Console.WriteLine(""); 

      DirectoryInfo diretoryInfo = new DirectoryInfo(pdfPath); 
      DirectoryInfo[] directories = diretoryInfo.GetDirectories(); 

      Console.WriteLine(""); 
      Console.WriteLine("Es wurden " + directories.Length + " verschiedende Verzeichnisse gefunden."); 
      Console.WriteLine(""); 

      List<string> filenamesPDF = Directory.GetFiles(pdfPath, "*.pdf*", SearchOption.AllDirectories).Select(x => Path.GetFullPath(x)).ToList(); 
      List<string> filenamesOutput = Directory.GetFiles(outputPath, "*.*", SearchOption.AllDirectories).Select(x => Path.GetFullPath(x)).ToList(); 

      Console.WriteLine(""); 
      Console.WriteLine("Es wurden " + filenamesPDF.Count + " verschiedende PDF Zeichnungen gefunden."); 
      Console.WriteLine(""); 

      List<string> newFileNames = new List<string>(); 
      int cutLength = pdfPath.Length; 


      for (int i = 0; i < filenamesPDF.Count; i++) 
      { 
       string temp = filenamesPDF[i].Remove(0, cutLength); 
       temp = outputPath + temp; 
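       temp = Path.ChangeExtension(temp, ".jpg"); // swap only the extension; Replace("pdf", "jpg") would also rewrite "pdf" inside folder names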
       newFileNames.Add(temp); 
      } 

      for (int i = 0; i < filenamesPDF.Count; i++) 
      { 
       FileInfo fi = new FileInfo(newFileNames[i]); 
       if (!fi.Exists) 
       { 
        if (!Directory.Exists(fi.DirectoryName)) 
        { 
         Directory.CreateDirectory(fi.DirectoryName); 
        } 

        Bitmap firstPage = new Pdf2Image(filenamesPDF[i], settings).GetImage(); 
        firstPage.Save(newFileNames[i], System.Drawing.Imaging.ImageFormat.Jpeg); 
        firstPage.Dispose(); 
       } 

       //if (i % 20 == 0) 
       //{ 
       // GC.Collect(); 
       // GC.WaitForPendingFinalizers(); 
       //} 
      } 


      Console.ReadLine(); 
     } 

     private static Pdf2ImageSettings GetPDFSettings() 
     { 
      Pdf2ImageSettings settings; 
      settings = new Pdf2ImageSettings(); 
      settings.AntiAliasMode = AntiAliasMode.Medium; 
      settings.Dpi = 150; 
      settings.GridFitMode = GridFitMode.Topological; 
      settings.ImageFormat = ImageFormat.Png24; 
      settings.TrimMode = PdfTrimMode.CropBox; 
      return settings; 
     } 
    } 
} 

Unfortunately, I always get an out-of-memory exception in Pdf2Image.cs. Here is the code:

public Bitmap GetImage(int pageNumber) 
{ 
    Bitmap result; 
    string workFile; 

    //if (pageNumber < 1 || pageNumber > this.PageCount) 
    // throw new ArgumentException("Page number is out of bounds", "pageNumber"); 

    if (pageNumber < 1) 
        throw new ArgumentException("Page number is out of bounds", "pageNumber"); 

    workFile = Path.GetTempFileName(); 

    try 
    { 
        this.ConvertPdfPageToImage(workFile, pageNumber); 
        using (FileStream stream = new FileStream(workFile, FileMode.Open, FileAccess.Read)) 
        { 
            result = new Bitmap(stream); // --->>> here is the out of memory exception 
            stream.Close(); 
            stream.Dispose(); 
        } 
    } 
    finally 
    { 
        File.Delete(workFile); 
    } 

    return result; 
} 

How can I fix this so that the exception no longer occurs?

Thanks for your help, tro

+0

Dispose the bitmap – Sayse

+0

Yes, that's what I'm doing: firstPage.Dispose(); – tro

Answers

2

I don't know if it's worth it to you, but it looks like you can do what you want without an intermediate Bitmap. Pdf2Image contains this code:

public void ConvertPdfPageToImage(string outputFileName, int pageNumber) 
{ 
    if (pageNumber < 1 || pageNumber > this.PageCount) 
        throw new ArgumentException("Page number is out of bounds", "pageNumber"); 

    using (GhostScriptAPI api = new GhostScriptAPI()) 
        api.Execute(this.GetConversionArguments(this._pdfFileName, outputFileName, pageNumber, this.PdfPassword, this.Settings)); 
} 

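It writes a file for you wherever you want it. Why not call that method directly instead of reading the image back in and writing it out again?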
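
A minimal sketch of that idea, using only the `Pdf2Image` constructor and the `ConvertPdfPageToImage(string, int)` method shown in this thread. The class name `DirectConversion` and the method name `ConvertFirstPage` are hypothetical. It assumes the file on disk is written in whatever format `settings.ImageFormat` selects (`Png24` in the question), so the output name should probably end in `.png` rather than `.jpg` unless the settings are changed.

```csharp
using System.IO;
using Cyotek.GhostScript.PdfConversion;

static class DirectConversion // hypothetical helper, not part of the library
{
    // Renders page 1 of a PDF straight to disk, with no Bitmap in between.
    public static void ConvertFirstPage(string pdfFile, string outputFile, Pdf2ImageSettings settings)
    {
        // Make sure the target folder exists (a no-op if it already does).
        string directory = Path.GetDirectoryName(outputFile);
        if (!string.IsNullOrEmpty(directory))
            Directory.CreateDirectory(directory);

        // GhostScript writes the image file itself; nothing is loaded into GDI+.
        new Pdf2Image(pdfFile, settings).ConvertPdfPageToImage(outputFile, 1);
    }
}
```

In the question's loop this would replace the three lines that create, save and dispose `firstPage`, for example `DirectConversion.ConvertFirstPage(filenamesPDF[i], newFileNames[i], settings);`.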

+0

Awesome! That's exactly what I was looking for! – tro

2

This may not answer your question directly, but it may still be useful: ImageMagick offers a simple way to create images from PDF files in batch mode.

A single PDF file to many JPGs:

convert -geometry 1024x768 -density 200 -colorspace RGB test.pdf +adjoin test_%0d.jpg 

Or, if you want to process many PDF files:

mogrify -format jpg -alpha off -density 150 -quality 80 -resize 768 -unsharp 1.5 *.pdf 

(The settings should obviously be adapted to your needs :))

To do this from C#, you can use the .NET wrapper for ImageMagick: http://imagemagick.codeplex.com

+0

Not exactly what I was looking for, but thanks – tro

2

Add a using for the resulting bitmap:

using (FileStream stream = new FileStream(workFile, FileMode.Open, FileAccess.Read)) 
using (Bitmap result = new Bitmap(stream)) 
{ 
... 
} 
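
One caveat: inside `GetImage` the bitmap is the return value, so wrapping it in `using` there would dispose it before the caller ever sees it. A sketch of how the same idea might look at the call site instead, based on the question's loop (the names `CallSiteDisposal` and `SaveFirstPageAsJpeg` are hypothetical):

```csharp
using System.Drawing;
using Cyotek.GhostScript.PdfConversion;

static class CallSiteDisposal // hypothetical helper, not part of the library
{
    public static void SaveFirstPageAsJpeg(string pdfFile, string jpgFile, Pdf2ImageSettings settings)
    {
        // The bitmap is disposed when the block ends, even if Save throws.
        using (Bitmap firstPage = new Pdf2Image(pdfFile, settings).GetImage())
        {
            // Fully qualified, as in the question, to avoid a clash with the library's own ImageFormat.
            firstPage.Save(jpgFile, System.Drawing.Imaging.ImageFormat.Jpeg);
        }
    }
}
```

Apart from exception safety this does the same thing as the explicit `firstPage.Dispose()` call already in the question, so it may well not cure the out-of-memory exception on its own.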
+0

This solution looks more elegant than just calling Dispose. – Malhotra

+0

Doesn't work in my case :-( – tro

+0

**using** wraps the enclosed block in a try/finally and calls **Dispose** in the finally block, which ensures that **Dispose** is called even if an exception occurs. [Link](http://stackoverflow.com/questions/10984336/net-using-using-blocks-vs-calling-dispose) – Adam
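
To illustrate the point about `using` and exceptions, here is a small self-contained sketch. `Resource` and `UsingDemo` are made-up names for the demo; the only thing it relies on is the standard `IDisposable` pattern.

```csharp
using System;

// A stand-in for Bitmap or FileStream that records whether it was disposed.
sealed class Resource : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        Disposed = true;
    }
}

static class UsingDemo
{
    // Throws inside a using block and reports whether Dispose still ran.
    public static bool DisposedAfterException()
    {
        Resource resource = new Resource();
        try
        {
            using (resource)
            {
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // Swallowed on purpose: we only care about the disposal state.
        }
        return resource.Disposed;
    }
}
```

`UsingDemo.DisposedAfterException()` returns `true`: the compiler-generated finally block calls `Dispose` before the exception reaches the catch.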