Including images in RDLC causes unexpectedly large output size

I have an RDLC report that contains some data and, optionally, an image. The content is rendered as PDF.

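I may have a container (package) file that stores 100 of these rendered results. The problem is that if I include the image, the output size grows by more than expected.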

As an example: my RDLC report is an invoice that can show an image of a signature at the bottom. I may have 100 invoices in one customer's package file.

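If the total output package (100 invoices) without images is 2 MB and the image is 15 KB, I would expect the total output package with images to be about 3.5 MB (2 MB + 15 KB × 100). The problem is that the total output package I actually get is over 8 MB.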
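The estimate above can be written out as a short sketch. It only uses the figures from the question; decimal units (1 MB = 1000 KB) and the helper names are assumptions made for illustration.

```python
# Figures come from the question; decimal units are an assumption.
KB = 1000
MB = 1000 * KB


def expected_package_size(base_bytes, image_bytes, invoice_count):
    """Size expected if every invoice grows by exactly the image file size."""
    return base_bytes + image_bytes * invoice_count


def per_invoice_overhead(observed_bytes, base_bytes, invoice_count):
    """Observed growth per invoice once the image is included."""
    return (observed_bytes - base_bytes) / float(invoice_count)


expected = expected_package_size(2 * MB, 15 * KB, 100)
overhead = per_invoice_overhead(8 * MB, 2 * MB, 100)

print("expected package size: %.1f MB" % (expected / float(MB)))
print("observed growth per invoice: %.0f KB" % (overhead / KB))
```

With an 8 MB package, each invoice grows by about 60 KB, roughly four times the size of the 15 KB image file. Since the real package is over 8 MB, this is a lower bound.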

Is there any technique I can use to reduce this output size, or any other way to get an output size closer to what I expect?

2012-03-02 StingyJack

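I don't know what RDLC is, but I think a 15 KB image does not necessarily stay 15 KB when it is rendered to PDF. A typical image made for the web has a resolution of 72 dpi. When it is included in a PDF, software often converts it to 200-300 dpi for best print quality. A 100x100 px image therefore becomes a ~278x278 px image at 200 dpi; a 10,000 px image turns into roughly 77,000 px, you do the math. – 2012-03-05 14:39:32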
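The arithmetic in this comment can be checked with a small sketch. It assumes the renderer resamples the image so that its physical size stays the same at the higher resolution; whether the RDLC PDF renderer actually does this is not established in this thread. The function name is made up for illustration.

```python
def resampled_side(pixels, source_dpi, target_dpi):
    """Pixel length of one side after resampling at constant physical size."""
    return int(round(pixels * float(target_dpi) / source_dpi))


side = resampled_side(100, 72, 200)

print("100 x 100 px at 72 dpi -> %d x %d px at 200 dpi" % (side, side))
print("pixel count: %d -> %d" % (100 * 100, side * side))
```

The pixel count, and with it the uncompressed data volume, grows with the square of the dpi ratio: (200 / 72)² is about 7.7.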

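Since upsampling adds no new information, it would be silly for a PDF renderer to store the upsampled image. Upsampling can wait until print time. But plenty of software does silly things... – japreiss 2012-03-05 14:46:22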

Can you tell me your image type (JPG, PNG, TIF), its color depth (1bpp, 8bpp, 24bpp, etc.) and its size (width and height in pixels)? – iPDFdev 2012-03-05 15:17:21

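Depending on the capabilities of your PDF generator, images may be saved with weak compression, lossless compression, or even no compression at all. You can use the method below to extract image information from the PDF and check whether this is what is happening in your case. If it is, some "PDF compression" software may solve the problem.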
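To get a feel for how much the storage method matters, here is a minimal sketch using only Python's standard `zlib` module (deflate is the algorithm behind PDF's FlateDecode filter). The 300x100 px "signature" raster is synthetic and purely hypothetical; real images will compress differently.

```python
import zlib


def raw_size(width, height, bytes_per_pixel=3):
    """Bytes needed for an uncompressed raster (24bpp RGB by default)."""
    return width * height * bytes_per_pixel


# Hypothetical image: white background with one dark horizontal stroke.
width, height = 300, 100
row_blank = b"\xff" * (width * 3)
row_stroke = b"\xff" * 300 + b"\x00" * 300 + b"\xff" * 300
pixels = b"".join(row_stroke if 40 <= y < 60 else row_blank
                  for y in range(height))

# Lossless compression of the same pixel data.
flate = zlib.compress(pixels, 9)

print("uncompressed: %d bytes" % len(pixels))
print("deflate:      %d bytes" % len(flate))
```

Stored without compression, even this small raster takes 90,000 bytes, six times the 15 KB file size mentioned in the question. A generator that decodes a JPEG and writes the pixels back uncompressed or only lightly compressed could therefore account for growth of this order.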

(This may seem strange, but I really could not find any pre-written software that does this.)

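Install Python 2.x and the PDFMiner package (see the PDFMiner manual#cmap for installation steps), then use the following code to list all images in the document along with their sizes and compression. For the list and descriptions of the compression algorithms used by PDF, see the PDF specification, page 23 (the "Standard filters" table).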

from pdfminer.pdfparser import PDFParser, PDFDocument 
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter 
from pdfminer.pdfdevice import PDFDevice 

# Open a PDF file. 
fp = open('Reader.pdf', 'rb') 
# Create a PDF parser object associated with the file object. 
parser = PDFParser(fp) 
# Create a PDF document object that stores the document structure. 
doc = PDFDocument() 
# Connect the parser and document objects. 
parser.set_document(doc) 
doc.set_parser(parser) 
# Supply the password for initialization. 
# (If no password is set, give an empty string.) 
doc.initialize('') 
# Check if the document allows extraction. If not, abort.
if not doc.is_extractable:
    raise RuntimeError('PDF does not allow content extraction')
# Create a PDF resource manager object that stores shared resources. 
rsrcmgr = PDFResourceManager() 

from pdfminer.layout import LAParams, LTImage 
from pdfminer.converter import PDFPageAggregator 

# Set parameters for analysis. 
laparams = LAParams() 
# Create a PDF page aggregator object. 
device = PDFPageAggregator(rsrcmgr, laparams=laparams) 
interpreter = PDFPageInterpreter(rsrcmgr, device) 

#Build layout trees of all pages 
layouts=[] 
for page in doc.get_pages(): 
    interpreter.process_page(page) 
    # receive the LTPage object for the page. 
    layouts.append(device.get_result()) 

# Search the layout trees for images and show their info,
# excluding repeated ones.
known_ids = set()
count = 0
size = 0

def lsimages(obj):
    global count, size
    # Only container objects keep their children in _objs.
    if hasattr(obj, '_objs'):
        for so in obj._objs:
            if isinstance(so, LTImage):
                attrs = so.stream.attrs
                image_id = attrs['ID'].objid
                if image_id not in known_ids:
                    print(attrs)
                    count += 1
                    size += attrs.get('Length', 0)
                    known_ids.add(image_id)
            # Recurse into the child, since images may be nested deeper.
            lsimages(so)

for l in layouts:
    lsimages(l)
print("Total: %d images, %d bytes" % (count, size))
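The printed attributes include the image stream's `Filter` entry, which names the compression in use. As a reading aid, here is a small sketch that maps the standard filter names from the PDF specification to a short note. The helper `describe_filters` is hypothetical, and it assumes the filter names have already been converted to plain strings; PDFMiner represents them as literal objects, and that conversion is left out here.

```python
# Standard PDF stream filter names with a short note on each.
FILTER_NOTES = {
    "ASCIIHexDecode": "hex text encoding, expands the data",
    "ASCII85Decode": "base-85 text encoding, expands the data",
    "LZWDecode": "LZW, lossless",
    "FlateDecode": "zlib/deflate, lossless",
    "RunLengthDecode": "run-length encoding, lossless",
    "CCITTFaxDecode": "CCITT fax, lossless, bi-level images only",
    "JBIG2Decode": "JBIG2, bi-level images only",
    "DCTDecode": "JPEG, lossy",
    "JPXDecode": "JPEG 2000",
    "Crypt": "encryption, no compression",
}


def describe_filters(filter_value):
    """Summarise a /Filter entry: None, one name, or a list of names."""
    if filter_value is None:
        return ["no filter: stored uncompressed"]
    if not isinstance(filter_value, (list, tuple)):
        filter_value = [filter_value]
    notes = []
    for name in filter_value:
        key = str(name).lstrip("/")
        notes.append("%s: %s" % (key, FILTER_NOTES.get(key, "unknown filter")))
    return notes


for line in describe_filters(["/ASCII85Decode", "/FlateDecode"]):
    print(line)
```

An image with no `Filter` entry, or with only a text encoding such as ASCII85Decode, is stored without any real compression and is a likely source of unexpected growth.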

Credits: the boilerplate code is taken from the Programming with PDFMiner article.


2012-03-07 04:27:48
