Accessing a BytesIO object after creation

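I am using slate (https://pypi.python.org/pypi/slate) in a scrapy spider, trying to extract the text from multiple PDFs in a directory. I am not interested in saving the actual PDFs to disk, so I was advised to look at the io.BytesIO subclass described at https://docs.python.org/2/library/io.html#buffered-streams. Based on "Creating bytesIO object", I have initialized a BytesIO instance with the PDF body, but now I need to pass the data to the slate module. So far I have: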

def save_pdf(self, response):
    in_memory_pdf = BytesIO(response.body)

    with open(in_memory_pdf, 'rb') as f:
        doc = slate.PDF(f)
        print(doc[0])

I am getting:

in_memory_pdf.read(response.body) 
TypeError: integer argument expected, got 'str'

How can I get this working?

Edit:

with open(in_memory_pdf, 'rb') as f: 
TypeError: coercing to Unicode: need string or buffer, _io.BytesIO found

Edit 2:

def save_pdf(self, response): 
    in_memory_pdf = BytesIO(bytes(response.body)) 
    in_memory_pdf.seek(0) 
    doc = slate.PDF(in_memory_pdf) 
    print(doc)

Source

2016-09-30 user61629

Try 'in_memory_pdf = BytesIO(bytes(response.body))'. – martineau

Thanks, that solved the initial problem! – user61629

Try using 'StringIO' (https://docs.python.org/2/library/stringio.html#module-StringIO) instead of 'BytesIO'. Also note that with either one you don't need 'with open(...) as f'. Just rewind the object to the beginning after creating it with 'in_memory_pdf.seek(0)', then use 'in_memory_pdf' instead of 'f'. – martineau
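To illustrate the comment above, here is a minimal sketch showing that a BytesIO object is already an open, file-like object: it can be read directly, whereas passing it to open() raises a TypeError because open() expects a path. The names fake_body and read_header are hypothetical stand-ins for response.body and for a consumer such as slate.PDF, which is not called here.

```python
from io import BytesIO


def read_header(file_obj, size=8):
    """Read the first `size` bytes from any binary file-like object."""
    file_obj.seek(0)  # rewind in case the position is not at the start
    return file_obj.read(size)


# Hypothetical stand-in for response.body; a real PDF body would go here.
fake_body = b"%PDF-1.4 fake content"

in_memory_pdf = BytesIO(fake_body)

# The buffer can be handed straight to code that expects a file object.
header = read_header(in_memory_pdf)
print(header)

# open() wants a filesystem path, so giving it the buffer fails.
open_error = None
try:
    open(in_memory_pdf, "rb")
except TypeError as exc:
    open_error = exc
print(type(open_error).__name__)
```

The same pattern should apply to any library that accepts an open binary file: pass the BytesIO object itself rather than wrapping it in open().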

You already know the answer. It is stated explicitly in the Python TypeError message and in the documentation:

class io.BytesIO([initial_bytes])

BytesIO accepts bytes, and you are passing it the contents, i.e. response.body, which is a string.
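As a small sketch of the point about argument types: the BytesIO constructor takes the bytes, while read() takes an integer byte count (or nothing), which is consistent with the "integer argument expected" message quoted in the question. The sample data here is a made-up assumption, not the asker's PDF.

```python
from io import BytesIO

data = b"hello world"  # made-up sample bytes
buf = BytesIO(data)    # the constructor is where the bytes go

first = buf.read(5)     # read(size) takes an integer byte count
rest = buf.read()       # no argument: read from the current position to the end
whole = buf.getvalue()  # entire contents, regardless of the current position

# Passing the data to read() instead of a size raises TypeError.
read_error = None
try:
    buf.read(data)
except TypeError as exc:
    read_error = exc

# Passing text (unicode) instead of bytes to the constructor also fails.
init_error = None
try:
    BytesIO(u"text")
except TypeError as exc:
    init_error = exc

print(first, rest, whole)
```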

Source

2016-09-30 20:24:04

Thank you, very useful information! I used in_memory_pdf = BytesIO(bytes(response.body)) from above. The problem now is the error I have added to the question above. – user61629

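Your solution is almost given by the error itself. You are passing a byte string, and as the error says, it is being coerced to an int. It should work, since bytes works. –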
