2015-09-30

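Boost 1.59 does not decompress all bzip2 streams

I have been trying to decompress some .bz2 files on the fly, line by line so to speak, because the files I am dealing with are massive when uncompressed (in the region of 100 GB), so I wanted a solution that saves disk space.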

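I have no problem with files compressed with vanilla bzip2, but for files compressed with pbzip2 only the first bz2 stream found is decompressed. This bug tracker entry relates to the problem: https://svn.boost.org/trac/boost/ticket/3853 but I was led to believe it had been fixed as of version 1.41. I have checked the bzip2.hpp file and it contains the 'fixed' version, and I have also checked that the Boost version used in the program is 1.59.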

The code is here:

cout<<"Warning bzip2 support is a little buggy!"<<endl; 

//Open the file here 
trans_file.open(files[i].c_str(), std::ios_base::in | std::ios_base::binary); 

//Set up boost bzip2 compression 
boost::iostreams::filtering_istream in; 
in.push(boost::iostreams::bzip2_decompressor()); 
in.push(trans_file); 
std::string str; 

//Begin reading 
while(std::getline(in, str)) 
{ 
    std::stringstream stream(str); 
    stream>>id_f>>id_i>>aif; 
    /* Do stuff with values here*/ 
} 

Any suggestions would be great. Thanks!

The patch posted in the comments of https://svn.boost.org/trac/boost/ticket/9749 solved the problem for me. Unfortunately, this bug has been around for quite a while now. – PiQuer

Answers

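You are right.

It appears that changeset #63057 fixes only part of the problem.

The corresponding unit test does pass, however. But it uses the copy algorithm (and on a composite<> rather than a filtering_istream, if that is relevant).

I would report this as a defect or a regression, and of course include a file that exhibits the problem. For me it reproduces using nothing more than /etc/dictionaries-common/words compressed with pbzip2 (default options).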
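Before filing the report, it can help to confirm that the reproduction file really contains more than one bzip2 stream. The sketch below is a heuristic check of my own, not part of Boost or pbzip2: it scans a buffer for byte-aligned stream headers ("BZh", a level digit '1' to '9', then the block magic 0x314159265359 or the end-of-stream magic 0x177245385090). The names find_bzip2_stream_offsets and read_file are hypothetical, and a false match inside compressed data is unlikely but not impossible.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Heuristic: returns the byte offsets at which a bzip2 stream header
// appears to start. A header is "BZh", a level digit '1'..'9', and then
// either the block magic or (for an empty stream) the end-of-stream magic.
std::vector<std::size_t> find_bzip2_stream_offsets(const std::string& bytes)
{
    static const std::string block_magic("\x31\x41\x59\x26\x53\x59", 6);
    static const std::string end_magic("\x17\x72\x45\x38\x50\x90", 6);

    std::vector<std::size_t> offsets;
    for (std::size_t pos = bytes.find("BZh"); pos != std::string::npos;
         pos = bytes.find("BZh", pos + 1)) {
        if (pos + 10 > bytes.size()) break;  // not enough bytes left for a header
        const char level = bytes[pos + 3];
        if (level < '1' || level > '9') continue;
        if (bytes.compare(pos + 4, 6, block_magic) == 0 ||
            bytes.compare(pos + 4, 6, end_magic) == 0) {
            offsets.push_back(pos);
        }
    }
    return offsets;
}

// Reads a whole file into memory. Acceptable for a small reproduction
// file, not for inputs in the 100 GB range.
std::string read_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling find_bzip2_stream_offsets(read_file("test.bz2")).size() should give 1 for a file written by plain bzip2 and, as far as I can tell, more than 1 for pbzip2 output that spans several blocks, since pbzip2 writes its blocks as concatenated streams.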

I have put test.bz2 here: http://7f0d2fd2-af79-415c-ab60-033d3b494dc9.s3.amazonaws.com/test.bz2

Here is my test program:

#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/bzip2.hpp>
#include <boost/iostreams/stream.hpp>
#include <fstream>
#include <iostream>
#include <string>

namespace io = boost::iostreams; 

void multiple_member_test(); // from the unit tests in changeset #63057 

int main() { 
    //multiple_member_test(); 
    //return 0; 

    std::ifstream trans_file("test.bz2", std::ios::binary); 

    //Set up boost bzip2 compression 
    io::filtering_istream in; 
    in.push(io::bzip2_decompressor()); 
    in.push(trans_file); 

    //Begin reading 
    std::string str; 
    while(std::getline(in, str)) 
    { 
        std::cout << str << "\n";
    } 
} 

#include <boost/iostreams/compose.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/range/iterator_range.hpp>
#include <algorithm>
#include <cassert>
#include <sstream>
#include <vector>

void multiple_member_test() // from the unit tests in changeset #63057 
{ 
    std::string  data(20ul << 20, '*'); 
    std::vector<char> temp, dest; 

    // Write compressed data to temp, twice in succession 
    io::filtering_ostream out; 
    out.push(io::bzip2_compressor()); 
    out.push(io::back_inserter(temp)); 
    io::copy(boost::make_iterator_range(data), out); 
    out.push(io::back_inserter(temp)); 
    io::copy(boost::make_iterator_range(data), out); 

    // Read compressed data from temp into dest 
    io::filtering_istream in; 
    in.push(io::bzip2_decompressor()); 
    in.push(io::array_source(&temp[0], temp.size())); 
    io::copy(in, io::back_inserter(dest)); 

    // Check that dest consists of two copies of data 
    assert(data.size() * 2 == dest.size()); 
    assert(std::equal(data.begin(), data.end(), dest.begin())); 
    assert(std::equal(data.begin(), data.end(), dest.begin() + dest.size()/2)); 

    dest.clear(); 
    io::copy( 
      io::array_source(&temp[0], temp.size()), 
      io::compose(io::bzip2_decompressor(), io::back_inserter(dest))); 

    // Check that dest consists of two copies of data 
    assert(data.size() * 2 == dest.size()); 
    assert(std::equal(data.begin(), data.end(), dest.begin())); 
    assert(std::equal(data.begin(), data.end(), dest.begin() + dest.size()/2)); 
}
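
Until a fixed Boost is available, one possible workaround (a different technique, outside Boost entirely) is to let the bzip2 command-line tool do the decompression and read its output through a pipe, since that tool handles concatenated streams. The following is only a sketch under some assumptions: a POSIX system with popen, a bzip2 executable on the PATH, and a file name that needs no shell quoting. The helper name for_each_line_of_command is hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <functional>
#include <stdexcept>
#include <string>

// Runs `command` through the shell and calls `on_line` once for every line
// of its standard output (without the trailing newline).
// Returns the number of lines delivered.
std::size_t for_each_line_of_command(
    const std::string& command,
    const std::function<void(const std::string&)>& on_line)
{
    FILE* pipe = popen(command.c_str(), "r");  // POSIX, not standard C++
    if (!pipe) throw std::runtime_error("popen failed: " + command);

    std::size_t count = 0;
    std::string line;
    char buffer[4096];
    while (std::fgets(buffer, sizeof buffer, pipe)) {
        line += buffer;  // a long line may arrive in several pieces
        if (!line.empty() && line.back() == '\n') {
            line.pop_back();
            on_line(line);
            ++count;
            line.clear();
        }
    }
    if (!line.empty()) {  // last line had no trailing newline
        on_line(line);
        ++count;
    }

    // A non-zero status means the command failed or could not be waited for.
    if (pclose(pipe) != 0) throw std::runtime_error("command failed: " + command);
    return count;
}
```

With this, for_each_line_of_command("bzip2 -dc test.bz2", callback) would stream the decompressed lines without writing the uncompressed data to disk. The callback can parse each line the same way the getline loop in the question does.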