2017-01-25 125 views
0

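Cumulative compression of a growing buffer (C++, zlib)

I have a buffer (a string) that grows over time, and I need to send it through a channel with a limited input size (4096 bytes). Communication over this channel is expensive, which is why it is better to send compressed data. The buffer grows in chunks of varying sizes. These chunks cannot be split without losing their meaning.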

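At the moment I use zlib from C++ with an arbitrary buffer size limit. When the limit is reached, the string is compressed and sent through the channel. This works, but it is not optimal: to avoid losing information, the limit has to be quite low, because the channel input is capped at 4096 bytes.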

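My idea is to use zlib to build a growing compressed buffer out of compressed chunks of varying sizes, and to stop before the channel input limit is reached. Does zlib allow compressed chunks of different sizes, or do I need another algorithm?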

+0

I don't really know zlib, but take a look at LZMA; I think it can handle your case. http://7-zip.org/sdk.html – antipattern

Answers

0

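I managed to design a compressor that sends a growing buffer, piece by piece, through a channel with a limited input size. I am posting this answer for anyone working on the same problem. Thanks to Mark Adler and MSalters for pointing me in the right direction.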

#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>
#include <zlib.h>

class zStreamManager { 
    public: 
     zStreamManager() = default; 
     ~zStreamManager(); 
     void endStream(); 
     void addToStream(const void *inData, size_t inDataSize); 

    private: 
     // Size of base64 encoded is about 4*originalSize/3 + (3 to 6) 
     // so with maximum output size of 4096, 3050 max zipped out 
     // buffer will be fine 
     // static constexpr so the sizes are usable as compile-time constants 
     static constexpr size_t CHUNK_IN = 1024, CHUNK_OUT = 3050; 
     // Standard base64 alphabet: 26 + 26 letters, 10 digits, '+' and '/' 
     const std::string base64Chars = 
     "ABCDEFGHIJKLMNOPQRSTUVWXYZ" 
     "abcdefghijklmnopqrstuvwxyz" 
     "0123456789+/"; 
     bool deallocated = true; 
     z_stream stream; 
     std::vector<uint8_t> outBuffer; 
     std::string base64Encode(std::vector<uint8_t> &str); 
}; 

zStreamManager::~zStreamManager() { 
    endStream(); 
} 

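void zStreamManager::endStream() { 
    if(!deallocated) { 
     deallocated = true; 
     std::vector<uint8_t> tempBuffer(CHUNK_IN); 
     int response = Z_OK; 

     // No new input: Z_FINISH only drains what deflate still holds 
     stream.next_in = Z_NULL; 
     stream.avail_in = 0; 

     do { 
      // Point zlib at a valid output buffer before every call; the 
      // buffer used by addToStream no longer exists at this point 
      stream.next_out = tempBuffer.data(); 
      stream.avail_out = CHUNK_IN; 
      response = deflate(&stream, Z_FINISH); 
      size_t have = CHUNK_IN - stream.avail_out; 
      outBuffer.insert(outBuffer.end(), tempBuffer.begin(), tempBuffer.begin() + have); 
     } while(response == Z_OK); 

     deflateEnd(&stream); 

     // SEND stands for the channel's send routine; the last part is 
     // base64 encoded and terminated with '$' 
     if(outBuffer.size()) 
      SEND << base64Encode(outBuffer) << "$"; 
     outBuffer.clear(); 
    } 
} 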

void zStreamManager::addToStream(const void *inData, size_t inDataSize) { 
    if(inDataSize == 0) 
     return; 

    if(deallocated) { 
     stream.zalloc = Z_NULL; 
     stream.zfree = Z_NULL; 
     stream.opaque = Z_NULL; 
     if(deflateInit(&stream, Z_BEST_COMPRESSION) != Z_OK) 
      return; 
     deallocated = false; 
    } 

    // Fixed-size scratch buffer, so that even a tiny input chunk leaves 
    // room for the marker that Z_SYNC_FLUSH appends 
    std::vector<uint8_t> tempBuffer(CHUNK_IN); 

    stream.next_in = reinterpret_cast<Bytef *>(const_cast<void*>(inData)); 
    stream.avail_in = static_cast<uInt>(inDataSize); 

    // Z_SYNC_FLUSH emits everything compressed so far; the flush is 
    // complete once deflate returns with output space left over 
    do { 
     stream.next_out = tempBuffer.data(); 
     stream.avail_out = CHUNK_IN; 
     deflate(&stream, Z_SYNC_FLUSH); 
     size_t have = CHUNK_IN - stream.avail_out; 
     outBuffer.insert(outBuffer.end(), tempBuffer.begin(), tempBuffer.begin() + have); 
    } while(stream.avail_out == 0); 

    // Send every complete part of CHUNK_OUT compressed bytes 
    while(outBuffer.size() >= CHUNK_OUT) { 
     std::vector<uint8_t> zipped(outBuffer.begin(), outBuffer.begin() + CHUNK_OUT); 
     outBuffer.erase(outBuffer.begin(), outBuffer.begin() + CHUNK_OUT); 

     // SEND stands for the channel's send routine; each part is 
     // base64 encoded and terminated with '|' 
     SEND << base64Encode(zipped) << "|"; 
    } 
} 

std::string zStreamManager::base64Encode(std::vector<uint8_t> &str) { 
    /* ALTERED VERSION OF René Nyffenegger BASE64 CODE 
    Copyright (C) 2004-2008 René Nyffenegger 

    This source code is provided 'as-is', without any express or implied 
    warranty. In no event will the author be held liable for any damages 
    arising from the use of this software. 

    Permission is granted to anyone to use this software for any purpose, 
    including commercial applications, and to alter it and redistribute it 
    freely, subject to the following restrictions: 

    1. The origin of this source code must not be misrepresented; you must not 
     claim that you wrote the original source code. If you use this source code 
     in a product, an acknowledgment in the product documentation would be 
     appreciated but is not required. 

    2. Altered source versions must be plainly marked as such, and must not be 
     misrepresented as being the original source code. 

    3. This notice may not be removed or altered from any source distribution. 

    René Nyffenegger [email protected] 
    */ 
    // data() stays valid even when the vector is empty 
    unsigned char const* bytes_to_encode = str.data(); 
    size_t in_len = str.size(); 
    std::string ret; 
    int i = 0, j = 0; 
    unsigned char char_array_3[3], char_array_4[4]; 

    // Encode every complete group of 3 input bytes as 4 characters 
    while(in_len--) { 
     char_array_3[i++] = *(bytes_to_encode++); 
     if (i == 3) { 
      char_array_4[0] = (char_array_3[0] & 0xfc) >> 2; 
      char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4); 
      char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6); 
      char_array_4[3] = char_array_3[2] & 0x3f; 

      for(i = 0; i < 4; i++) 
       ret += base64Chars[char_array_4[i]]; 
      i = 0; 
     } 
    } 

    // Encode the 1 or 2 leftover bytes and pad the output with '=' 
    if(i) { 
     for(j = i; j < 3; j++) 
      char_array_3[j] = '\0'; 

     char_array_4[0] = (char_array_3[0] & 0xfc) >> 2; 
     char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4); 
     char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6); 
     char_array_4[3] = char_array_3[2] & 0x3f; 

     for(j = 0; j < i + 1; j++) 
      ret += base64Chars[char_array_4[j]]; 

     while(i++ < 3) 
      ret += '='; 
    } 

    return ret; 
} 
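
The comment next to CHUNK_OUT in the class only estimates the base64 overhead. As a rough check of that estimate, here is a small sketch of the exact arithmetic, assuming padded base64 without line breaks and a single separator byte per message. The helper names base64EncodedSize and maxRawChunk are made up for this illustration and are not part of the class.

```cpp
#include <cassert>
#include <cstddef>

// Length of the padded base64 encoding of n raw bytes: 4 characters
// for every (possibly partial) group of 3 input bytes.
std::size_t base64EncodedSize(std::size_t n) {
    return 4 * ((n + 2) / 3);
}

// Largest raw part whose base64 form, followed by `delimiterBytes`
// separator bytes, still fits in a message of `limit` bytes.
// Hypothetical helper, assuming no line breaks in the encoding.
std::size_t maxRawChunk(std::size_t limit, std::size_t delimiterBytes) {
    assert(limit >= delimiterBytes);
    std::size_t groups = (limit - delimiterBytes) / 4;  // whole 4-character groups
    return groups * 3;                                  // 3 raw bytes per group
}
```

Under these assumptions a 3050-byte part encodes to 4068 characters, and the channel limit of 4096 would allow parts of up to 3069 bytes, so the value chosen above leaves a small safety margin.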

Usage example:

zStreamManager zm; 
std::string growingBuffer; 
bool somethingToSend = true; 

while(somethingToSend) { 
    RECEIVE(&growingBuffer); 
    if(growingBuffer.size()) { 
     zm.addToStream(growingBuffer.c_str(), growingBuffer.size()); 
     growingBuffer.clear(); 
    } else { 
     somethingToSend = false; 
    } 
} 

zm.endStream(); 

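Here RECEIVE and SEND stand for the methods that receive the buffer and send it through the channel. On the decompression side, each part ends with the '|' character and the end of the whole buffer is marked with '$'. Each part must be base64-decoded on its own, and the decoded parts are then concatenated. The result can finally be decompressed by zlib like any other compressed data.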
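
The receiving side described above could look roughly like the following sketch. It only covers the framing: splitting on '|' and '$', decoding each part, and concatenating the bytes. The function names base64Decode and reassemble are hypothetical, the whole transmission is assumed to be available as one string, and the final zlib inflate step is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode one base64 part; decoding stops at the first '=' padding character.
std::vector<std::uint8_t> base64Decode(const std::string& text) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789+/";
    std::vector<std::uint8_t> out;
    std::uint32_t bits = 0;  // bit accumulator
    int bitCount = 0;        // number of bits not yet written out
    for (char c : text) {
        if (c == '=') break;
        std::size_t pos = alphabet.find(c);
        if (pos == std::string::npos) continue;  // ignore foreign characters
        bits = (bits << 6) | static_cast<std::uint32_t>(pos);
        bitCount += 6;
        if (bitCount >= 8) {
            bitCount -= 8;
            out.push_back(static_cast<std::uint8_t>((bits >> bitCount) & 0xFF));
        }
    }
    return out;
}

// Split the received text on '|' and '$', decode every part separately,
// and concatenate the bytes into one compressed stream.
std::vector<std::uint8_t> reassemble(const std::string& wire) {
    std::vector<std::uint8_t> compressed;
    std::string part;
    for (char c : wire) {
        if (c == '|' || c == '$') {
            std::vector<std::uint8_t> bytes = base64Decode(part);
            compressed.insert(compressed.end(), bytes.begin(), bytes.end());
            part.clear();
            if (c == '$') break;  // '$' closes the whole buffer
        } else {
            part += c;
        }
    }
    return compressed;
}
```

Each part is decoded separately because every part was encoded on its own and may carry its own '=' padding. The vector returned by reassemble is what would then be handed to zlib for decompression.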

1

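The simplest solution is to turn the out-of-band packet description into an in-band format. By far the easiest case is when your input chunks do not use all 256 possible byte values. For example, if the value 00 never occurs in a chunk, it can be used to separate the chunks before compression. Otherwise, you need an escape code.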

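Either way, you compress one continuous stream that contains the chunk separators. On the receiving side, you decompress the stream, identify the separators, and reassemble the chunks.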
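
As an illustration of the escape-code variant, here is a minimal sketch. The byte values chosen for DELIM and ESC, the "+2" escape mapping, and the function names packChunks and unpackChunks are all assumptions made for this example. The packed string is what would be fed to the compressor, and the receiver would run unpackChunks on the decompressed stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

const std::uint8_t DELIM = 0x00;  // separates chunks (assumed value)
const std::uint8_t ESC = 0x01;    // introduces an escaped byte (assumed value)

// Join chunks into one stream. A DELIM or ESC byte inside a chunk is
// written as ESC followed by (byte + 2), so a raw DELIM only ever
// appears as a chunk terminator.
std::string packChunks(const std::vector<std::string>& chunks) {
    std::string stream;
    for (const std::string& chunk : chunks) {
        for (char ch : chunk) {
            std::uint8_t b = static_cast<std::uint8_t>(ch);
            if (b == DELIM || b == ESC) {
                stream += static_cast<char>(ESC);
                stream += static_cast<char>(b + 2);
            } else {
                stream += ch;
            }
        }
        stream += static_cast<char>(DELIM);
    }
    return stream;
}

// Inverse of packChunks: split on DELIM and undo the escaping.
std::vector<std::string> unpackChunks(const std::string& stream) {
    std::vector<std::string> chunks;
    std::string current;
    for (std::size_t i = 0; i < stream.size(); ++i) {
        std::uint8_t b = static_cast<std::uint8_t>(stream[i]);
        if (b == DELIM) {
            chunks.push_back(current);
            current.clear();
        } else if (b == ESC && i + 1 < stream.size()) {
            ++i;  // the next byte holds the original value plus 2
            current += static_cast<char>(static_cast<std::uint8_t>(stream[i]) - 2);
        } else {
            current += stream[i];
        }
    }
    return chunks;
}
```

If the chunks are known never to contain the separator byte, the escaping branch is never taken and the scheme reduces to the simple case of one separator byte between chunks.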

1

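You can simply do continuous zlib compression and send data on your channel whenever 4K of compressed data has been generated. On the other end, you need to make sure that the decompressor is fed the 4K chunks of compressed data in the correct order.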

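The deflate algorithm in zlib is bursty: it accumulates on the order of 16K to 64K or more of data internally before emitting any compressed data, then delivers a block of compressed data, then starts accumulating again. So there will be latency unless you ask deflate to flush the data. If you want to reduce latency, you can make the blocks smaller by flushing, at a small cost in compression.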
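
To make the ordering requirement concrete, here is a small sketch that cuts an already compressed byte stream into packets of at most 4K and restores the order on the other side. It assumes the channel may deliver packets out of order, which the answer does not state. The Packet struct, its sequence number, and the functions slice and reorder are inventions for this example and are not part of zlib.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical packet: a sequence number plus a slice of compressed bytes.
struct Packet {
    std::uint32_t seq;
    std::string payload;
};

// Cut the compressed stream into packets of at most maxPayload bytes.
std::vector<Packet> slice(const std::string& compressed, std::size_t maxPayload) {
    assert(maxPayload > 0);
    std::vector<Packet> packets;
    std::uint32_t seq = 0;
    for (std::size_t off = 0; off < compressed.size(); off += maxPayload) {
        Packet p;
        p.seq = seq++;
        p.payload = compressed.substr(off, maxPayload);
        packets.push_back(p);
    }
    return packets;
}

// Rebuild the stream in sequence order and stop at the first gap, so
// the decompressor never sees bytes out of order.
std::string reorder(const std::vector<Packet>& received) {
    std::map<std::uint32_t, std::string> bySeq;
    for (const Packet& p : received) bySeq[p.seq] = p.payload;
    std::string stream;
    std::uint32_t expected = 0;
    for (const auto& entry : bySeq) {
        if (entry.first != expected) break;  // a packet is still missing
        stream += entry.second;
        ++expected;
    }
    return stream;
}
```

If the channel already guarantees in-order delivery, the sequence numbers are unnecessary and the packets can be passed to the decompressor as they arrive.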

+0

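OK. I was confused by the compression chunk (the CHUNK variable in the zlib usage example); it can in fact have a fixed size. My understanding is that I need to avoid calling the deflateEnd function and set flush to true. I need to run some tests. Thanks for your answer. –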