Is it safe to convert bytes to a UTF-8 encoded string?

Today I came across a question that contained code like this:

var accumulator = ""; 
var buffer = new byte[8192]; 
while (true) 
{ 
    var readed = stream.Read(buffer, 0, buffer.Length); 
    accumulator += Encoding.UTF8.GetString(buffer, 0, readed); 
    if (readed < buffer.Length) 
        break;
} 
var result = Encoding.UTF8.GetBytes(accumulator);

I know this code is inefficient, but is it actually safe? Is there any byte sequence that would corrupt the result?

2017-07-23 Aleks Andreev

Any code point that is split across an 8192-byte boundary will fail, yes. Why decode as UTF-8 only to re-encode immediately? – Ryan

No, it is not safe. A better way is 'accumulator = new StreamReader(stream, Encoding.UTF8).ReadToEnd()' –

This code is obviously broken; if it was suggested as an answer, then you should bring the bug to the author's attention.

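A UTF-8 sequence can obviously be more than one byte long. If a multi-byte sequence starts at the end of the current buffer and continues at the start of the next buffer, then converting each buffer to a string on its own will produce the wrong result.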
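
To make the failure concrete, here is a minimal sketch that decodes each chunk separately, the way the code in the question does. The class and method names (SplitSequenceDemo, DecodePerChunk) are made up for illustration, and the chunk size of 1 is only an assumption chosen to force a split inside the two-byte sequence 0xC3 0xA9 (U+00E9); with an 8192-byte buffer the same thing happens whenever a sequence straddles the buffer boundary.

```csharp
using System;
using System.Text;

public static class SplitSequenceDemo
{
    // Hypothetical helper: decodes every chunk independently, with no state
    // carried over between chunks (same approach as the code in the question).
    public static string DecodePerChunk(byte[] bytes, int chunkSize)
    {
        var result = new StringBuilder();
        for (int offset = 0; offset < bytes.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, bytes.Length - offset);
            // Each call sees only part of the sequence, so the default
            // UTF-8 encoding substitutes U+FFFD for the invalid bytes.
            result.Append(Encoding.UTF8.GetString(bytes, offset, count));
        }
        return result.ToString();
    }

    public static void Main()
    {
        byte[] original = Encoding.UTF8.GetBytes("\u00E9"); // bytes 0xC3 0xA9
        string decoded = DecodePerChunk(original, 1);       // split mid-sequence
        Console.WriteLine(decoded == "\u00E9");             // prints False
    }
}
```

Re-encoding the damaged string with Encoding.UTF8.GetBytes then yields bytes that differ from the input, so the round trip in the question is lossy for such data.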

2017-07-23 20:25:20

"Suggested as an answer" - no, this code comes from a question. From your answer I now understand one possible bug in this approach. Thanks –

The safe way to do this is to use a stateful UTF-8 decoder, which you can get from Encoding.UTF8.GetDecoder().

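A stateful decoder internally holds on to the bytes that belong to an incomplete multi-byte sequence. The next time you give it more bytes, it completes the sequence and returns the characters decoded from it.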
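
As a small illustration of that behaviour, the sketch below feeds bytes to a single Decoder one at a time. The names StatefulDecoderDemo and DecodeInChunks are hypothetical, and the one-byte chunk size is just an assumption chosen so that every multi-byte sequence gets split; chunkSize must be greater than zero.

```csharp
using System;
using System.Text;

public static class StatefulDecoderDemo
{
    // Hypothetical helper: feeds the bytes to one shared Decoder,
    // chunkSize bytes at a time.
    public static string DecodeInChunks(byte[] bytes, int chunkSize)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        // Large enough for the worst case produced by one chunk.
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(chunkSize)];
        var result = new StringBuilder();

        for (int offset = 0; offset < bytes.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, bytes.Length - offset);
            // flush: false keeps a trailing partial sequence inside the decoder
            // until the following call supplies the remaining bytes.
            int charCount = decoder.GetChars(bytes, offset, count, chars, 0, false);
            result.Append(chars, 0, charCount);
        }

        // Final flush: a sequence that is still incomplete at the very end
        // is reported as a replacement character instead of being dropped.
        int tail = decoder.GetChars(bytes, 0, 0, chars, 0, true);
        result.Append(chars, 0, tail);
        return result.ToString();
    }

    public static void Main()
    {
        string text = "h\u00E9llo \u20AC"; // contains 2-byte and 3-byte sequences
        byte[] original = Encoding.UTF8.GetBytes(text);
        Console.WriteLine(DecodeInChunks(original, 1) == text); // prints True
    }
}
```

Because the decoder carries the partial sequence across calls, the result does not depend on where the chunk boundaries fall.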

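Here is an example of how to use it. In my implementation, I use a char[] buffer that is sized so that we are guaranteed to have enough room to store the full conversion of X bytes. This way, we only perform two memory allocations to read the entire stream.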

// Requires: using System.IO; using System.Text;
public static string ReadStringFromStream(Stream stream)
{ 
    // --- Byte-oriented state --- 
    // A nice big buffer for us to use to read from the stream. 
    byte[] byteBuffer = new byte[8192]; 

    // --- Char-oriented state --- 
    // Gets a stateful UTF8 decoder that holds onto unused bytes when multi-byte sequences 
    // are split across multiple byte buffers. 
    var decoder = Encoding.UTF8.GetDecoder(); 

    // Initialize a char buffer, and make it large enough that it will be able to fit 
    // a full reads-worth of data from the byte buffer without needing to be resized. 
    char[] charBuffer = new char[Encoding.UTF8.GetMaxCharCount(byteBuffer.Length)]; 

    // --- Output --- 
    StringBuilder stringBuilder = new StringBuilder(); 

    // --- Working state --- 
    int bytesRead; 
    int charsConverted; 
    bool lastRead = false; 

    do
    {
        // Read a chunk of bytes from our stream.
        bytesRead = stream.Read(byteBuffer, 0, byteBuffer.Length);

        // If we read 0 bytes, we hit the end of stream.
        // We're going to tell the converter to flush, and then we're going to stop.
        lastRead = (bytesRead == 0);

        // Convert the bytes into characters, flushing if this is our last conversion.
        charsConverted = decoder.GetChars(
            byteBuffer,
            0,
            bytesRead,
            charBuffer,
            0,
            lastRead
        );

        // Build up a string in a character buffer.
        stringBuilder.Append(charBuffer, 0, charsConverted);
    }
    while (lastRead == false);

    return stringBuilder.ToString(); 
}

2017-07-23 20:48:45 antiduh

There is no need to reinvent the wheel (assuming it even works); see the comment by "LB" – EZI

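@EZI - Sure, but this shows how to do it yourself, and so gives you something you can adapt to your situation if you don't want to read until the end of the stream or have other, different requirements. There is nothing wrong with pulling back the curtain a little every once in a while. – antiduh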
