Is the String constructor broken for UTF-8?

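I have the following code that loads a null-terminated multi-byte string from a buffer. It nominally interprets the data as UTF-8, but if that conversion fails, it interprets the data as ISO-8859-1. Here is the code: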

    import java.io.UnsupportedEncodingException;

    @Override
    public String format(String date_format, boolean use_locale, int precision)
    {
        String rtn = null;
        // Count the bytes up to the null terminator (at most max_len).
        int len = 0;
        for(int i = 0; i < max_len; ++i)
        {
            if(storage[storage_offset + i] != 0)
                ++len;
            else
                break;
        }
        try
        {
            // Try UTF-8 first.
            rtn = new String(storage, storage_offset, len, "UTF-8");
        }
        catch(UnsupportedEncodingException e1)
        {
            // Intended fallback for when the UTF-8 decode fails.
            try
            {
                rtn = new String(storage, storage_offset, len, "ISO-8859-1");
            }
            catch(UnsupportedEncodingException e2)
            { }
        }
        return rtn;
    }

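My intent is that if decoding the string as UTF-8 fails, we fall back to ISO-8859-1. This depends on UnsupportedEncodingException being thrown. I ran a test of this code that passes in extended characters (codes above 127) that do not follow the expected UTF-8 pattern. What I found is that the exception is not thrown, and the converted string shows unknown glyphs. My question is whether there has been any change in the standard library implementation that would cause the exception not to be thrown?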

Source

2015-05-21 Jon Trauntvein

Please provide an MCVE.

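UnsupportedEncodingException is thrown if the charset itself is not supported (that is, you specify a charset name and the system does not recognize it), not if the bytes are incorrectly encoded. Note that the corresponding constructor that takes a java.nio.charset.Charset instead of a String does not throw that exception (there is no name to map to a Charset, so there is no possibility that the mapping does not exist).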
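To illustrate the point about the Charset overload, here is a minimal sketch. The class and method names are made up for this example. The Javadoc for String(byte[], Charset) states that malformed input is always replaced with the charset's default replacement string, which for UTF-8 is U+FFFD; that would be consistent with the unknown glyphs described in the question, though I have not seen the asker's actual data.

```java
import java.nio.charset.StandardCharsets;

final class CharsetOverloadDemo {
    // Hypothetical helper: decodes with the Charset overload, which
    // declares no checked exception, so no try/catch is needed.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xE9 is "é" in ISO-8859-1 but is not a valid UTF-8 sequence here.
        byte[] notUtf8 = {'a', (byte) 0xE9, 'b'};
        String s = decodeLenient(notUtf8);
        // The malformed byte is replaced rather than reported.
        System.out.println(s.indexOf('\uFFFD') >= 0);
    }
}
```

No exception is involved at any point, so a catch block cannot be used to detect bad input with this overload either.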

The behavior is specified in the documentation for String(byte[], int, int, String) (that is, it is unspecified :) ), along with a suggested fix:

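The behavior of this constructor when the given bytes are not valid in the given charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required.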
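A minimal sketch of what the CharsetDecoder approach could look like for the fallback described in the question. The class name and the decode helper are hypothetical, and the parameters only mirror the question's storage, storage_offset and len; treat it as one possible way to do this rather than the definitive one.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8Fallback {
    // Hypothetical helper: decodes as strict UTF-8 and falls back to
    // ISO-8859-1 when the bytes are not valid UTF-8.
    static String decode(byte[] storage, int offset, int len) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // REPORT makes invalid input throw instead of being replaced.
            return decoder.decode(ByteBuffer.wrap(storage, offset, len)).toString();
        } catch (CharacterCodingException e) {
            // Every byte value maps to a character in ISO-8859-1.
            return new String(storage, offset, len, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        byte[] valid = {(byte) 0xC3, (byte) 0xA9};  // "é" encoded as UTF-8
        byte[] invalid = {(byte) 0xE9};             // "é" in ISO-8859-1, invalid as UTF-8
        System.out.println(decode(valid, 0, valid.length));
        System.out.println(decode(invalid, 0, invalid.length));
    }
}
```

The exception that signals bad input here is CharacterCodingException, not UnsupportedEncodingException. A new decoder is created per call because CharsetDecoder instances are not safe for concurrent use.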

Source

2015-05-21 23:16:15 yshavit

According to the documentation for that String constructor, UnsupportedEncodingException is thrown only if the specified charsetName is not supported.

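The behavior of this constructor when the given bytes are not valid in the given charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required.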
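A small sketch to make the distinction concrete. The class name, the helper, and the charset name "X-NO-SUCH-CHARSET" are invented for the example; the latter is assumed not to be installed on any JVM.

```java
import java.io.UnsupportedEncodingException;

final class UnknownCharsetDemo {
    // Hypothetical helper: reports whether constructing the String
    // throws UnsupportedEncodingException for the given charset name.
    static boolean throwsUnsupported(byte[] bytes, String charsetName) {
        try {
            new String(bytes, 0, bytes.length, charsetName);
            return false;
        } catch (UnsupportedEncodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] notUtf8 = {(byte) 0xE9};  // not a valid UTF-8 sequence
        // Known charset name, invalid bytes: no exception.
        System.out.println(throwsUnsupported(notUtf8, "UTF-8"));
        // Unknown charset name: the exception is thrown.
        System.out.println(throwsUnsupported(notUtf8, "X-NO-SUCH-CHARSET"));
    }
}
```

Since every Java platform implementation is required to support UTF-8 and ISO-8859-1, the catch blocks in the question should never run, whatever the bytes contain.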

Source

2015-05-21 23:16:27 Buddy

You can test whether a charset is available.
To list the available charsets, use:

    import java.nio.charset.Charset;
    import java.util.Map;
    import java.util.SortedMap;

    // List every charset this JVM supports, keyed by canonical name.
    SortedMap<String, Charset> availableCharsets = Charset.availableCharsets();
    for (Map.Entry<String, Charset> entry : availableCharsets.entrySet()) {
        String key = entry.getKey();
        Charset value = entry.getValue();
        System.out.println("key: " + key + " value: " + value.name());
    }
    System.out.println("The default Charset is: " + Charset.defaultCharset().name());
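If you only need to check a single name rather than list everything, Charset.isSupported may be enough. The sketch below wraps it in a hypothetical isAvailable helper that treats syntactically illegal names as unavailable; the name "X-NO-SUCH-CHARSET" is made up and assumed not to exist.

```java
import java.nio.charset.Charset;

final class CharsetLookupDemo {
    // Hypothetical helper: true only if the JVM supports the named charset.
    static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException (a subclass) and a null name.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAvailable("UTF-8"));
        System.out.println(isAvailable("X-NO-SUCH-CHARSET"));
    }
}
```

This only tells you whether the charset name resolves. It says nothing about whether a given byte array is valid in that charset.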

Source

2015-05-21 23:17:42 Andie2302
