String literal to basic_string<unsigned char>

When it comes to internationalization and Unicode, I'm an idiot American programmer. Here's the deal.

#include <string> 
using namespace std; 

typedef basic_string<unsigned char> ustring; 

int main() 
{ 
    static const ustring my_str = "Hello, UTF-8!"; // <== error here 
    return 0; 
}

This produces a not-unexpected complaint:

cannot convert from 'const char [14]' to 'std::basic_string<_Elem>'

Maybe I've had the wrong portion of coffee today. How do I fix this? Can I keep the basic structure:

ustring something = {insert magic incantation here};

?

Source

2010-09-30 John Dibling

This doesn't answer your question, but read this article on i18n: http://www.joelonsoftware.com/articles/Unicode.html – Starkey 2010-09-30 20:36:41

Seen it, but thx – 2010-09-30 20:39:34

You'll probably need to provide your own 'char_traits<unsigned char>' specialization. AFAIK, '<string>' only provides specializations for 'char' and 'wchar_t'. – Praetorian 2010-09-30 20:44:08
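The comment above points at a portability problem: the standard only requires std::char_traits specializations for the built-in character types (char, wchar_t, and in later standards char16_t, char32_t and char8_t), so whether basic_string<unsigned char> compiles at all depends on the library. One way around that, sketched here as an assumption rather than anything proposed in the thread, is to pass a user-defined traits class as the second template argument. The names uchar_traits and make_ustring are hypothetical, and the ustring typedef below deliberately differs from the one in the question.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>   // std::mbstate_t
#include <ios>      // std::streamoff, std::streampos
#include <string>

// Hypothetical traits class for unsigned char; not part of the standard library.
struct uchar_traits {
    typedef unsigned char  char_type;
    typedef int            int_type;
    typedef std::streamoff off_type;
    typedef std::streampos pos_type;
    typedef std::mbstate_t state_type;

    static void assign(char_type& dst, const char_type& src) { dst = src; }
    static bool eq(char_type a, char_type b) { return a == b; }
    static bool lt(char_type a, char_type b) { return a < b; }

    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        return n == 0 ? 0 : std::memcmp(a, b, n);
    }
    static std::size_t length(const char_type* s) {
        return std::strlen(reinterpret_cast<const char*>(s));
    }
    static const char_type* find(const char_type* s, std::size_t n,
                                 const char_type& c) {
        return n == 0 ? nullptr
                      : static_cast<const char_type*>(std::memchr(s, c, n));
    }
    static char_type* move(char_type* dst, const char_type* src, std::size_t n) {
        return n == 0 ? dst : static_cast<char_type*>(std::memmove(dst, src, n));
    }
    static char_type* copy(char_type* dst, const char_type* src, std::size_t n) {
        return n == 0 ? dst : static_cast<char_type*>(std::memcpy(dst, src, n));
    }
    static char_type* assign(char_type* dst, std::size_t n, char_type c) {
        return n == 0 ? dst : static_cast<char_type*>(std::memset(dst, c, n));
    }

    static int_type eof() { return -1; }
    static int_type not_eof(int_type c) { return c == eof() ? 0 : c; }
    static char_type to_char_type(int_type c) { return static_cast<char_type>(c); }
    static int_type to_int_type(char_type c) { return c; }
    static bool eq_int_type(int_type a, int_type b) { return a == b; }
};

// Unlike the question's typedef, this one names the traits class explicitly.
typedef std::basic_string<unsigned char, uchar_traits> ustring;

// Hypothetical helper: build a ustring from a null-terminated narrow string.
inline ustring make_ustring(const char* s) {
    return ustring(reinterpret_cast<const unsigned char*>(s));
}
```

With this in place, ustring s = make_ustring("Hello, UTF-8!"); keeps the shape the question asked for. Stream operators and std::hash are not covered by this sketch.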

Narrow string literals are defined to be const char, and there are no unsigned string literals [1], so you'll have to cast:

ustring s = reinterpret_cast<const unsigned char*>("Hello, UTF-8");

Of course, you can wrap that long thing in an inline function:

inline const unsigned char *uc_str(const char *s){ 
    return reinterpret_cast<const unsigned char*>(s); 
} 

ustring s = uc_str("Hello, UTF-8");

Or you can just use basic_string<char> and get away with it 99.9% of the time you're dealing with UTF-8.

[1] Unless char is unsigned, but whether it is or not is implementation-defined, and so on.
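The basic_string<char> route can be sketched as follows. It assumes the bytes are already UTF-8 (spelled out here with hex escapes so the source file's encoding does not matter). The helpers byte_at and sample_utf8 are hypothetical names used only for illustration, not something from the answer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: read one byte of a std::string as an unsigned value,
// regardless of whether plain char is signed on this implementation.
inline unsigned char byte_at(const std::string& s, std::size_t i) {
    return static_cast<unsigned char>(s[i]);
}

// "h" followed by U+00E9, which UTF-8 encodes as the two bytes 0xC3 0xA9.
inline std::string sample_utf8() {
    return std::string("h\xC3\xA9");
}
```

The string holds three bytes for two characters, so size() counts bytes, not characters. The cast to unsigned char is only needed at the points where a byte's numeric value matters.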

Source

2010-09-30 20:50:59

I *think* this is the answer... – 2010-09-30 20:56:05

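@Steve, I know this is old, but I'm curious: when would basic_string<char> not work for storing a UTF-8 encoded string? It just stores a sequence of bytes, and that has never failed me. Are there corner cases I'm not aware of? – Matthew 2017-09-13 19:57:32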

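Using different character types for different encodings has the advantage that the compiler barks at you when you mix them up. The downside is that you have to convert manually.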

A few helper functions to the rescue:

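// Copy a string in the system encoding byte for byte.
inline ustring convert(const std::string& sys_enc) { 
    return ustring(sys_enc.begin(), sys_enc.end()); 
} 

// Char arrays holding a C string: N counts the terminating '\0', so stop one short.
template< std::size_t N > 
inline ustring convert(const char (&array)[N]) { 
    return ustring(array, array+N-1); 
} 

// Null-terminated C strings.
inline ustring convert(const char* pstr) { 
    return ustring(reinterpret_cast<const ustring::value_type*>(pstr)); 
}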

Of course, all of these fail silently and fatally when the string being converted contains anything other than ASCII.
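One way to make that failure loud instead of silent is to check the input before converting. The following is a minimal sketch under the assumption that the input should be pure 7-bit ASCII. The function is_ascii is a hypothetical helper, not part of the answer.

```cpp
#include <string>

// Hypothetical helper: true if every byte is in the 7-bit ASCII range.
inline bool is_ascii(const std::string& s) {
    for (std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
        // Cast first so the comparison also works where plain char is signed.
        if (static_cast<unsigned char>(*it) > 0x7F) {
            return false;
        }
    }
    return true;
}
```

A caller could test is_ascii(input) before calling convert and reject or transcode the string when it returns false.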

Source

2010-09-30 22:52:33 sbi

Somehow I can't use the third overload of 'convert'. I get the following compile error: 'error: cast from 'const char*' to 'std::__cxx11::basic_string<unsigned char>::value_type {aka unsigned char}' loses precision [-fpermissive] return ustring(reinterpret_cast(pstr));'. [coliru link](http://coliru.stacked-crooked.com/a/66b1d6c08a1ad63e) – Patryk 2016-02-22 15:25:45

@Patryk: I believe I've fixed that. Sorry, I got it wrong all those years ago. – sbi 2016-02-22 15:28:55

That's what we're here for :) – Patryk 2016-02-22 15:33:20

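Make your life easier and use a UTF-8 string library (such as http://utfcpp.sourceforge.net/), or use std::wstring and UTF-16. You may also be interested in the discussion in another Stack Overflow question: C++ strings: UTF-8 or 16-bit encoding?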

Source

2010-09-30 23:12:57 Matthew

Can't use UTF-16. The incoming files are UTF-8. – 2010-10-01 14:55:06

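I guess the next question is: what do you need to do with the data from the file once it is loaded? Converting it to UTF-16 may make sense, or keeping it as UTF-8 may be easier and more efficient. – Matthew 2010-10-01 15:16:21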

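UTF-16 has no real advantage over UTF-8. In fact, the only two I can think of are A) it is the native Unicode encoding on Windows, so it makes things easier when you're doing Windows work, and B) when you use a lot of those (CJK) characters that need three bytes in UTF-8 but only two in UTF-16, UTF-16 needs less memory. – sbi 2010-10-01 20:12:35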
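The size trade-off in point B can be worked out per code point. The sketch below assumes its argument is a valid Unicode scalar value (at most U+10FFFF and not a surrogate). The names utf8_bytes and utf16_bytes are hypothetical.

```cpp
#include <cstddef>

// Bytes needed to encode one code point in UTF-8.
inline std::size_t utf8_bytes(unsigned long cp) {
    if (cp < 0x80)    return 1;  // ASCII
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;  // rest of the Basic Multilingual Plane
    return 4;                    // supplementary planes
}

// Bytes needed to encode one code point in UTF-16.
inline std::size_t utf16_bytes(unsigned long cp) {
    return cp < 0x10000 ? 2 : 4; // one code unit, or a surrogate pair
}
```

For U+4E2D (a common CJK ideograph) this gives 3 bytes in UTF-8 against 2 in UTF-16, which is the saving the comment describes. For ASCII the ratio flips to 1 against 2, so which encoding is smaller depends on the mix of text.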
