将Unicode的UTF8表示写入文件

我有一个专有文件（数据库）格式，目前我正尝试迁移到SQL数据库。因此，我将文件转换为SQL转储，这已经正常工作。现在剩下的唯一问题是它们处理不在ASCII-decimal范围32到126内的字符的奇怪方式。它们具有以Unicode存储的所有这些字符的集合（十六进制 - 例如20AC =€），由它们自己索引内部索引。将Unicode的UTF8表示写入文件

我现在的计划是：我想创建一个表，其中存储内部索引，unicode（十六进制）和字符表示形式（UTF-8）。此表格可以用于未来的更新。

现在到问题：我如何写一个unicode十六进制值的UTF-8字符表示形式的文件？当前的代码如下所示：

this->outFile.open(fileName + ".sql", std::ofstream::app); 
std::string protyp; 
this->inFile.ignore(2); // Ignore the ID = 01. 
std::getline(this->inFile, protyp); // Get the PROTYP Identifier (e.g. \321) 
protyp = "\\" + protyp; 

std::string unicodeHex; 
this->inFile.ignore(2); // Ignore the ID = 01. 
std::getline(this->inFile, unicodeHex); // Get the Unicode HEX Identifier (e.g. 002C) 

std::wstring_convert<std::codecvt_utf8<wchar_t>> converter; 
const std::wstring wide_string = this->s2ws("\\u" + unicodeHex); 
const std::string utf8_rep = converter.to_bytes(wide_string); 

std::string valueString = "('" + protyp + "', '" + unicodeHex + "', '" + utf8_rep + "')"; 

this->outFile << valueString << std::endl; 

this->outFile.close();

但这只是打印出这样的事：

('\321', '002C', '\u002C'),

虽然所需的输出将是：

('\321', '002C', ','),

我到底做错了什么？我不得不承认，当我提到字符编码和其他东西时，我并不确定：/。如果它有任何区别，我正在使用Windows 7 64位。在此先感谢。

来源

2015-04-06 puelo

从'转换采取\ u002C'到宽字符值出现在编译的时候，不运行。你需要忘记'\ u'并做一个字符串到整数的转换。 –

它的工作原理！非常感谢。如果你愿意，你可以添加你的评论作为答案。我会尽快接受它。我还会添加一些代码作为第二个答案来介绍我提出的解决方案。 – puelo

我的目标是给你足够的信息来自己产生答案，看起来我成功了。您真诚的感谢是我需要的全部奖励。 –

正如@Mark Ransom在评论中指出的，我最好的选择是将十六进制字符串转换为整数并使用它。这是我做过什么：

unsigned int decimalHex = std::stoul(unicodeHex, nullptr, 16);; 

std::string valueString = "('" + protyp + "', '" + unicodeHex + "', '" + this->UnicodeToUTF8(decimalHex) + "')";

虽然对于UnicodeToUTF8功能是从这里Unsigned integer as UTF-8 value

std::string UnicodeToUTF8(unsigned int codepoint) 
{ 
    std::string out; 

    if (codepoint <= 0x7f) 
     out.append(1, static_cast<char>(codepoint)); 
    else if (codepoint <= 0x7ff) 
    { 
     out.append(1, static_cast<char>(0xc0 | ((codepoint >> 6) & 0x1f))); 
     out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f))); 
    } 
    else if (codepoint <= 0xffff) 
    { 
     out.append(1, static_cast<char>(0xe0 | ((codepoint >> 12) & 0x0f))); 
     out.append(1, static_cast<char>(0x80 | ((codepoint >> 6) & 0x3f))); 
     out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f))); 
    } 
    else 
    { 
     out.append(1, static_cast<char>(0xf0 | ((codepoint >> 18) & 0x07))); 
     out.append(1, static_cast<char>(0x80 | ((codepoint >> 12) & 0x3f))); 
     out.append(1, static_cast<char>(0x80 | ((codepoint >> 6) & 0x3f))); 
     out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f))); 
    } 
    return out; 
}

来源

2015-04-06 15:38:42 puelo

嘿，我认识到这个代码！ –

哈！没有注意到！伟大的代码！ – puelo

将Unicode的UTF8表示写入文件

回答

相关问题