Converting a UCS (Universal Character Set) character to Unicode?

I am reading someone else's code and trying to understand the function shown below.

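According to its comment, the function will "Convert a UCS character to an UTF-8 string". But what is a UCS character, what are the rules for converting UCS to Unicode, and where can I find documentation on this?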

/*
 * Convert a UCS character to an UTF-8 string
 *
 * Returns the string length of the result
 */
size_t
tUcs2Utf8(ULONG ulChar, char *szResult, size_t tMaxResultLen)
{
    if (szResult == NULL || tMaxResultLen == 0) {
        return 0;
    }

    if (ulChar < 0x80 && tMaxResultLen >= 2) {
        szResult[0] = (char)ulChar;
        szResult[1] = '\0';
        return 1;
    }
    if (ulChar < 0x800 && tMaxResultLen >= 3) {
        szResult[0] = (char)(0xc0 | ulChar >> 6);
        szResult[1] = (char)(0x80 | (ulChar & 0x3f));
        szResult[2] = '\0';
        return 2;
    }
    if (ulChar < 0x10000 && tMaxResultLen >= 4) {
        szResult[0] = (char)(0xe0 | ulChar >> 12);
        szResult[1] = (char)(0x80 | (ulChar >> 6 & 0x3f));
        szResult[2] = (char)(0x80 | (ulChar & 0x3f));
        szResult[3] = '\0';
        return 3;
    }
    if (ulChar < 0x200000 && tMaxResultLen >= 5) {
        szResult[0] = (char)(0xf0 | ulChar >> 18);
        szResult[1] = (char)(0x80 | (ulChar >> 12 & 0x3f));
        szResult[2] = (char)(0x80 | (ulChar >> 6 & 0x3f));
        szResult[3] = (char)(0x80 | (ulChar & 0x3f));
        szResult[4] = '\0';
        return 4;
    }
    szResult[0] = '\0';
    return 0;
} /* end of tUcs2Utf8 */

2016-01-18 roger

Really? [This](https://www.google.com/search?q=ucs+character&oq=ucs+character&aqs=chrome..69i57j69i60&sourceid=chrome&es_sm=122&ie=UTF-8) didn't help?

@SouravGhosh, I can read this code, but why is it written this way? That is why I want to know the rules behind the conversion. – roger

Please don't roll your own code when tested and stable alternatives exist. If this is Windows-specific, you can use 'MultiByteToWideChar' and/or 'WideCharToMultiByte'. Otherwise, you can use ICU. – szczurcio

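The Universal Character Set is an ISO standard (ISO/IEC 10646). It defines the same characters as Unicode, so no character conversion is needed. Each version of UCS is essentially a subset of some version of the Unicode standard. New characters are added to Unicode first, and every so often UCS is synchronized with Unicode. Appendix C of the Unicode standard contains a table showing how the different versions relate to each other.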

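Also note that the code you posted uses a non-standard upper limit of 0x200000. Unicode code points end at U+10FFFF, so the limit should be changed to 0x110000.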
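
As a rough sketch of that fix, the encoder below uses 0x110000 as the upper limit. The name `utf8_encode`, the `uint32_t` parameter type, and the extra rejection of surrogate code points (0xD800 to 0xDFFF, which are not valid in UTF-8) are my own assumptions and are not part of the posted code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one code point as a NUL-terminated UTF-8 string.
 * Returns the number of bytes written (not counting the NUL),
 * or 0 if the code point is invalid or the buffer is too small.
 * utf8_encode is a hypothetical name used for illustration only.
 */
size_t
utf8_encode(uint32_t cp, char *out, size_t max_len)
{
    size_t len;

    if (out == NULL || max_len == 0) {
        return 0;
    }
    out[0] = '\0';

    /* Reject values past U+10FFFF and the surrogate range (assumption). */
    if (cp >= 0x110000 || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;
    }

    /* The value range decides how many bytes the encoding needs. */
    if (cp < 0x80) {
        len = 1;
    } else if (cp < 0x800) {
        len = 2;
    } else if (cp < 0x10000) {
        len = 3;
    } else {
        len = 4;
    }
    if (max_len < len + 1) {
        return 0; /* no room for the bytes plus the terminating NUL */
    }

    switch (len) {
    case 1:
        out[0] = (char)cp;
        break;
    case 2:
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        break;
    case 3:
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        break;
    default:
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        break;
    }
    out[len] = '\0';
    return len;
}
```

With a buffer of at least 5 bytes, calling `utf8_encode(0x20AC, buf, sizeof buf)` should write the three bytes E2 82 AC, while `utf8_encode(0x110000, buf, sizeof buf)` returns 0 and leaves an empty string.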

2016-01-18 14:57:14 nwellnhof
