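Simple UTF8 -> UTF16 string conversion with iconv

I want to write a function that converts a UTF8 string to UTF16 (little-endian). The problem is that the iconv function doesn't seem to let you know in advance how many bytes are needed to store the output string.

My solution is to start by allocating 2*strlen(utf8) bytes, then run iconv in a loop, growing the buffer with realloc when necessary: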

    #include <errno.h>
    #include <iconv.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static int utf8_to_utf16le(char *utf8, char **utf16, int *utf16_len)
    {
        iconv_t cd;
        char *inbuf, *outbuf;
        size_t inbytesleft, outbytesleft, nchars, utf16_buf_len;

        // "UTF-16LE" and "UTF-8" are the more widely accepted spellings of the encoding names
        cd = iconv_open("UTF-16LE", "UTF-8");
        if (cd == (iconv_t)-1) {
            printf("!%s: iconv_open failed: %d\n", __func__, errno);
            return -1;
        }

        inbytesleft = strlen(utf8);
        if (inbytesleft == 0) {
            printf("!%s: empty string\n", __func__);
            iconv_close(cd);
            return -1;
        }
        inbuf = utf8;

        utf16_buf_len = 2 * inbytesleft; // sufficient in many cases, e.g. if the input string is ASCII
        *utf16 = malloc(utf16_buf_len);
        if (!*utf16) {
            printf("!%s: malloc failed\n", __func__);
            iconv_close(cd);
            return -1;
        }
        outbytesleft = utf16_buf_len;
        outbuf = *utf16;

        nchars = iconv(cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
        while (nchars == (size_t)-1 && errno == E2BIG) {
            char *ptr;
            size_t increase = 10; // increase length a bit
            // Record the write offset before realloc, which may move (and free) the old block
            size_t len = (size_t)(outbuf - *utf16);

            utf16_buf_len += increase;
            outbytesleft += increase;
            ptr = realloc(*utf16, utf16_buf_len);
            if (!ptr) {
                printf("!%s: realloc failed\n", __func__);
                free(*utf16); // the original block is still valid when realloc fails
                iconv_close(cd);
                return -1;
            }
            *utf16 = ptr;
            outbuf = *utf16 + len;
            nchars = iconv(cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
        }
        if (nchars == (size_t)-1) {
            printf("!%s: iconv failed: %d\n", __func__, errno);
            free(*utf16);
            iconv_close(cd);
            return -1;
        }

        iconv_close(cd);
        *utf16_len = (int)(utf16_buf_len - outbytesleft); // bytes actually written
        return 0;
    }

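Is this really the best way to do this? Repeated realloc calls seem wasteful, but without knowing which character sequences are in the UTF8 input and what they will produce in UTF16, I don't know whether I can make a better guess at the initial buffer size than 2*strlen(utf8).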
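
To make the size question concrete, here is a minimal sketch that counts how many UTF16 bytes a UTF8 string would need. It assumes the input is valid UTF8 and that the converter writes no byte-order mark (which should hold for an explicit-endian target such as UTF-16LE, but that is an assumption about the iconv implementation). `utf16_bytes_needed` is a hypothetical helper, not part of iconv. UTF8 sequences of 1, 2, or 3 bytes each become one 2-byte UTF16 unit, and 4-byte sequences become a 4-byte surrogate pair, so under these assumptions the result should never exceed 2*strlen(utf8).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an iconv function): number of bytes a valid,
 * NUL-terminated UTF8 string occupies as UTF16 without a BOM. */
static size_t utf16_bytes_needed(const char *utf8)
{
    size_t bytes = 0;

    assert(utf8 != NULL);
    for (const unsigned char *p = (const unsigned char *)utf8; *p; p++) {
        if ((*p & 0xC0) == 0x80)
            continue;                  /* continuation byte, already counted via its lead byte */
        bytes += (*p >= 0xF0) ? 4 : 2; /* 4-byte sequence -> surrogate pair, otherwise one unit */
    }
    return bytes;
}
```

Such a pre-pass costs one extra scan of the input, so whether it beats simply allocating 2*strlen(utf8) up front depends on how much the over-allocation matters.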
Good point re 'strlen', but in my case I want a null-terminated input string and a non-terminated output buffer plus a length. I didn't make that clear. – craig65535