2017-06-14 33 views

Why is MySQL's hex result different from the shell and Java?

I want to get the Unicode code point of a Chinese character, e.g. '京' --> 4eac.

Shell

➜ ~ printf "%x\n" \'京 
4eac 

Java

jshell> Integer.toHexString('京'); 
$14 ==> "4eac" 

Why does MySQL give a different result?

select hex('京'); 
+------------+ 
| hex('京') | 
+------------+ 
| E4BAAC  | 
+------------+ 

show variables like 'char%'; 
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | utf8   |
| character_set_connection | utf8   |
| character_set_database   | utf8   |
| character_set_filesystem | binary |
| character_set_results    | utf8   |
| character_set_server     | utf8   |
| character_set_system     | utf8   |

In MySQL I have to convert to ucs2 first to get the same result as above:

select hex(convert('京' using ucs2)); 
+--------------------------------+ 
| hex(convert('京' using ucs2)) | 
+--------------------------------+ 
| 4EAC       | 
+--------------------------------+ 

So why does hex in MySQL behave differently from the others?

Also, going the other way, from the Unicode code point to the character:

Shell

➜ ~ echo '\u4eac' 
京 

Java

jshell> String s = "\u4eac"; 
s ==> "京" 

MySQL

select char(0x4eac using ucs2); 
+-------------------------+ 
| char(0x4eac using ucs2) | 
+-------------------------+ 
| 京      | 
+-------------------------+ 

Answer

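UTF-8 (MySQL's utf8 or utf8mb4) is a different encoding from UCS-2 (MySQL: ucs2). HEX() returns the bytes of the string in its current character set, so the same character yields different hex depending on the encoding: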

'京' = 
Unicode "codepoint" (in hex) '4eac' = 
UCS2 encoding (2 bytes, in hex) '4EAC' = 
UTF-8 encoding (3 bytes, in hex) 'E4BAAC' = 
html entity '&#x4EAC;' (hex) or '&#20140;' (decimal) 

Reference: http://unicode.scarfboy.com/?s=U%2B4eac
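
The equalities listed above can be checked outside MySQL. The following is a minimal Java sketch, not part of the original answer; the class name EncodingDemo and the helper hex() are made up for illustration. It prints the code point of '京' and then the bytes the same character takes in UTF-8 and in UTF-16BE (which matches UCS-2 for this character).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares the code point of a character
// with its encoded bytes in two encodings.
public class EncodingDemo {
    // Format bytes as uppercase hex, similar in spirit to MySQL's HEX().
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u4eac"; // the character from the question (U+4EAC)

        // The code point is a number, independent of any encoding.
        System.out.println(Integer.toHexString(s.codePointAt(0)));      // 4eac

        // UTF-8 needs three bytes for this character.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));    // E4BAAC

        // UTF-16BE needs two bytes, and they spell out the code point.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 4EAC
    }
}
```

The second line of output corresponds to hex('京') on a utf8 connection, and the third to hex(convert('京' using ucs2)).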

Thanks! So in the shell and jshell environments, it is actually using 'ucs2' encoding rather than utf8, right? – zhuguowei
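
The thread does not answer this comment. As a hedged note on the Java side only: Integer.toHexString('京') is not printing encoded bytes; it widens a Java char (a 16-bit UTF-16 code unit) to an int and prints that number. For characters in the Basic Multilingual Plane the code unit equals the code point, which is why it looks like UCS-2. The sketch below (class and method names are illustrative, and U+1F600 is just an arbitrary character outside that range) shows where the two diverge.

```java
// Hypothetical demo class: contrasts a code point with the UTF-16
// code units (Java chars) that represent it.
public class CodePointVsUnits {
    // Hex code point of the first character in s.
    static String codePointHex(String s) {
        return Integer.toHexString(s.codePointAt(0));
    }

    // Hex of each UTF-16 code unit in s, separated by spaces.
    static String utf16UnitsHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bmp = "\u4eac";                                   // U+4EAC, inside the BMP
        String outside = new String(Character.toChars(0x1F600)); // U+1F600, outside the BMP

        System.out.println(codePointHex(bmp));      // 4eac
        System.out.println(utf16UnitsHex(bmp));     // 4eac (one unit, same as the code point)

        System.out.println(codePointHex(outside));  // 1f600
        System.out.println(utf16UnitsHex(outside)); // d83d de00 (a surrogate pair)
    }
}
```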
