Python print can't print Unicode and byte strings together

Here are some things I have observed. I would like to know why Python's print behaves this way, and what a possible fix would be.

>>> print "%s" % u"abc" # works 
>>> print "%s" % "\xd1\x81" # works 
>>> print "%s %s" % (u"abc", "\xd1\x81") # Error

For the last one above, I get: UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 0: ordinal not in range(128)

However, this works:

>>> print "%s %s" % ("abc", "\xd17\x81") # works

When I do

>>> print "%s %s" % (u"abc", u"\u0441") # Error

it raises UnicodeEncodeError: 'charmap' codec can't encode character u'\u0441' in position 4: character maps to <undefined>

2015-09-24 Surya

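This is expected. On output, a unicode object has to be encoded to the desired character encoding, such as utf-8. Think of unicode (including all literals) as an abstraction that must be encoded to something like utf-8 before it is serialized.

You can encode a unicode object s to utf-8 with s.encode('utf-8'). str objects in Python 2 are byte strings, so you won't get an error with something like "\xd17\x81"; they are already encoded.

I suggest you use Python 3 instead of Python 2; it is more intuitive in this respect.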
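A minimal sketch of the explicit encoding step described above. The name `s` and the choice of "utf-8" as the target encoding are assumptions for illustration; the snippet is written so that it runs unchanged on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Text (a unicode object) is abstract; it becomes bytes only when encoded.
s = u"abc \u0441"
encoded = s.encode("utf-8")        # bytes, ready to be written out
assert encoded == b"abc \xd1\x81"

# A byte string is already encoded; decoding it gives the text back.
assert b"\xd1\x81".decode("utf-8") == u"\u0441"
```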

2015-09-24 17:12:24 proycon

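When you mix Unicode strings and byte strings in Python 2, the byte strings are implicitly coerced to Unicode using the default ascii codec. If that fails, you get a UnicodeDecodeError.

When you print Unicode strings, they are implicitly encoded to the current output encoding. If that fails, you get a UnicodeEncodeError.

So: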

>>> print "%s" % u"abc"

is really:

>>> print unicode("%s",'ascii') % u"abc" # and valid

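The following only "works" if you mean "doesn't raise an error". If you expect it to print the character U+0441, it will only do so if the output encoding is UTF-8. It prints garbage on my Windows system.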

>>> print "%s" % "\xd1\x81"

The following gives an error because of the implicit Unicode decoding:

print "%s %s" % (u"abc", "\xd1\x81")

which is really:

print unicode("%s %s",'ascii') % (u"abc", unicode("\xd1\x81",'ascii'))

\xd1 and \x81 are outside the ASCII range of 0-7Fh.
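As a rough illustration of that coercion, the decode can be done by hand. This is a sketch, not what the interpreter literally runs; it assumes the bytes were meant as UTF-8, and `raw`, `ascii_ok`, `text` and `line` are illustrative names. It runs on Python 2.7 and Python 3.

```python
raw = b"\xd1\x81"  # the byte string from the question

# What the implicit coercion attempts: decoding the bytes as ASCII.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False  # byte 0xd1 is above 0x7f, so ASCII cannot decode it

# Decoding explicitly with the encoding the bytes were written in
# (assumed to be UTF-8 here) avoids the implicit ASCII step entirely.
text = raw.decode("utf-8")
line = u"%s %s" % (u"abc", text)
assert line == u"abc \u0441"
```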

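The last error means your output encoding is not UTF-8, since it could not encode \u0441 to a character supported by the output encoding. UTF-8 can encode all Unicode characters.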
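The encoding failure can be reproduced without a console by encoding by hand. In this sketch cp1252 stands in for a non-UTF-8 output encoding; that is an assumption, since the question does not say which code page was active. It runs on Python 2.7 and Python 3.

```python
line = u"abc \u0441"

# cp1252 is an assumed stand-in for the console encoding.
try:
    line.encode("cp1252")
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = exc.start  # index of the first character with no mapping

# UTF-8 can represent every Unicode character, so this always succeeds.
utf8_bytes = line.encode("utf-8")

# A lossy alternative: unmappable characters are replaced with "?".
fallback = line.encode("cp1252", "replace")
```

Here `failed_at` comes out as 4, which lines up with "position 4" in the error message quoted in the question.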

2015-09-24 18:25:03

You should emphasize: "don't mix Unicode and byte strings" – jfs
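One way to follow that advice is to convert everything to text at the boundary before formatting. The helper below is hypothetical (`to_text` is not a standard function), and its "utf-8" default is an assumption about how the incoming bytes were encoded. It runs on Python 2.7 and Python 3.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value

# Mixed input, normalised to text before it reaches the format string.
parts = [u"abc", b"\xd1\x81"]
line = u"%s %s" % tuple(to_text(p) for p in parts)
assert line == u"abc \u0441"
```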
