Getting the same Unicode string length in both Python 2 and Python 3?

Ugh, Python 2/3 is so frustrating... Consider this example, test.py:

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 

import sys 
if sys.version_info[0] < 3: 
    text_type = unicode 
    binary_type = str 
    def b(x):
        return x
    def u(x):
        return unicode(x)
else: 
    text_type = str 
    binary_type = bytes 
    import codecs 
    def b(x):
        return codecs.latin_1_encode(x)[0]
    def u(x):
        return x

tstr = " ▲ " 

sys.stderr.write(tstr) 
sys.stderr.write("\n") 
sys.stderr.write(str(len(tstr))) 
sys.stderr.write("\n")

Running it:

$ python2.7 test.py 
▲ 
5 
$ python3.2 test.py 
▲ 
3

Great, I get two different string sizes. Maybe wrapping the string in one of these wrappers I found around the net will help?

For tstr = text_type(" ▲ "):

$ python2.7 test.py 
Traceback (most recent call last): 
    File "test.py", line 21, in <module> 
    tstr = text_type(" ▲ ") 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128) 
$ python3.2 test.py 
▲ 
3

For tstr = u(" ▲ "):

$ python2.7 test.py 
Traceback (most recent call last): 
    File "test.py", line 21, in <module> 
    tstr = u(" ▲ ") 
    File "test.py", line 11, in u 
    return unicode(x) 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128) 
$ python3.2 test.py 
▲ 
3

For tstr = b(" ▲ "):

$ python2.7 test.py 
▲ 
5 
$ python3.2 test.py 
Traceback (most recent call last): 
    File "test.py", line 21, in <module> 
    tstr = b(" ▲ ") 
    File "test.py", line 17, in b 
    return codecs.latin_1_encode(x)[0] 
UnicodeEncodeError: 'latin-1' codec can't encode character '\u25b2' in position 1: ordinal not in range(256)

For tstr = binary_type(" ▲ "):

$ python2.7 test.py 
▲ 
5 
$ python3.2 test.py 
Traceback (most recent call last): 
    File "test.py", line 21, in <module> 
    tstr = binary_type(" ▲ ") 
TypeError: string argument without an encoding

Well, that certainly makes things easy.

So, how do I get the same string length (3 in this case) in both Python 2.7 and 3.2?

2013-05-10 sdaau

Well, it turns out that unicode() in Python 2.7 accepts an encoding argument, and that apparently helps:

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 

import sys 
if sys.version_info[0] < 3: 
    text_type = unicode 
    binary_type = str 
    def b(x):
        return x
    def u(x):
        return unicode(x, "utf-8")
else: 
    text_type = str 
    binary_type = bytes 
    import codecs 
    def b(x):
        return codecs.latin_1_encode(x)[0]
    def u(x):
        return x

tstr = u(" ▲ ") 

sys.stderr.write(tstr) 
sys.stderr.write("\n") 
sys.stderr.write(str(len(tstr))) 
sys.stderr.write("\n")

Running this, I get what I need:

$ python2.7 test.py 
▲ 
3 
$ python3.2 test.py 
▲ 
3
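
As far as I can tell, the 5 versus 3 comes down to bytes versus code points: a plain string literal in Python 2 holds the UTF-8 bytes of the source file, while in Python 3 it holds text. Below is a minimal sketch of that difference. It is not part of test.py; the names raw, text, byte_length and text_length are only for illustration, and the bytes are written out by hand so the snippet should behave the same on both versions.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only: the UTF-8 bytes of " ▲ " written out explicitly,
# so the literal means the same thing on Python 2.6+ and Python 3.
raw = b" \xe2\x96\xb2 "

# Length of the byte string: one byte per space, three bytes for U+25B2.
byte_length = len(raw)

# Decoding gives a text string (unicode on Python 2, str on Python 3).
text = raw.decode("utf-8")

# Length of the text string: one per code point.
text_length = len(text)
```

Here byte_length is 5 and text_length is 3, which matches the two outputs from the question. Decoding with an explicit "utf-8" is what u() does on Python 2 in the listing above.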

2013-05-10 06:40:26 sdaau
