Equivalent of string.ascii_letters for unicode strings in Python 2.x?

In the string module of the standard library,

string.ascii_letters ## Same as string.ascii_lowercase + string.ascii_uppercase

is

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

Is there a similar constant that would include everything Unicode considers a letter?


2010-01-24 emm

You can build your own constant of Unicode uppercase and lowercase letters:

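import unicodedata as ud
# Every code point of the Basic Multilingual Plane (U+0000..U+FFFF)
all_unicode = ''.join(unichr(i) for i in xrange(65536))
# Keep only uppercase (Lu) and lowercase (Ll) letters
unicode_letters = ''.join(c for c in all_unicode
                          if ud.category(c) in ('Lu', 'Ll'))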

This gives a string of 2153 characters (on a narrow-Unicode build of Python). For code like letter in unicode_letters, it would be faster to use a set instead:

unicode_letters = set(unicode_letters)
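A minimal sketch of the set-based version, written so it also runs on Python 3 (the fallback from unichr to chr is an addition of this sketch, not part of the answer). Like the answer, it only scans the Basic Multilingual Plane and only keeps the Lu and Ll categories, so the exact size depends on the Unicode version bundled with your Python.

```python
import unicodedata as ud

try:
    unichr  # Python 2 name for building a one-character unicode string
except NameError:
    unichr = chr  # Python 3: chr() returns a str for any code point

# Build the set once: uppercase and lowercase letters of the BMP (U+0000..U+FFFF)
unicode_letters = frozenset(
    unichr(i) for i in range(0x10000)
    if ud.category(unichr(i)) in ('Lu', 'Ll')
)

# Membership tests on a set are O(1) on average instead of a scan of a long string
print(u'\u0444' in unicode_letters)  # CYRILLIC SMALL LETTER EF -> True
print(u'5' in unicode_letters)       # DIGIT FIVE -> False
```

A frozenset is used here only because the constant is never modified; a plain set, as in the answer, behaves the same for lookups.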


2010-01-24 15:58:49

Good answer to the question as I asked it. However, I found another solution that better suits my needs (see my own answer below). – emm 2010-01-24 17:25:42

ud.category(c) in ('Lu', 'Ll') – jsbueno 2015-05-15 14:06:14

That would be a very large constant. Unicode currently covers more than 100,000 different characters, so the answer is no.

The real question is why you need it. There may be other ways to solve your problem, for example with the unicodedata module.

Update: You can download files with the names and other properties of all Unicode code points from ftp://ftp.unicode.org/ and use them to do lots of interesting things.
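As a rough sketch of what you can do with the downloaded data, assuming the file you fetched is UnicodeData.txt (one semicolon-separated line per code point, with the code point in hex in the first field, the name in the second and the general category in the third), you can extract the letters yourself. The sample lines are embedded only so the sketch runs without a download. Note that large blocks such as the CJK ideographs appear in the real file as "<..., First>" / "<..., Last>" range pairs, which this sketch does not expand.

```python
# A few lines in the UnicodeData.txt format, embedded so the sketch is self-contained
SAMPLE = u"""\
0035;DIGIT FIVE;Nd;0;EN;;5;5;5;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
0444;CYRILLIC SMALL LETTER EF;Ll;0;L;;;;;N;;;0424;;0424
"""


def parse_unicode_data(lines):
    """Yield (code_point, name, general_category) for each non-empty line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(u';')
        yield int(fields[0], 16), fields[1], fields[2]


def letter_code_points(lines, categories=(u'Lu', u'Ll')):
    """Return the code points whose general category is one of `categories`."""
    return [cp for cp, _name, cat in parse_unicode_data(lines) if cat in categories]


# With the real file (hypothetical local path), pass the lines of
# io.open('UnicodeData.txt', encoding='utf-8') instead of SAMPLE.splitlines().
print([hex(cp) for cp in letter_code_points(SAMPLE.splitlines())])
# ['0x41', '0x61', '0x444']
```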


2010-01-24 09:44:36

There is no such string, but you can use the unicodedata module, in particular its category() function, to check whether a character is a letter:

>>> import unicodedata
>>> unicodedata.category(u'a') 
'Ll' 
>>> unicodedata.category(u'A') 
'Lu' 
>>> unicodedata.category(u'5') 
'Nd' 
>>> unicodedata.category(u'ф') # Cyrillic f. 
'Ll' 
>>> unicodedata.category(u'٢') # Arabic-indic numeral for 2. 
'Nd'

Ll means "Letter, lowercase". Lu means "Letter, uppercase". Nd means "Number, decimal digit".
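Building on this, a letter test does not need a constant at all. A minimal sketch, assuming you want every "Letter" category (Lu, Ll, Lt, Lm, Lo) rather than only Lu and Ll, is to check whether the category starts with L. For comparison it also prints isalpha(), which Python 3's documentation defines in terms of those same five categories.

```python
from __future__ import print_function

import unicodedata


def is_letter(ch):
    """True if ch has one of the Unicode "Letter" categories: Lu, Ll, Lt, Lm or Lo."""
    return unicodedata.category(ch).startswith('L')


samples = [
    u'a',       # Ll
    u'A',       # Lu
    u'5',       # Nd
    u'\u0444',  # CYRILLIC SMALL LETTER EF, Ll
    u'\u0662',  # ARABIC-INDIC DIGIT TWO, Nd
    u'\u01c5',  # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON, Lt
    u'\u4e2d',  # CJK ideograph, Lo
]

# Print each sample with its category, the prefix test and isalpha() side by side
for ch in samples:
    print(repr(ch), unicodedata.category(ch), is_letter(ch), ch.isalpha())
```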


2010-01-24 10:05:15

Just to make the answer complete, here is a list of all Unicode categories: http://www.fileformat.info/info/unicode/category/index.htm – 2010-01-24 11:54:56


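As mentioned in the previous answers, such a string would indeed be way too long. So you have to target a specific language.
[EDIT: I realize this was my original intended use, and I think it is for most uses. However, in the meantime Mark Tolonen gave a very good answer to the question as it was asked, so I accepted his answer, although I use the following solution.]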

This is easy to do with the locale module:

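import locale
import string

code = 'fr_FR'  ## Do NOT specify an encoding (see below); the locale must be installed
locale.setlocale(locale.LC_CTYPE, code)  # Python 2 updates string.letters on this call
encoding = locale.getlocale()[1]  # default encoding of the locale, e.g. 'ISO8859-1'
letters = string.letters.decode(encoding)  # byte string -> unicode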

letters is a unicode string 117 characters long.

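Apparently, string.letters depends on the default encoding of the selected language code, not on the language itself. Setting the locale to fr_FR, de_DE or es_ES updates string.letters to the same value (because they all default to ISO8859-1).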

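If you add the encoding to the language code (de_DE.UTF-8), string.letters is still built with the default encoding instead. Decoding it with the rest of the code above would then raise a UnicodeDecodeError.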
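To see where the 117 comes from without changing the process locale, here is a locale-independent sketch. It assumes, as described above, that these locales default to ISO8859-1, and that the C library treats the same ISO8859-1 bytes as letters as Unicode does (an assumption, not something checked in the answer). It decodes every byte value as Latin-1 and keeps the alphabetic characters; the count comes out at 117, matching the length reported above, but the sketch does not reproduce the ordering of string.letters.

```python
# All 256 byte values decoded as ISO8859-1 (Latin-1): byte n becomes code point n
latin1 = bytearray(range(256)).decode('latin-1')

# Keep the characters that Unicode considers alphabetic
latin1_letters = u''.join(c for c in latin1 if c.isalpha())

print(len(latin1_letters))  # 117: 52 ASCII letters + 65 from the upper half
```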


2010-01-24 11:08:34 emm
