What is the difference between unicodedata.digit and unicodedata.numeric?

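From the unicodedata docs:

unicodedata.digit(chr[, default]) returns the digit value assigned to the character chr as an integer. If no such value is defined, default is returned, or, if default is not given, ValueError is raised.

unicodedata.numeric(chr[, default]) returns the numeric value assigned to the character chr as a float. If no such value is defined, default is returned, or, if default is not given, ValueError is raised.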

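Can anybody explain the difference between these two functions?

One can read the implementation of both functions here, but the difference is not evident to me from a quick look, because I'm not familiar with the CPython implementation.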

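EDIT 1:

It would be nice to see examples that show the difference.

EDIT 2:

Useful examples drawn from the comments and the spectacular answer by @user2357112: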

import unicodedata

print(unicodedata.digit('1'))  # DIGIT ONE
print(unicodedata.digit('١'))  # ARABIC-INDIC DIGIT ONE
try:
    print(unicodedata.digit('¼'))  # Not a digit, so ValueError is raised
except ValueError as exc:
    print('ValueError:', exc)  # "not a digit"

print(unicodedata.numeric('Ⅱ'))  # ROMAN NUMERAL TWO
print(unicodedata.numeric('¼'))  # VULGAR FRACTION ONE QUARTER

Source

2017-08-28 gsi-frank

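I believe 'digit' also applies to digit characters other than the Arabic numerals, such as DEVANAGARI DIGIT ONE and so on.

@cᴏʟᴅsᴘᴇᴇᴅ Could you give an example to illustrate that?

Judging from the types and descriptions in the docs, digit is for actual digits, while numeric can also handle vulgar fractions (e.g. ¾). – weirdan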

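Short answer:

If a character represents a decimal digit, so things like 1, ¹ (SUPERSCRIPT ONE), ① (CIRCLED DIGIT ONE), ١ (ARABIC-INDIC DIGIT ONE), unicodedata.digit will return the digit the character represents as an int (so 1 for all of these examples).

If the character represents any numeric value, so things like ⅐ (VULGAR FRACTION ONE SEVENTH) and all the decimal digit examples, unicodedata.numeric will give that character's numeric value as a float.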
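A minimal sketch of the two statements above, using only the characters already mentioned. It assumes a Python 3 interpreter whose bundled Unicode database knows all of them:

```python
import unicodedata

# Characters that represent a decimal digit: digit() returns an int,
# and numeric() returns the same value as a float.
for ch in '1¹①١':
    print(unicodedata.name(ch), unicodedata.digit(ch), unicodedata.numeric(ch))

# A fraction has a numeric value but no digit value.
seventh = '⅐'  # VULGAR FRACTION ONE SEVENTH
print(unicodedata.numeric(seventh))      # a float close to 1/7
print(unicodedata.digit(seventh, None))  # None, because it is not a digit
```

Passing None as the default is just a convenient way to avoid the ValueError here; any other sentinel would work as well.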

For technical reasons, more recent digit characters such as DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO may raise a ValueError from unicodedata.digit.

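Long answer:

Unicode characters all have a Numeric_Type property. This property can have 4 possible values: Numeric_Type=Decimal, Numeric_Type=Digit, Numeric_Type=Numeric, or Numeric_Type=None.

Quoting the Unicode standard, version 10.0.0, section 4.6,

The Numeric_Type=Decimal property value (which is correlated with the General_Category=Nd property value) is limited to those numeric characters that are used in decimal-radix numbers and for which a full set of digits has been encoded in a contiguous range, with ascending order of Numeric_Value, and with the digit zero as the first code point in the range.

Numeric_Type=Decimal characters are thus decimal digits fitting a few other specific technical requirements.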
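The "contiguous range, zero first" requirement can be checked by hand. The sketch below uses unicodedata.decimal, a sibling function not discussed in the question, which only returns a value for Numeric_Type=Decimal characters; the variable names are made up for illustration:

```python
import unicodedata

# The ten Arabic-Indic digits sit in one contiguous block, starting at zero.
zero = unicodedata.lookup('ARABIC-INDIC DIGIT ZERO')
values = [unicodedata.decimal(chr(ord(zero) + i)) for i in range(10)]
print(values)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# SUPERSCRIPT ONE is a digit, but it is not Numeric_Type=Decimal.
print(unicodedata.decimal('¹', None))  # None
print(unicodedata.digit('¹'))          # 1
```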

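Decimal digits, as defined in the Unicode Standard by these property assignments, exclude some characters, such as the CJK ideographic digits (see the first ten entries in Table 4-5), which are not encoded in a contiguous sequence. Decimal digits also exclude the compatibility subscript and superscript digits to prevent simplistic parsers from misinterpreting their values in context. (For more information on superscripts and subscripts, see Section 22.4, Superscript and Subscript Symbols.) Traditionally, the Unicode Character Database has given these sets of noncontiguous or compatibility digits the value Numeric_Type=Digit, to recognize the fact that they consist of digit values but do not necessarily meet all the criteria for Numeric_Type=Decimal. However, the distinction between Numeric_Type=Digit and the more generic Numeric_Type=Numeric has proven not to be useful in implementations. As a result, future sets of digits which may be added to the standard and which do not meet the criteria for Numeric_Type=Decimal will simply be assigned the value Numeric_Type=Numeric.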

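So historically, Numeric_Type=Digit was used for other digits that did not fit the technical requirements of Numeric_Type=Decimal, but they decided that wasn't useful, and digit characters that don't meet the Numeric_Type=Decimal requirements have simply been assigned Numeric_Type=Numeric since Unicode 6.3.0. For instance, DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO, introduced in Unicode 7.0, has Numeric_Type=Numeric.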
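A quick way to see this for the dingbat character, assuming the interpreter's Unicode database is at version 7.0 or later (otherwise the lookup by name would fail with KeyError):

```python
import unicodedata

# Look the character up by name so the example does not depend on
# font or editor support for this code point.
ch = unicodedata.lookup('DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO')

print(unicodedata.unidata_version)  # version of the bundled Unicode database
print(unicodedata.numeric(ch))      # 0.0
print(unicodedata.digit(ch, None))  # None: no digit value is assigned
```

Without the None default, the last call would raise ValueError even though the character visibly depicts the digit zero.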

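Numeric_Type=Numeric is for all characters that represent numbers and don't fit in the other categories, and Numeric_Type=None is for characters that don't represent numbers (or at least, don't under normal usage).

All characters with a non-None Numeric_Type property have a Numeric_Value property representing their numeric value. unicodedata.digit will return that value as an int for characters with Numeric_Type=Decimal or Numeric_Type=Digit, and unicodedata.numeric will return that value as a float for characters with any non-None Numeric_Type.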
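Putting the mapping together: the unicodedata module has no direct accessor for Numeric_Type, but it can be approximated by probing the three lookup functions in order. numeric_type below is a hypothetical helper written for this illustration, not part of the standard library:

```python
import unicodedata

def numeric_type(ch):
    """Approximate a character's Numeric_Type from unicodedata lookups."""
    if unicodedata.decimal(ch, None) is not None:
        return 'Decimal'
    if unicodedata.digit(ch, None) is not None:
        return 'Digit'
    if unicodedata.numeric(ch, None) is not None:
        return 'Numeric'
    return 'None'

for ch in '5¹¼x':
    print(repr(ch), numeric_type(ch))
```

The order of the checks matters, since every Decimal character also has a digit value and every Digit character also has a numeric value.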

Source

2017-08-28 17:11:18 user2357112

Perfect explanation and examples!
