Dictionary keys with Unicode characters raise an error

The CSV file contains strings with unrecognized characters, and the JSON file contains a map from those strings to the correct ones.

FILE.CSV

0,�urawska A. 
1,Polnar J�zef

dict.json

{ 
    "\ufffdurawska A.": "\u017burawska A.", 
    "Polnar J\ufffdzef": "Polnar J\u00f3zef" 
}

parse.py

Traceback (most recent call last):
  File "parse.py", line 9, in <module>
    print proper_names[row[1].decode('utf-8')]
UnicodeEncodeError: 'ascii' codec can't encode character u'\u017b' in position 0: ordinal not in range(128)

How can I use the dictionary with the decoded strings?

2015-09-24 CodeNinja

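To me it looks like your console cannot handle UTF-8. What do you get if you print a value directly to the console, e.g. 'print proper_names.values()[0]'?

'UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 8: ordinal not in range(128)' – CodeNinja

Looking at the error message, I think the problem is the value, not the key (\u017b is in the value).

So you also have to encode the result: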

print proper_names[row[1].decode('utf-8')].encode('utf-8')

(Edit: corrected to address the comments, for future reference.)
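As a rough illustration of this point, the sketch below rebuilds the question's mapping in memory instead of reading the CSV and JSON files. The byte string standing in for a CSV cell and the variable names cell, key, value and encoded are assumptions made for the example. It suggests that the lookup with a Unicode key succeeds, and that the non-ASCII character sits in the value that is handed to print.

```python
# Stand-in for the mapping loaded from dict.json: keys and values are Unicode strings.
proper_names = {
    u"\ufffdurawska A.": u"\u017burawska A.",
    u"Polnar J\ufffdzef": u"Polnar J\u00f3zef",
}

# Assumed stand-in for row[1]: the UTF-8 bytes of the first CSV cell.
cell = b"\xef\xbf\xbdurawska A."

key = cell.decode("utf-8")        # bytes -> Unicode; the first character is U+FFFD
value = proper_names[key]         # the dictionary lookup itself succeeds
encoded = value.encode("utf-8")   # explicit bytes, safe to write to a byte stream

print(repr(key))
print(repr(encoded))
```

Encoding explicitly at the output step keeps the dictionary and the rest of the program working on Unicode strings, which is the arrangement the answer above relies on.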

2015-09-24 11:22:34 Pieter21

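Raises KeyError: '\xef\xbf\xbdurawska A.' – CodeNinja

'print proper_names[row[1].decode('utf-8')].encode('utf-8')' <- this is the correct answer – CodeNinja

I was able to reproduce the error and pin down where it occurs. Using a dictionary with unicode keys is not actually the problem; the error occurs when you try to print a unicode character that cannot be represented in ASCII. If you split the print across two lines: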

for row in reader: 
    val = proper_names[row[1].decode('utf-8')] 
    print val

the error will occur on the print line.
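A minimal sketch of why the print line is the one that fails, assuming the output stream only accepts ASCII: in that situation printing a Unicode string amounts, roughly, to an implicit encode to ASCII. The example makes that step explicit and captures the exception instead of crashing; the names value and error are illustrative.

```python
# The value looked up for the first CSV row (taken from dict.json in the question).
value = u"\u017burawska A."

# Roughly what happens implicitly when a Unicode string is printed
# to a stream whose encoding is ASCII.
try:
    value.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The exception records the codec and the position of the offending character.
print(repr(error))
```

The captured exception names the 'ascii' codec and position 0, which matches the traceback in the question.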

You have to encode it with a suitable charset. The best one I know is latin1, but it cannot represent \u017b, so I again use utf8:

for row in reader: 
    val = proper_names[row[1].decode('utf-8')] 
    print val.encode('utf8')

or directly:

for row in reader: 
    print proper_names[row[1].decode('utf-8')].encode('utf8')
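The remark about latin1 can be checked with a short sketch. The helper can_encode and the results dictionary are hypothetical names introduced only for this example; the two names tested are the values from dict.json in the question.

```python
# The two corrected names from dict.json.
names = [u"\u017burawska A.", u"Polnar J\u00f3zef"]


def can_encode(text, codec):
    """Return True if text can be encoded with codec without raising an error."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Try every name against every candidate codec.
results = {
    (name, codec): can_encode(name, codec)
    for name in names
    for codec in ("ascii", "latin-1", "utf-8")
}

for (name, codec), ok in sorted(results.items()):
    print(repr(name), codec, ok)
```

Under these assumptions latin-1 handles the \u00f3 in the second name but not the \u017b in the first, while utf-8 handles both, which is why the answer settles on utf8.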

2015-09-24 11:46:06
