Searching a file of Unicode data for a Unicode input string

>>> string1 = " म नेपाली हुँ"
>>> string1 = string1.split()
>>> string1[0]
'\xe0\xa4\xae'

>>> import codecs
>>> with codecs.open('nepaliwords.txt', 'r', 'utf-8') as f:
...     for line in f:
...         if string1[0] in line:
...             print "matched string found in file"
...

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)

I have a large amount of Nepali Unicode text in a text file that I want to compare against.

Am I doing something wrong in comparing the two Unicode strings here?

How can I print the matched Unicode string?


2015-08-15 Bishal Gautam

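Your string1 is a byte string, encoded as UTF-8. It is not a Unicode string. But you used codecs.open() to have Python decode the file contents to unicode. Using your byte string in the containment test (string1[0] in line) makes Python implicitly decode the byte string to unicode so that the types match. Because that implicit decoding uses ASCII, it fails on the first non-ASCII byte (0xe0).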
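The failing step can be reproduced by hand. This is a sketch that assumes Python 3, where bytes are never decoded implicitly (mixing bytes and str in an in test raises TypeError instead), so it calls the ASCII codec explicitly on the same three bytes that string1[0] holds in the question.

```python
# Python 3 sketch: reproduce the implicit ASCII decode that Python 2
# performs when a byte string is tested against a unicode string.
raw = u"म".encode("utf-8")  # the same bytes as string1[0]: b'\xe0\xa4\xae'

try:
    raw.decode("ascii")  # Python 2 does this step implicitly inside `in`
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)

# Decoding with the correct codec succeeds and yields a single code point.
decoded = raw.decode("utf-8")
print(len(decoded), decoded == u"म")  # 1 True
```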

Decode string1 to unicode first:

string1 = " म नेपाली हुँ" 
string1 = string1.decode('utf8').split()[0]

or use a Unicode string literal instead:

string1 = u" म नेपाली हुँ" 
string1 = string1.split()[0]

Note the u at the start.
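Putting the fix together with the file search from the question, a possible sketch also prints the matching line instead of a fixed message. Assumptions: Python 3 (io.open, the same function as the built-in open there, also exists in Python 2.6+), a UTF-8 input file, and a hypothetical helper find_matches that returns the matching lines; a small temporary file stands in for nepaliwords.txt.

```python
import io
import os
import tempfile


def find_matches(path, word):
    """Return the lines of a UTF-8 text file that contain `word`.

    Hypothetical helper for illustration. A UTF-8 byte string is decoded
    first, so both operands of `in` are unicode.
    """
    if isinstance(word, bytes):
        word = word.decode("utf-8")
    matches = []
    # io.open decodes every line to unicode as it is read.
    with io.open(path, "r", encoding="utf-8") as f:
        for line in f:
            if word in line:
                matches.append(line.rstrip("\n"))
    return matches


# Usage: a small temporary file stands in for nepaliwords.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with io.open(fd, "w", encoding="utf-8") as f:
    f.write(u"म नेपाली हुँ\nधन्यवाद\n")

needle = u" म नेपाली हुँ".split()[0]  # u"म"
found = find_matches(path, needle)
found_from_bytes = find_matches(path, needle.encode("utf-8"))

for match in found:
    print(match)  # prints the matching line itself

os.remove(path)
```

Returning the lines rather than printing inside the helper keeps the search separate from the output, so the caller decides whether to print the whole line or only the matched word.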


2015-08-15 13:10:49

Thanks, string1 = u" म नेपाली हुँ" solved my case. It was string1 = string1.split()[0][0] that created the problem. Thanks!
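Indexing twice selects only part of the first word: on a unicode string, its first code point; on a UTF-8 byte string, its first byte, which is not a complete character. A Python 3 sketch (an assumption, since the thread uses Python 2) using the second word so the difference is visible:

```python
# Python 3 sketch: what a second index selects.
words = u" म नेपाली हुँ".split()   # [u'म', u'नेपाली', u'हुँ']
print(words[0])                    # म, the whole first word
print(words[1][0])                 # न, only the first code point of नेपाली

# On UTF-8 bytes, one more index picks a single byte instead.
# (Python 3 returns an int for bytes[i], so a slice keeps it as bytes.)
first_byte = words[1].encode("utf-8")[0:1]
print(first_byte)                  # b'\xe0', not a complete character
```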

Can you help me print the matched string?
