Is there a way to convert the text in a file to Unicode in Python?

I'm writing a scraper for a Brazilian page and writing the results to a file. It turns out the text I get from the code isn't supported in ASCII, and it gave me this error:

File "testUnicode.py", line 6 SyntaxError: Non-ASCII character '\xc3' in file testUnicode.py on line 6, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

So I found an answer here that fixes this error:

file.write(news.encode('utf8'))

and it works, in the sense that the error went away, but I still get the text garbled, like this:

Em tom de desabafo, peemedebista diz que, no 1Âº mandato, foi um 'vice decorativo' CoalizÃ£o diz que usarÃ¡ sua maioria na Assembleia para libertar antichavistas Segundo autoridades, casal acusado das mortes estava 'radicalizado havia algum tempo' Entre as mulheres, Ãndice vai a 52%; maioria da populaÃ§Ã£o aprova movimentos feministas Manifestantes bloqueiam ruas contra a reorganizaÃ§Ã£o das escolas; houve discussÃ£o com motoristas Animalzinho Ã© menor que um grÃ£o de gergelim

Is there a way to fix this?


2015-12-08 AJ Ze

You need to know what encoding the original text is in. – BrenBarn

I don't think it's 'utf-8'. Use the correct encoding. – vks
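One common place to find the page's encoding is the `charset` parameter of the server's `Content-Type` header. A minimal Python 3 sketch, assuming the header value is already in hand: here it is hard-coded for illustration, whereas a real scraper would read it from the HTTP response (the `headers` of a `urllib.request.urlopen` response is an `email.message.Message` subclass, so `get_content_charset()` works on it directly). The helper name `charset_from_content_type` and the UTF-8 fallback are assumptions, and pages can also declare a charset in a `<meta>` tag, which this does not check.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset declared in a Content-Type value, or `default` (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the charset parameter and returns
    # `failobj` when the header has no (usable) charset parameter.
    return msg.get_content_charset(failobj=default)


# Hypothetical header values for illustration.
print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(charset_from_content_type("text/html"))                      # utf-8 (our fallback)

# Bytes as a Latin-1 server might send them, decoded with the declared charset.
raw = "Coalizão".encode("iso-8859-1")
text = raw.decode(charset_from_content_type("text/html; charset=ISO-8859-1"))
print(text)  # Coalizão
```

The fallback is a choice made for this sketch, not a rule: if neither the header nor the page declares an encoding, you still have to decide (or detect) which one to use.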

The original error:

File "testUnicode.py", line 6
SyntaxError: Non-ASCII character '\xc3' in file testUnicode.py on line 6, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

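is caused by your source file containing UTF-8 characters without an encoding declaration. Declare the encoding on the first or second line of the file: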

# -*- coding: utf-8 -*-
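To see how such a declaration is picked up, Python 3's `tokenize.detect_encoding` follows the PEP 263 rules: it looks for a BOM or a coding comment in the first two lines only. A small sketch (Python 3, where the default source encoding is UTF-8 instead of Python 2's ASCII, which is why Python 3 does not raise this particular error for a UTF-8 file); `declared_encoding` is a hypothetical helper name.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the source encoding Python would use for these bytes (hypothetical helper)."""
    # detect_encoding reads at most two lines through `readline`, looking for a
    # BOM or a PEP 263 coding comment; without one it returns 'utf-8'.
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


print(declared_encoding(b"# -*- coding: utf-8 -*-\nx = 1\n"))    # utf-8
print(declared_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))  # iso-8859-1 (normalized name)

# A cookie on line 3 is too late and is ignored, so the default applies.
print(declared_encoding(b"#!/usr/bin/env python\n# scraper\n# -*- coding: latin-1 -*-\n"))  # utf-8
```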

The second problem happens because whatever is reading your text is decoding it as latin1 instead of utf8, for example:

>>> c = u'\u00e3'  # code point for LATIN SMALL LETTER A WITH TILDE
>>> c.encode('utf8')  # UTF-8 encoding produces 2 bytes
'\xc3\xa3'
>>> print c.encode('utf8').decode('latin1')  # those bytes, read as latin1
Ã£

Here \xc3 is read as Ã and \xa3 as £.
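Because Python's Latin-1 codec maps each of the 256 byte values to exactly one character, this misreading loses nothing and can usually be reversed: re-encode the garbled text as Latin-1 to get the original bytes back, then decode them as UTF-8. A Python 3 sketch; `fix_mojibake` is a hypothetical helper, and it assumes the text was garbled exactly once, UTF-8 read as Latin-1. Text garbled through Windows-1252, garbled twice, or with characters dropped along the way may not round-trip.

```python
def fix_mojibake(garbled: str) -> str:
    """Undo UTF-8 bytes that were decoded as Latin-1 (hypothetical helper)."""
    # encode('latin-1') turns each character U+0000..U+00FF back into the single
    # byte it came from (and raises UnicodeEncodeError for anything outside that
    # range); decode('utf-8') then reads those bytes as intended.
    return garbled.encode("latin-1").decode("utf-8")


print(fix_mojibake("CoalizÃ£o"))  # Coalizão
print(fix_mojibake("Animalzinho Ã© menor que um grÃ£o de gergelim"))
# Animalzinho é menor que um grão de gergelim
```

Fixing the reading side is still the real cure; this is only useful for text that was already saved garbled.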

So your file is written as utf8 but read as latin1.
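The same mismatch can be reproduced at the file level, and avoided by passing `encoding` explicitly when reading. A Python 3 sketch using a temporary file; the headline is sample text from the question, and the write mirrors `file.write(news.encode('utf8'))` by writing UTF-8 bytes in binary mode.

```python
import os
import tempfile

text = "Manifestantes bloqueiam ruas contra a reorganização das escolas"

path = os.path.join(tempfile.mkdtemp(), "news.txt")
with open(path, "wb") as f:  # like file.write(news.encode('utf8'))
    f.write(text.encode("utf-8"))

# Reading with the wrong codec reproduces the garbling from the question.
with open(path, encoding="latin-1") as f:
    wrong = f.read()

# Reading with the codec the file was written in gives the original text back.
with open(path, encoding="utf-8") as f:
    right = f.read()

print(wrong)  # contains 'reorganizaÃ§Ã£o'
print(right)  # contains 'reorganização'
```

If the file is opened by another program (an editor, a spreadsheet), the same rule applies: tell it the file is UTF-8.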


2015-12-08 05:01:45 memoselyk
