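Removing non-ASCII characters from a file using Python

I have a scenario where log files sent for analysis contain some non-ASCII characters, which end up breaking one of the analysis tools that I have no control over. So I decided to clean the logs myself and came up with the function below. It does the job, except that it skips the whole line whenever it sees one of those characters. I also tried going through the file character by character (see the commented-out code) so that only those characters would be removed and the actual ASCII characters kept, but I could not get it to work. Is there any reason why the commented-out logic fails, and any suggestion or solution for the issue?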
1:02: failed
Sample line: 54.934/174573 ENQÎNULSUB AY NULEOT/29/abcdefghijg
Function to read and remove the lines:
    def readlogfile(self, abs_file_name):
        """
        Reads and skip the non-ascii chars line from the attached log file and populate the list self.data_bytes
        abs_file_name file name should be absolute path
        """
        try:
            infile = open(abs_file_name, 'rb')
            for line in infile:
                try:
                    line.decode('ascii')
                    self._data_bytes.append(line)
                except UnicodeDecodeError as e :
                    # print line + "Invalid line skipped in " + abs_file_name
                    print line
                    continue
            # while 1:        #code that didn't work to remove just the non-ascii chars
            #     char = infile.read(1)          # read characters from file
            #     if not char or ord(char) > 127 or ord(char) < 0:
            #         continue
            #     else:
            #         sys.stdout.write(char)
            #         #sys.stdout.write('{}'.format(ord(char)))
            #         #print "%s ord = %d" % (char, ord(char))
            #         self._data_bytes.append(char)
        finally:
            infile.close()
http://stackoverflow.com/questions/33511317/removing-non-ascii-characters-from-file-text/33511747#33511747 The original code from this answer should work for you.
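
On why the commented-out loop fails: at end of file `infile.read(1)` returns an empty string, so `not char` is true and the loop hits `continue` forever instead of stopping; that branch needs a `break`. In addition, if the loop runs after the `for line in infile` loop, the file has already been read to the end, so there is nothing left to process. The `ord(char) < 0` test can never be true. Below is a minimal sketch of one way to drop only the offending bytes and keep the rest of each line. It is written for Python 3 (the question's code is Python 2), and the names `strip_non_ascii` and `read_ascii_lines` are made up for illustration; they are not from the question or the linked answer.

```python
def strip_non_ascii(line):
    """Return `line` (a bytes object) with every byte above 127 removed."""
    # In Python 3, iterating over bytes yields ints, so ord() is not needed.
    return bytes(b for b in line if b < 128)


def read_ascii_lines(abs_file_name):
    """Read a file in binary mode and return its lines as bytes,
    with non-ASCII bytes removed instead of skipping whole lines."""
    cleaned = []
    # The with-statement closes the file even if an exception is raised,
    # replacing the explicit try/finally.
    with open(abs_file_name, 'rb') as infile:
        for line in infile:
            cleaned.append(strip_non_ascii(line))
    return cleaned
```

In the original method you would append `strip_non_ascii(line)` to `self._data_bytes` instead of the raw line. Note that ENQ, NUL, SUB and EOT in the sample line, if they are literal control bytes in the file, are themselves ASCII (values below 128) and would survive this filter. If the analysis tool also chokes on those, the condition could be tightened, for example to keep only bytes in the printable range plus newline and tab.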