Excluding non-ASCII characters in Python

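I have a script that decrypts an encrypted message using a dictionary (word list). The problem is that decryption produces a lot of garbage (a.k.a. non-ASCII) characters. Here is my code: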

from Crypto.Cipher import AES
import base64

BLOCK_SIZE = 32   # AES-256 key length in bytes

PADDING = '{'     # character used to pad the key and the plaintext

# Encrypted text to decrypt
encrypted = "WI4wBGwWWNcxEovAe3p+GrpK1GRRQcwckVXypYlvdHs="

# Base64-decode, decrypt, then strip the trailing padding characters
DecodeAES = lambda c, e: c.decrypt(base64.b64decode(e)).rstrip(PADDING)

with open('words.txt') as adib:
    for line in adib:
        # strip the line ending (including a Windows '\r') so the key matches
        secret = line.rstrip('\r\n')
        if len(secret) >= BLOCK_SIZE:
            print "Error, string too long. Must be less than 32 characters."
            continue

        # create a cipher object using the secret, padded to 32 bytes
        # (ECB is PyCrypto's default mode; it is passed explicitly here)
        cipher = AES.new(secret + (BLOCK_SIZE - len(secret) % BLOCK_SIZE) * PADDING, AES.MODE_ECB)

        # decode the encoded string
        decoded = DecodeAES(cipher, encrypted)
        print decoded + "\n"

What I have tried so far is converting the decoded string to ASCII and then excluding the non-ASCII characters, but that didn't work.
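For reference, the key-padding expression passed to AES.new() in the question always yields a 32-byte key (valid for AES-256) as long as the word is shorter than 32 characters; a word of exactly 32 characters would get 32 more pad characters and become a 64-byte key, which AES rejects, hence the length check. A small Python 3 sketch of that arithmetic, where `pad_key` is a hypothetical helper name not in the original:

```python
BLOCK_SIZE = 32  # AES-256 key length in bytes, as in the question
PADDING = '{'    # pad character, as in the question


def pad_key(secret):
    """Pad a candidate word with PADDING up to the next multiple of BLOCK_SIZE.

    Hypothetical helper mirroring the expression passed to AES.new().
    """
    return secret + (BLOCK_SIZE - len(secret) % BLOCK_SIZE) * PADDING


key = pad_key("luffy")
print(len(key))   # 32
print(key[:8])    # luffy{{{
```

Note that an empty line still produces a valid key of 32 pad characters.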

2016-03-10 shoomy

Could you give an exact example of the contents of your 'words.txt' file, please? –

It contains common words, but here are some of them: – shoomy

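'the and a One Piece episode chapter pirate arc of edit volume he slot name island is Luffy is for with part world category special manga wiki Wikipedia encyclopedia is Japan this anime SBS volume page BEGIN END help wiki blue crew from user Buggy straw portrait big he pirate new template marines they not hat devil FLUSH TOP BOXAD Navibox Monkey they Crocodile Down page start Shanks have Shichibukai all have canon rules wiki all page fruit Zoro Berry sea name time image a Usopp war government policy Random' – shoomy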

This version will work:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

def evaluate_string_is_ascii(mystring):
    """Return True if every character is strict 7-bit ASCII (code points 1-127)."""
    for c in mystring:
        if not 0 < ord(c) <= 127:
            # extended ASCII (128-255) or anything else counts as trash
            return False
    return True


my_text_content = """azertwxcv
123456789
456dqsdq13
��nS��?t#�
lkjal�
kfldjkjl&é"""

for line in my_text_content.split('\n'):
    # print the line only if it contains nothing but ASCII
    if evaluate_string_is_ascii(line):
        print line
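The same check can be written as a one-line generator expression. Below is a Python 3 sketch under that assumption; `is_strict_ascii` is a hypothetical name. It keeps the answer's rule of rejecting NUL (code point 0), which differs from the built-in `str.isascii()` (Python 3.7+), since that method accepts NUL.

```python
def is_strict_ascii(text):
    """Return True if every character has a code point from 1 to 127.

    Same rule as evaluate_string_is_ascii(): NUL (0) and anything above
    127 are rejected. An empty string counts as ASCII.
    """
    return all(0 < ord(c) <= 127 for c in text)


sample = "azertwxcv\n123456789\nlkjal\ufffd\nkfldjkjl&\u00e9"
for line in sample.split("\n"):
    if is_strict_ascii(line):
        print(line)   # prints only azertwxcv and 123456789
```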

2016-03-10 12:08:23

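Your code works fine, but what I want is to not print lines that contain non-ASCII characters: if the decoded string contains any non-ASCII character, it should not be printed at all. – shoomy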

Does it work now? You can reuse the 'evaluate_string_is_ascii(mystring)' function in your own code like this: 'if evaluate_string_is_ascii(decoded) == True:' 'print decoded + "\n"' –
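In Python 3 the decrypted plaintext is a `bytes` object, so the check works on byte values rather than characters. A minimal sketch of filtering the decryption results that way, assuming the loop has collected (word, plaintext_bytes) pairs; `ascii_candidates` and the sample data are hypothetical and made up for illustration:

```python
def ascii_candidates(results):
    """Yield (word, text) pairs whose decrypted bytes are all strict ASCII.

    `results` is a hypothetical iterable of (word, plaintext_bytes) pairs,
    e.g. collected from the dictionary decryption loop.
    """
    for word, plaintext in results:
        # Iterating over bytes in Python 3 yields ints, so compare directly.
        if all(0 < b <= 127 for b in plaintext):
            yield word, plaintext.decode("ascii")


# Made-up example data: one wrong-key result and one readable plaintext.
results = [
    ("luffy", b"\x96\xfa\x01garbage"),
    ("zoro", b"secret message"),
]
for word, text in ascii_candidates(results):
    print(word, text)   # zoro secret message
```

Rejecting the whole candidate, rather than stripping individual bytes, is what skips wrong keys entirely.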

It's working now, thank you my friend! – shoomy

You can remove the non-ASCII characters like this (edit: updated to decode first):

output = 'string with some non-ascii characters��@$���9�HK��F�23 some more char' 
output = output.decode('utf-8').encode('ascii', 'ignore')

2016-03-10 12:05:50

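I get an error with this: 'Traceback (most recent call last): File "code.py", line 28, in <module> decoded = decoded.decode('utf-8').encode('ascii', 'ignore') File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode return codecs.utf_8_decode(input, errors, True) UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 0: invalid start byte' – shoomy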
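The error arises because bytes produced by decrypting with a wrong key are essentially random, so they are usually not valid UTF-8: 0x96 is a UTF-8 continuation byte and cannot start a character. Decoding directly as ASCII with the "ignore" error handler avoids this by dropping every byte of 0x80 or above (in Python 2 the equivalent is `decoded.decode('ascii', 'ignore')`). A Python 3 sketch with made-up bytes standing in for a decryption result:

```python
raw = b"\x96abc\xe9 def"   # made-up bytes standing in for a wrong-key decryption

# raw.decode("utf-8") fails here: 0x96 is a UTF-8 continuation byte,
# so it cannot start a character.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)   # invalid start byte

# Decoding as ASCII with errors="ignore" simply drops every byte >= 0x80.
cleaned = raw.decode("ascii", errors="ignore")
print(cleaned)   # abc def
```

This strips the bad bytes but still prints the rest of the string, so it does not by itself skip candidates that contained non-ASCII data.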
