2016-11-29 39 views

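I'm reading the Python 3 docs here, and I must be blind or something... where does it say how to get the body of a message? How do I get the message body (or bodies) from the Message object returned by email.parser.Parser?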

What I want to do is open a message and loop over its text-based bodies, skipping binary attachments. Pseudocode:

def read_all_bodies(local_email_file):
    email = Parser().parse(open(local_email_file, 'r'))
    for pseudo_body in email.pseudo_bodies:
        if pseudo_body.pseudo_is_binary():
            continue
        # Pseudo-parse the body here

How do I do this? Is the Message class even the right class for it? Isn't it only for headers?

Answer

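This is best done with two functions:

  1. One that opens the file and parses it. If the message is single-part, get_payload returns the message body as a string. If the message is multipart, it returns a list of sub-messages.
  2. A second one that processes the text/payload of a single part.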
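
To make the difference in return types concrete, here is a minimal sketch. The two sample messages and the `XX` boundary are made up for illustration; only `message_from_string`, `is_multipart` and `get_payload` come from the standard `email` package.

```python
from email import message_from_string

# Hypothetical sample messages, invented for this illustration.
SINGLE = "Subject: hi\n\nJust one plain body.\n"
MULTI = (
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "first part\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "second part\n"
    "--XX--\n"
)

single = message_from_string(SINGLE)
multi = message_from_string(MULTI)

# Single-part message: the payload is the body as a str.
single_payload = single.get_payload()
print(type(single_payload).__name__, single.is_multipart())

# Multipart message: the payload is a list of Message objects,
# each of which has its own headers and its own payload.
multi_payload = multi.get_payload()
for part in multi_payload:
    print(part.get_content_type(), repr(part.get_payload()))
```

So the same call gives you either the text itself or a list you still have to iterate over, which is why the work is split into two functions.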

Here is how it can be done:

from email.parser import Parser

def parse_file_bodies(filename):
    # Open the file and parse the email
    with open(filename, 'r') as f:
        email = Parser().parse(f)
    # For multipart emails, each part is handled in a loop
    if email.is_multipart():
        for msg in email.get_payload():
            parse_single_body(msg)
    else:
        # A single-part message is passed directly
        parse_single_body(email)

def parse_single_body(email):
    payload = email.get_payload(decode=True)
    # With decode=True the payload is bytes, or None if this part
    # is itself a multipart container (nothing to decode here)
    if payload is None:
        return
    # The bytes must be converted to a Python string. The input
    # charset may vary from message to message; UTF-8 is assumed here
    try:
        text = payload.decode("utf-8")
        # Now you can work with text as with any other string:
        ...
    except UnicodeDecodeError:
        print("Error: cannot parse message as UTF-8")
        return
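
If you also need to skip binary attachments and handle nested multipart messages, one possible variation is to iterate with `walk()` and filter on the content type. This is only a sketch: `iter_text_bodies` and the `SAMPLE` message are hypothetical names made up for the example, and falling back to UTF-8 when no charset is declared is my assumption, not something the library prescribes.

```python
from email import message_from_string

def iter_text_bodies(msg):
    """Yield the decoded text of every text/* part, skipping everything else."""
    for part in msg.walk():
        # multipart/* containers have no body of their own
        if part.is_multipart():
            continue
        # Skip images, application/octet-stream and other non-text parts
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)  # bytes after undoing base64/QP
        if payload is None:
            continue
        # Use the declared charset; the UTF-8 fallback is an assumption
        charset = part.get_content_charset() or "utf-8"
        try:
            yield payload.decode(charset, errors="replace")
        except LookupError:
            # The message declared a charset name Python does not know
            yield payload.decode("utf-8", errors="replace")

# Hypothetical message: one Latin-1 text part and one binary attachment.
SAMPLE = (
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    'Content-Type: text/plain; charset="iso-8859-1"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "caf=E9\n"
    "--XX\n"
    "Content-Type: application/octet-stream\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "AAEC\n"
    "--XX--\n"
)

bodies = list(iter_text_bodies(message_from_string(SAMPLE)))
for body in bodies:
    print(body)
```

Filtering on `get_content_maintype()` keeps the "is it binary?" decision tied to what the message declares about itself, and reading the charset from the part avoids hard-coding UTF-8 for every message.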