2016-02-26 79 views

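Python: parsing an email body and stripping MIME headers

I have an email body that looks something like the one below.

Now I want to remove all of its headers and keep only the conversational text of the email. How can I do this in Python?

I tried the email.parser module, but it did not give me the result I want.

Please see the code below for more information.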

import email 
a="""--c66f5985-233d-4e89-b598-6398b60cbe00 
Content-Type: multipart/alternative; 
    differences="Content-Type"; 
    boundary="d5eff9f8-76b3-4320-adfb-1e51add8fa8f" 

--d5eff9f8-76b3-4320-adfb-1e51add8fa8f 
Content-Type: text/plain; charset=us-ascii 
Content-Transfer-Encoding: quoted-printable 

THis is a demo email body 

Thanks And Regards, 
Ana 
""" 



b = email.message_from_string(a)
if b.is_multipart():
    for payload in b.get_payload():
        # if payload.is_multipart(): ...
        print(payload.get_payload())
else:
    print(b.get_payload())

Answer

import email
import imaplib

hst = "your.host.adresse.com"
usr = "login"
pwd = "password"

imap = imaplib.IMAP4(hst)

try:
    imap.login(usr, pwd)
except Exception as e:
    raise IOError(e)

try:
    imap.select("Inbox")  # Tell IMAP which mailbox to use
    result, data = imap.uid('search', None, "ALL")
    latest = data[0].split()[-1]  # UID of the most recent message
    result, data = imap.uid('fetch', latest, '(RFC822)')
    a = data[0][1]  # This contains the mail data (bytes on Python 3)
except Exception as e:
    raise IOError(e)

# On Python 3 the fetched data is bytes, so parse it with message_from_bytes
msg = email.message_from_bytes(a)
if msg.is_multipart():
    for payload in msg.get_payload():
        b = payload.get_payload()  # ends up holding the last part's payload
else:
    b = msg.get_payload()

print(b)

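This strips out everything from the mail that you don't want in the final text. I tested it with your code. You didn't show how you obtain the mail (your variable a), so I suspect the problem lies in how it is fetched or decoded.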
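If all you have is a body-only fragment like the one in the question, the parser never sees a header first, because the text opens with a boundary line. A minimal sketch of one way around that, assuming the fragment always starts with a single "--boundary" line: drop that line, parse the rest, and collect the decoded text/plain parts. The helper name extract_plain_text and the FRAGMENT constant are made up for illustration, and the sample is the question's fragment with trailing spaces removed.

```python
import email

# Body-only fragment modeled on the one in the question: it opens with a
# boundary line rather than a header and has no closing boundary.
FRAGMENT = """--c66f5985-233d-4e89-b598-6398b60cbe00
Content-Type: multipart/alternative;
    differences="Content-Type";
    boundary="d5eff9f8-76b3-4320-adfb-1e51add8fa8f"

--d5eff9f8-76b3-4320-adfb-1e51add8fa8f
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable

THis is a demo email body

Thanks And Regards,
Ana
"""


def extract_plain_text(fragment):
    """Return the decoded text/plain parts of a body-only MIME fragment.

    Hypothetical helper; assumes at most one leading boundary line.
    """
    text = fragment.lstrip()
    # Drop a leading "--boundary" line so the parser sees headers first.
    if text.startswith("--"):
        text = text.split("\n", 1)[1] if "\n" in text else ""
    msg = email.message_from_string(text)
    chunks = []
    for part in msg.walk():
        # Skip container parts and anything that is not plain text.
        if part.is_multipart() or part.get_content_type() != "text/plain":
            continue
        raw = part.get_payload(decode=True)  # undoes quoted-printable/base64
        if raw is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        chunks.append(raw.decode(charset, errors="replace"))
    return "\n".join(chunks).strip()


print(extract_plain_text(FRAGMENT))
```

Using get_payload(decode=True) matters here because the part declares quoted-printable encoding. Without it, any encoded characters would stay in their raw "=XX" form.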

If you run into problems with HTML mail:

from bs4 import BeautifulSoup
soup = BeautifulSoup(b, 'html.parser')
soup = soup.get_text()  # keep only the text, dropping the tags
print(soup)

This should do the job now, but I suggest switching from Python's default parser to lxml or html5lib.
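If installing BeautifulSoup is not an option, roughly the same idea can be sketched with the standard library alone. This swaps in html.parser.HTMLParser rather than BeautifulSoup, so it is only an approximation of get_text(). The class name TextExtractor and the function html_to_text are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Hypothetical helper: return the text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(html_to_text("<p>Hello <b>Ana</b></p><style>p{}</style>"))
```

This puts each text run on its own line and makes no attempt to handle malformed markup the way lxml or html5lib would, so treat it as a fallback rather than a replacement.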


What should I do if my email contains many email threads? – sangeet


Also, I am only supplying the email body as shown above, without a hostname or any other credentials... – sangeet