Sending an email with attachments after scraping a site

I'm using Scrapy for a school project to find dead links and missing pages. I've written a pipeline that writes text files containing the relevant scraped information. I'm having trouble working out how to send an email, with the files the pipeline created as attachments, at the end of the spider run.

Scrapy has built-in email functionality, and it fires a signal when the spider finishes, but putting everything together in a sensible way is escaping me. Any help would be greatly appreciated.

Here is my pipeline that creates the files with the scraped data:
class saveToFile(object):

    def __init__(self):
        # open files
        self.old = open('old_pages.txt', 'wb')
        self.date = open('pages_without_dates.txt', 'wb')
        self.missing = open('missing_pages.txt', 'wb')
        # write table headers
        line = "{0:15} {1:40} {2:} \n\n".format("Domain", "Last Updated", "URL")
        self.old.write(line)
        line = "{0:15} {1:} \n\n".format("Domain", "URL")
        self.date.write(line)
        line = "{0:15} {1:70} {2:} \n\n".format("Domain", "Page Containing Broken Link", "URL of Broken Link")
        self.missing.write(line)

    def process_item(self, item, spider):
        # add items to file as they are scraped
        if item['group'] == "Old Page":
            line = "{0:15} {1:40} {2:} \n".format(item['domain'], item["lastUpdated"], item["url"])
            self.old.write(line)
        elif item['group'] == "No Date On Page":
            line = "{0:15} {1:} \n".format(item['domain'], item["url"])
            self.date.write(line)
        elif item['group'] == "Page Not Found":
            line = "{0:15} {1:70} {2:} \n".format(item['domain'], item["referrer"], item["url"])
            self.missing.write(line)
        return item
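One thing worth noting about the pipeline above: it never closes its files, so buffered rows may not be on disk yet when the spider finishes and the email goes out. A minimal sketch of the fix (the class name here is mine; `close_spider` is Scrapy's standard pipeline hook, called once when the spider finishes):

```python
class SaveToFilePipeline(object):
    """Same idea as saveToFile above, with explicit cleanup added."""

    def __init__(self):
        self.old = open('old_pages.txt', 'w')
        self.date = open('pages_without_dates.txt', 'w')
        self.missing = open('missing_pages.txt', 'w')

    def close_spider(self, spider):
        # Scrapy calls this once when the spider finishes; closing the
        # files here flushes any buffered rows to disk before anything
        # else (such as an email extension) tries to read them.
        for f in (self.old, self.date, self.missing):
            f.close()
```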
I want to create a separate pipeline to send the email. What I have so far:
from scrapy.xlib.pydispatch import dispatcher
from scrapy.signals import spider_opened, spider_closed
from scrapy.mail import MailSender

class emailResults(object):

    def __init__(self):
        dispatcher.connect(self.spider_closed, spider_closed)
        dispatcher.connect(self.spider_opened, spider_opened)
        old = open('old_pages.txt', 'wb')
        date = open('pages_without_dates.txt', 'wb')
        missing = open('missing_pages.txt', 'wb')
        oldOutput = open('twenty_oldest_pages.txt', 'wb')
        self.attachments = [
            ("old_pages", "text/plain", old),
            ("date", "text/plain", date),
            ("missing", "text/plain", missing),
            ("oldOutput", "text/plain", oldOutput)
        ]
        self.mailer = MailSender()

    def spider_closed(SPIDER_NAME):
        self.mailer.send(to=["[email protected]"], attachs=self.attachments, subject="test email", body="Some body")
It seems that in previous versions of Scrapy you could pass self into the spider_closed function, but in the current version (0.21) spider_closed is only passed the spider name.

Any help and/or suggestions would be greatly appreciated.
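For reference, here is a sketch of how the pieces could fit together. The `(name, mimetype, file_object)` attachment format matches Scrapy's `MailSender.send(..., attachs=...)`; note the commas between the tuples, which are missing in the attempt above, and that the files are opened for reading so the attachment bodies aren't empty (opening them with `'wb'` would truncate the reports the first pipeline just wrote). The mailer is passed in as a parameter here so the sketch can be exercised without a running crawler; in a real pipeline it would be a `scrapy.mail.MailSender` instance, and the signal handler must accept `self` plus the spider object:

```python
class EmailResultsPipeline(object):
    """Sketch: collect the finished report files and mail them on close."""

    FILES = ['old_pages.txt', 'pages_without_dates.txt', 'missing_pages.txt']

    def __init__(self, mailer):
        # In Scrapy this would be scrapy.mail.MailSender(); injected
        # here so the sketch can be tested with a stub.
        self.mailer = mailer

    def spider_closed(self, spider):
        # Reopen the completed reports for reading. Each attachment is
        # a (name, mimetype, file object) tuple -- note the commas.
        attachments = [(name, 'text/plain', open(name, 'rb'))
                       for name in self.FILES]
        self.mailer.send(to=['[email protected]'],
                         subject='Scrape results for %s' % spider.name,
                         body='Reports attached.',
                         attachs=attachments)
```

Building the attachment list at close time, rather than in `__init__`, also avoids holding write handles open for the whole crawl.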
Thanks for the suggestions, very helpful. – bornytm