2017-09-21 43 views

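Twisted reactor not restarting in Scrapy

I am trying to run a Scrapy spider from a Telegram bot built with the python-telegram-bot API wrapper. With the code below I can run the spider successfully and forward the scraped results to the bot, but only once per run of the script. When I try to re-run the spider through the bot (via a Telegram bot command), I get the error twisted.internet.error.ReactorNotRestartable.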

from twisted.internet import reactor 
from scrapy import cmdline 
from telegram.ext import Updater, CommandHandler, MessageHandler, Filters, RegexHandler 
import logging 
import os 
import ConfigParser 
import json 
import textwrap 
from MIS.spiders.moodle_spider import MySpider 
from scrapy.utils.project import get_project_settings 
from scrapy.crawler import CrawlerRunner, CrawlerProcess 
from scrapy.utils.log import configure_logging 


# Read settings from config file 
config = ConfigParser.RawConfigParser() 
config.read('./spiders/creds.ini') 
TOKEN = config.get('BOT', 'TOKEN') 
#APP_NAME = config.get('BOT', 'APP_NAME') 
#PORT = int(os.environ.get('PORT', '5000')) 
updater = Updater(TOKEN) 

# Setting Webhook 
#updater.start_webhook(listen="0.0.0.0", 
#      port=PORT, 
#      url_path=TOKEN) 
#updater.bot.setWebhook(APP_NAME + TOKEN) 

logging.basicConfig(format='%(asctime)s -# %(name)s - %(levelname)s - %(message)s',level=logging.INFO) 

dispatcher = updater.dispatcher 

# Real stuff 

def doesntRun(bot, update): 
    #process = CrawlerProcess(get_project_settings()) 
    #process.crawl(MySpider) 
    #process.start() 
    ############ 

    configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'}) 
    runner = CrawlerRunner({ 
     'FEED_FORMAT' : 'json', 
     'FEED_URI' : 'output.json' 
     }) 

    d = runner.crawl(MySpider) 
    d.addBoth(lambda _: reactor.stop()) 
    reactor.run(installSignalHandlers=0) # the script will block here until the crawling is finished 
    #reactor.stop() 

    with open("./output.json", 'r') as file: 
     contents = file.read() 
     a_r = json.loads(contents) 
     AM = a_r[0]['AM'] 
     ... 
     ... 

     message_template = textwrap.dedent(""" 
       AM: {AM} 
       ... 
       """) 
     messageContent = message_template.format(AM=AM, ...) 
     #print messageContent 
     bot.sendMessage(chat_id=update.message.chat_id, text=messageContent) 
     #reactor.stop() 


# Handlers 
test_handler = CommandHandler('doesntRun', doesntRun) 

# Dispatchers 
dispatcher.add_handler(test_handler) 

updater.start_polling() 
updater.idle() 

The documentation I based the code on: https://doc.scrapy.org/en/latest/topics/practices.html#run-scrapy-from-a-script

The code there looks like this:

from twisted.internet import reactor 
import scrapy 
from scrapy.crawler import CrawlerRunner 
from scrapy.utils.log import configure_logging 

class MySpider(scrapy.Spider): 
    # Your spider definition 
    ... 

configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'}) 
runner = CrawlerRunner() 

d = runner.crawl(MySpider) 
d.addBoth(lambda _: reactor.stop()) 
reactor.run() # the script will block here until the crawling is finished 

Duplicate of https://stackoverflow.com/questions/39946632/reactornotrestartable-error-in-while-loop-with-scrapy – rak007

The linked duplicate question doesn't even have a clear answer. It uses 'CrawlerProcess', not 'CrawlerRunner' as in my code.

See if these help: https://stackoverflow.com/questions/1979112/connecting-twice-with-twisted-how-to-do-that-correctly, https://www.blog.pythonlibrary.org/2016/09/14/restarting-a-twisted-reactor/. The ideal way to do this would be to use scrapyd and schedule the scraper through it.
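
The scrapyd suggestion in the comment above could look roughly like the sketch below. It assumes a scrapyd server is already running on its default address (http://localhost:6800) and that the project has been deployed to it. The project name "MIS" is inferred from the import path in the question, and the spider name "moodle" is purely hypothetical. The sketch is written for Python 3, unlike the Python 2 code in the question.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(project, spider, base_url="http://localhost:6800"):
    """Build the POST request for scrapyd's schedule.json endpoint."""
    # scrapyd expects form-encoded "project" and "spider" fields
    data = urlencode({"project": project, "spider": spider}).encode("ascii")
    return Request(base_url.rstrip("/") + "/schedule.json", data=data)


def schedule_spider(project, spider, base_url="http://localhost:6800"):
    """Ask scrapyd to run the spider and return its decoded JSON reply."""
    request = build_schedule_request(project, spider, base_url)
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Each bot command would then call something like schedule_spider("MIS", "moodle"). Because scrapyd runs every job in its own process, the bot process never starts a reactor itself, so the restart error should not come up.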

Answer

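Well, I finally solved my problem. The python-telegram-bot API wrapper offers an easy way to restart the bot. I simply put these lines at the end of the doesntRun() function:

time.sleep(0.2) 
os.execl(sys.executable, sys.executable, *sys.argv) 

This needs import sys and import time in addition to the os import already in the script. os.execl replaces the running process with a fresh Python interpreter executing the same script, so the Twisted reactor starts from scratch each time.

Now whenever I call the function via the bot, it scrapes the page, stores the results, forwards them, and then restarts itself. This lets me run the spider any number of times.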
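
If restarting the whole bot is undesirable, another option (not the approach used above) is to launch each crawl in a child process, so that every run gets its own reactor. Below is a minimal Python 3 sketch that shells out to the scrapy crawl command. The spider name "moodle" is hypothetical (it must match the spider's name attribute), and project_dir is assumed to be the directory containing scrapy.cfg.

```python
import os
import subprocess


def build_crawl_command(spider_name, output_path):
    """Return the argv list for `scrapy crawl <spider> -o <file>`."""
    return ["scrapy", "crawl", spider_name, "-o", output_path]


def run_spider_once(spider_name, output_path="output.json", project_dir="."):
    """Run one crawl in a child process and return the output file path."""
    full_path = os.path.join(project_dir, output_path)
    # -o appends to an existing file, so remove stale results first
    if os.path.exists(full_path):
        os.remove(full_path)
    # The child process owns its reactor; it is gone when the crawl ends
    subprocess.run(
        build_crawl_command(spider_name, output_path),
        cwd=project_dir,
        check=True,
    )
    return full_path
```

The handler would call run_spider_once("moodle") in place of the reactor.run() block and then read output.json as before. The bot keeps running between commands, at the cost of starting a new process per crawl.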