Defining other methods in a Scrapy class

How is a Scrapy class executed, and how do I add other methods to the spider class?

For example, from the documentation:

import scrapy

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
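The parse callback above names each output file after the second-to-last "/"-separated piece of the URL. A minimal stdlib-only sketch of that rule, pulled out into a helper so it can be checked without running a crawl (filename_for_url is a hypothetical name; inside a spider the same logic could be an ordinary method called as self.filename_for_url(response.url)):

```python
def filename_for_url(url):
    """Reproduce the naming rule used in parse(): the second-to-last
    '/'-separated piece of the URL, plus '.html'."""
    # For a URL ending in '/', the last piece is '' so [-2] is the final path segment.
    return url.split("/")[-2] + ".html"


if __name__ == "__main__":
    print(filename_for_url("http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"))  # Books.html
```

Because the rule relies on the trailing slash, a URL such as http://example.com/a/b would be saved as a.html rather than b.html.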

If I want to define some methods that query a database or do something else, how would I go about it, and why?


2016-05-17 Adders

Could you please elaborate on what you would query the database for? Thanks. – alecxe

The URLs to crawl, for example. – Adders

Let's look at the following use case: getting the URLs to crawl from a database. For this, you need to use the start_requests() method instead of start_urls.

Sample code (using the MySQLdb driver directly):

import MySQLdb
import scrapy

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]

    def start_requests(self):
        # The remaining connection arguments are elided ("...") in the original
        db = MySQLdb.connect(host="host", user="user" ...)
        cursor = db.cursor()

        # Each row holds one URL to crawl
        cursor.execute("SELECT url from url_table")
        requests = [scrapy.Request(url=row[0]) for row in cursor.fetchall()]

        cursor.close()
        db.close()  # release the database connection as well

        return requests

    def parse(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
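The database query can also live in its own helper instead of inline in start_requests(). A minimal sketch, assuming any Python DB-API 2.0 connection; the stdlib sqlite3 module stands in for MySQLdb here only so it runs without a MySQL server, and load_start_urls plus the sample rows are hypothetical. Inside the spider, start_requests() could call it with its MySQLdb connection, yield scrapy.Request(url=url, callback=self.parse) for each returned URL, and then close the connection.

```python
import sqlite3


def load_start_urls(conn):
    """Return every URL stored in url_table (no particular order is guaranteed).

    Works with any DB-API 2.0 connection that exposes cursor()/execute()/fetchall().
    """
    cursor = conn.cursor()
    try:
        cursor.execute("SELECT url FROM url_table")
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()  # always release the cursor, even if the query fails


if __name__ == "__main__":
    # Throwaway in-memory database with hypothetical sample data
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE url_table (url TEXT)")
    conn.executemany(
        "INSERT INTO url_table (url) VALUES (?)",
        [("http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",),
         ("http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",)],
    )
    print(load_start_urls(conn))
    conn.close()
```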


2016-05-17 21:09:52 alecxe

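Currently I have it set up completely outside the class. Are the class methods executed from top to bottom, or are only predefined methods allowed, i.e. the start_requests method? – Adders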

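@Adders This particular use case involves the start_requests() method, which is a special method designed specifically for providing start URLs dynamically. Apart from the special "built-in" methods, though, it is just a regular Python class. – alecxe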

OK, thanks, that kind of answers my question :) – Adders
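To picture what alecxe describes, here is a toy model in plain Python. It is NOT Scrapy's actual engine: ToySpider, toy_crawl, DbSpider and fake_fetch are all hypothetical names. The toy engine calls only the hook methods it knows about (start_requests and the parse callback); load_urls runs only because start_requests calls it. The class body is executed once, top to bottom, when the class is defined, but that only creates the methods; none of them runs until something calls it.

```python
from types import SimpleNamespace


class ToySpider:
    """Hypothetical stand-in for scrapy.Spider (NOT Scrapy's real code).

    Mirrors one documented behaviour: by default start_requests() turns each
    entry of start_urls into a request (here just the URL string).
    """

    start_urls = []

    def start_requests(self):
        for url in self.start_urls:
            yield url


def toy_crawl(spider, fetch):
    """Hypothetical engine loop: it only calls the hooks it knows about."""
    items = []
    for url in spider.start_requests():        # hook 1: where crawling starts
        response = fetch(url)
        items.extend(spider.parse(response))   # hook 2: the default callback
    return items


class DbSpider(ToySpider):
    def __init__(self, rows):
        self.rows = rows  # stands in for rows returned by a database query

    def start_requests(self):
        # Overriding the hook replaces the start_urls behaviour entirely.
        for url in self.load_urls():           # our own method, called explicitly
            yield url

    def load_urls(self):
        # Ordinary helper method: the engine never calls it directly.
        return [row[0] for row in self.rows]

    def parse(self, response):
        yield {"url": response.url, "size": len(response.body)}


def fake_fetch(url):
    # Hypothetical downloader returning an object with .url and .body
    return SimpleNamespace(url=url, body=b"<html></html>")


if __name__ == "__main__":
    spider = DbSpider([("http://example.com/a/",), ("http://example.com/b/",)])
    print(toy_crawl(spider, fake_fetch))
```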
