Since nothing has worked so far, I started a new Scrapy project with
python scrapy-ctl.py startproject Nu
I followed the tutorial exactly, created the folders, and wrote a new spider:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.item import Item
from Nu.items import NuItem
from urls import u
class NuSpider(CrawlSpider):
    domain_name = "wcase"
    start_urls = ['http://www.whitecase.com/aabbas/']

    names = hxs.select('//td[@class="altRow"][1]/a/@href').re('/.a\w+')
    u = names.pop()

    rules = (Rule(SgmlLinkExtractor(allow=(u,)), callback='parse_item'),)

    def parse(self, response):
        self.log('Hi, this is an item page! %s' % response.url)
        hxs = HtmlXPathSelector(response)
        item = Item()
        item['school'] = hxs.select('//td[@class="mainColumnTDa"]').re('(?<=(JD,\s))(.*?)(\d+)')
        return item

SPIDER = NuSpider()
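(One thing worth noting about the code above, independent of the Scrapy setup: the class body references `hxs` before it is defined anywhere; `hxs` is only created inside `parse()` at crawl time. Python executes class bodies at import time, so the module would fail to import, and a spider in a module that fails to import cannot be registered. A minimal, Scrapy-free sketch of that failure mode, using a hypothetical `Broken` class:)

```python
# Class bodies are executed as soon as the class statement runs,
# so every name they reference must already be defined at that point.
try:
    class Broken:
        # 'hxs' does not exist yet -- this line runs at definition
        # time, not when some method is later called.
        names = hxs.select('//a/@href')
except NameError as e:
    print('definition-time failure:', e)
```

Running this prints a `NameError` for `hxs`, mirroring what happens when the spider module is imported.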
and when I run
C:\Python26\Scripts\Nu>python scrapy-ctl.py crawl wcase
I get
[Nu] ERROR: Could not find spider for domain: wcase
Other spiders are at least recognized by Scrapy; this one is not. What am I doing wrong?
Thanks for your help!
Could you provide a link to the tutorial (if it's online)? It would be an interesting read :) – RYFN
Yes, here is the link to the CrawlSpider example: http://doc.scrapy.org/topics/spiders.html#crawlspider-example – Zeynel