2017-07-05 60 views

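How can I stop redirects in my rules when using CrawlSpider?

These are my rules. This is my first time using CrawlSpider, so how can I stop the redirects (302) for the requests generated by my rules?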

rules = (
    Rule(LinkExtractor(allow=r'zhaopin/.*'), follow=True),
    Rule(LinkExtractor(allow=r'gongsi/j.*/.html'), follow=True),
    Rule(LinkExtractor(allow=r'jobs/.*.html'), callback='parse_job', follow=True),
)

This is the debug output. As you can see, every request is redirected to the login page:

2017-07-05 09:20:24 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/CTO/> 
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/jiagoushi/> 
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/C%23/> 
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/youxizhizuoren/> 
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/chanpinbujingli/> 
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/wuxianchanpinshejishi/> 
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/wangyechanpinshejishi/> 
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/chanpinshixisheng/> 
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/dbaqita/> 
2017-07-05 09:20:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=60.211.222.66> from <GET https://www.lagou.com/zhaopin/guanggaoshejishi/> 
2017-07-05 09:20:26 [scrapy.crawler] INFO: Received SIG_UNBLOCK, shutting down gracefully. Send again to force 

Answer


Add cookies and a User-Agent to your settings, like this:

DEFAULT_REQUEST_HEADERS = { 
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 
    'Accept-Language': 'en', 
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36', 
    'Cookie': 'user_trace_token=201708...', 
    'Referer': 'https://www.lagou.com' 
} 
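
The headers above may need a few companion settings to take effect. The following is a minimal sketch of a settings.py fragment, not a confirmed fix for this site: the setting names are standard Scrapy settings, but the chosen values (and whether they avoid the login redirect at all) are assumptions to adapt. In particular, when Scrapy's cookies middleware is enabled it manages cookies itself and may drop or replace a hand-written Cookie header, so disabling it is a common way to send the header unchanged.

```python
# settings.py (sketch; values are assumptions, adjust for your project)

# Send the raw 'Cookie' header from DEFAULT_REQUEST_HEADERS unchanged.
# With the cookies middleware enabled, Scrapy handles cookies itself and
# may drop or replace a hand-written Cookie header.
COOKIES_ENABLED = False

# Optional: disable the redirect middleware so 302 responses are not followed.
REDIRECT_ENABLED = False

# Optional: let 302 responses reach spider callbacks instead of being
# filtered out by the HTTP error middleware.
HTTPERROR_ALLOWED_CODES = [302]

# Assumption: a short delay between requests may make the site's
# validation redirect less likely to trigger.
DOWNLOAD_DELAY = 1
```

Disabling redirects only stops Scrapy from following the 302; the site still answers with the login redirect until the request looks like a valid session, which is why the cookie and User-Agent headers matter. For a single request rather than the whole project, the request meta key dont_redirect serves the same purpose.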