2017-03-18 116 views
-1

Logging in to Instagram with Scrapy. I use FormRequest to post the username and password, and I have set COOKIES_ENABLED = True.

My Scrapy code:

import scrapy
from scrapy.http import Request, FormRequest


class InsSpider(scrapy.Spider):
    name = 'InsVideo'
    allowed_domains = ['instagram.com']

    url = 'https://www.instagram.com/'
    url_login = 'https://www.instagram.com/accounts/login/ajax/'

    def start_requests(self):
        return [Request(self.url_login, callback=self.login)]

    def login(self, response):
        login_post = {'username': 'username',
                      'password': 'password'}
        return [FormRequest.from_response(response,
                                          formdata=login_post,
                                          # callback=self.start_requests,
                                          dont_filter=True
                                          )]

I ran scrapy crawl InsVideo and got the following output:

2017-03-18 12:15:49 [scrapy.core.engine] INFO: Spider opened 
2017-03-18 12:15:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 
2017-03-18 12:15:49 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023 
2017-03-18 12:15:52 [scrapy.downloadermiddlewares.cookies] DEBUG: Received cookies from: <200 https://www.instagram.com/robots.txt> 
Set-Cookie: mid=WMy0dwALAAGACJPXOYvoxHfHO00m; expires=Fri, 13-Mar-2037 04:15:51 GMT; Max-Age=630720000; Path=/ 

Set-Cookie: csrftoken=5JrsDnF569QLnmIzg4h0VOBRJ8gHQZZi; expires=Sat, 17-Mar-2018 04:15:51 GMT; Max-Age=31449600; Path=/; Secure 

2017-03-18 12:15:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.instagram.com/robots.txt> (referer: None) 
2017-03-18 12:15:52 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <GET https://www.instagram.com/accounts/login/ajax/> 
Cookie: mid=WMy0dwALAAGACJPXOYvoxHfHO00m; csrftoken=5JrsDnF569QLnmIzg4h0VOBRJ8gHQZZi 

2017-03-18 12:15:52 [scrapy.downloadermiddlewares.cookies] DEBUG: Received cookies from: <405 https://www.instagram.com/accounts/login/ajax/> 
Set-Cookie: csrftoken=5JrsDnF569QLnmIzg4h0VOBRJ8gHQZZi; expires=Sat, 17-Mar-2018 04:15:52 GMT; Max-Age=31449600; Path=/; Secure 

2017-03-18 12:15:52 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://www.instagram.com/accounts/login/ajax/> (referer: None) 
2017-03-18 12:15:52 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://www.instagram.com/accounts/login/ajax/>: HTTP status code is not handled or not allowed 
2017-03-18 12:15:52 [scrapy.core.engine] INFO: Closing spider (finished) 

I don't know what is wrong with the code. Thanks.

Answer

0

Your url_login is wrong; it should be https://www.instagram.com/accounts/login/

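In any case, the Instagram login page generates its login form with JavaScript. You can see this with your browser's "View page source" feature: the HTML that the server returns contains no <form> tag, and that is exactly what Scrapy sees. You need something that can run the JavaScript, perhaps a headless browser.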
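To check this for any page, here is a minimal sketch that uses only the Python standard library. It scans raw HTML for a <form> start tag, which is what FormRequest.from_response needs to find. The names FormFinder and has_form_tag and both sample HTML strings are made up for illustration; they are not Instagram's actual markup.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Records whether any <form> start tag appears in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "form":
            self.found = True


def has_form_tag(html):
    """Return True if the raw HTML contains at least one <form> tag."""
    finder = FormFinder()
    finder.feed(html)
    return finder.found


# Hypothetical sample pages, not Instagram's real markup.
static_page = '<html><body><form action="/login"><input name="username"></form></body></html>'
js_page = '<html><body><div id="root"></div><script>renderLoginForm();</script></body></html>'

print(has_form_tag(static_page))
print(has_form_tag(js_page))
```

When the response contains no <form> element, FormRequest.from_response raises a ValueError, so a check like this on the downloaded HTML shows early on that the form is built client-side.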


+1

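Hi, I can now log in to Instagram in two ways: by setting cookies in Scrapy, or by using the requests library with headers and cookies. FormRequest turned out to be unnecessary. Thanks for your answer.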
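The cookie-based approach mentioned in this comment might start with something like the sketch below. It assumes you have already logged in with a browser and copied the value of the Cookie request header. The helper name cookie_header_to_dict and the cookie values are hypothetical, and this thread does not say which cookies Instagram actually requires for an authenticated session.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(header):
    """Convert a raw 'Cookie:' header value into a plain name -> value dict."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}


# Placeholder values; a real header would be copied from a logged-in browser session.
raw_header = "mid=EXAMPLE_MID; csrftoken=EXAMPLE_TOKEN"
cookies = cookie_header_to_dict(raw_header)
print(cookies)
```

The resulting dict can be passed as the cookies argument of scrapy.Request, for example from start_requests. With COOKIES_ENABLED = True, Scrapy's cookie middleware then sends those cookies on later requests to the same domain.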