2012-10-05 76 views
54

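Python Requests and persistent sessions

I'm using the requests module (version 0.10.0 with Python 2.5). I have figured out how to submit data to a login form on a website and retrieve a session key, but I can't see an obvious way to use this session key in subsequent requests. Can someone fill in the ellipsis in the code below, or suggest another approach?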

>>> import requests 
>>> login_data = {'formPosted':'1', 'login_email':'[email protected]', 'password':'pw'} 
>>> r = requests.post('https://localhost/login.py', login_data) 
>>> 
>>> r.text 
u'You are being redirected <a href="profilePage?_ck=1349394964">here</a>' 
>>> r.cookies 
{'session_id_myapp': '127-0-0-1-825ff22a-6ed1-453b-aebc-5d3cf2987065'} 
>>> 
>>> r2 = requests.get('https://localhost/profile_data.json', ...) 

Answers

113

You can easily create a persistent session using:

s = requests.session() 

After that, continue with your requests as you normally would:

s.post('https://localhost/login.py', login_data) 
#logged in! cookies saved for future requests. 
r2 = s.get('https://localhost/profile_data.json', ...) 
#cookies sent automatically! 
#do whatever, s will keep your cookies intact :) 

More about sessions: http://docs.python-requests.org/en/latest/user/advanced/#session-objects
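For readers on a current requests release (2.x, not the 0.10.0 from the question), here is a self-contained sketch of the same idea. It does not talk to any real site: a throwaway http.server handler plays the login page (ToyLoginHandler, SESSION_ID and the paths are hypothetical). The Session keeps the cookie set by the login POST and sends it on the next GET, while a second session that never logged in is refused. Note that requests.Session() is the documented class; the lowercase requests.session() helper is only kept for backwards compatibility.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

SESSION_ID = "127-0-0-1-demo"  # hypothetical session id handed out by the toy server


class ToyLoginHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the site in the question, not a real API."""

    def do_POST(self):
        # /login.py: discard the form body and hand out a session cookie
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Set-Cookie", "session_id_myapp=%s; Path=/" % SESSION_ID)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # /profile_data.json: answer only if the session cookie was sent back
        logged_in = ("session_id_myapp=" + SESSION_ID) in self.headers.get("Cookie", "")
        body = json.dumps({"logged_in": logged_in}).encode()
        self.send_response(200 if logged_in else 403)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def new_session():
    s = requests.Session()
    s.trust_env = False  # ignore proxy environment variables for this local demo
    return s


def demo():
    server = HTTPServer(("127.0.0.1", 0), ToyLoginHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    login_data = {"formPosted": "1", "login_email": "usr", "password": "pw"}
    try:
        with new_session() as s:
            s.post(base + "/login.py", data=login_data)  # cookie stored in s.cookies
            r2 = s.get(base + "/profile_data.json")      # cookie sent automatically
        with new_session() as fresh:
            r3 = fresh.get(base + "/profile_data.json")  # never logged in: no cookie
        return r2.status_code, r2.json(), r3.status_code
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())  # (200, {'logged_in': True}, 403)
```

The only part that matters for the question is the body of the first `with` block; the server exists just so the cookie round trip can be observed.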

+0

Thanks Anuj, this is a perfect solution. It's clearer than the example in the python-requests documentation. – ChrisGuest

+2

Is there any way to save the Session itself between script runs? – Gtx

+4

You can pickle.dump the session cookies to a file, e.g. pickle.dump(session.cookies._cookies, file), and load them back with pickle.load like this: cookies = pickle.load(file); cj = requests.cookies.RequestsCookieJar(); cj._cookies = cookies; session.cookies = cj – Cyril
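Regarding the comment above: in current requests 2.x releases the RequestsCookieJar can itself be pickled, so there is no need to reach into the private `_cookies` attribute. A minimal sketch, assuming the cookies are all you need to restore (headers and other Session settings are not saved); save_cookies, load_cookies, the cookie value and the file location are hypothetical.

```python
import os
import pickle
import tempfile

import requests


def save_cookies(session, path):
    """Pickle the session's whole cookie jar (RequestsCookieJar supports pickling)."""
    with open(path, "wb") as f:
        pickle.dump(session.cookies, f)


def load_cookies(session, path):
    """Merge previously pickled cookies into an existing session."""
    with open(path, "rb") as f:
        session.cookies.update(pickle.load(f))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "cookies.pkl")  # hypothetical cache location

    first_run = requests.Session()
    # Stand-in for a cookie that a login response would have set
    first_run.cookies.set("session_id_myapp", "127-0-0-1-demo", domain="localhost", path="/")
    save_cookies(first_run, path)

    second_run = requests.Session()  # e.g. the next execution of the script
    load_cookies(second_run, path)
    print(second_run.cookies.get("session_id_myapp"))  # 127-0-0-1-demo
```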

7

Take a look at my answer to this similar question:

python: urllib2 how to send cookie with urlopen request

import urllib2 
import urllib 
from cookielib import CookieJar 

cj = CookieJar() 
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)) 
# input-type values from the html form 
formdata = { "username" : username, "password": password, "form-id" : "1234" } 
data_encoded = urllib.urlencode(formdata) 
response = opener.open("https://page.com/login.php", data_encoded) 
content = response.read() 
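The snippet above is Python 2 (urllib2, cookielib). A rough Python 3 equivalent uses urllib.request and http.cookiejar. Below it runs against a throwaway local server (ToyLoginHandler, the sid cookie and the paths are hypothetical) so the cookie round trip can be checked without a real site; ProxyHandler({}) is passed only so proxy environment variables cannot divert the localhost calls.

```python
import threading
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToyLoginHandler(BaseHTTPRequestHandler):
    """Hypothetical login page: POST sets a cookie, GET checks for it."""

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Set-Cookie", "sid=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        cookie = self.headers.get("Cookie", "")
        body = b"welcome" if "sid=abc123" in cookie else b"please log in"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def login_and_fetch(base_url, username, password):
    cj = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),        # no proxies for the local demo
        urllib.request.HTTPCookieProcessor(cj))  # stores and resends cookies
    # input-type values from the html form
    formdata = {"username": username, "password": password, "form-id": "1234"}
    data_encoded = urllib.parse.urlencode(formdata).encode("ascii")  # POST body must be bytes
    opener.open(base_url + "/login.php", data_encoded).read()
    # The same opener sends the cookie captured in cj
    return opener.open(base_url + "/profile").read().decode()


def demo():
    server = HTTPServer(("127.0.0.1", 0), ToyLoginHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return login_and_fetch("http://127.0.0.1:%d" % server.server_port, "usr", "pwd")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())  # welcome
```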

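Edit:

I know my answer has received a downvote, but no comment explaining it. I guess it's because I referred to the urllib libraries instead of requests. I did that because the OP asked for help with requests, or for someone to suggest another approach.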

+2

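I'm not one of your downvoters, but as a guess, many readers probably read the OP's last sentence as "can someone fill in the ellipsis in the code below, or suggest another approach [with the requests library, which would involve more major surgery than just filling in the ellipsis with something else]." But that's just my guess. –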

+2

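As the OP, I can say that your answer provides a useful alternative. If nothing else, it demonstrates that requests offers a simple, high-level solution to something that would otherwise take 3 libraries to implement. – ChrisGuest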

4

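The other answers help in understanding how to maintain such a session. Additionally, I want to provide a class that keeps the session persistent across different runs of a script (using a cache file). This means a proper "login" is only performed when required (on timeout, or when no session exists in the cache). It also supports proxy settings for subsequent calls to "get" or "post".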

It was tested with Python 3.

Use it as the basis for your own code. The following snippet is released under GPL v3.

import pickle
import datetime
import os
from urllib.parse import urlparse
import requests


class MyLoginSession:
    """
    A class which handles and saves login sessions. It also keeps track of proxy settings.
    It also maintains a cache file for restoring session data from earlier
    script executions.
    """
    def __init__(self,
                 loginUrl,
                 loginData,
                 loginTestUrl,
                 loginTestString,
                 sessionFileAppendix='_session.dat',
                 maxSessionTimeSeconds=30 * 60,
                 proxies=None,
                 userAgent='Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1',
                 debug=True,
                 forceLogin=False,
                 **kwargs):
        """
        Save some information needed to log in the session.

        You'll have to provide 'loginTestString', which will be looked for in the
        response's HTML to make sure you've properly been logged in.

        'proxies' is of format { 'https' : 'https://user:[email protected]:port', 'http' : ...
        'loginData' will be sent as POST data (dictionary of id : value).
        'maxSessionTimeSeconds' will be used to determine when to re-login.
        """
        urlData = urlparse(loginUrl)

        self.proxies = proxies
        self.loginData = loginData
        self.loginUrl = loginUrl
        self.loginTestUrl = loginTestUrl
        self.maxSessionTime = maxSessionTimeSeconds
        self.sessionFile = urlData.netloc + sessionFileAppendix
        self.userAgent = userAgent
        self.loginTestString = loginTestString
        self.debug = debug

        self.login(forceLogin, **kwargs)

    def modification_date(self, filename):
        """
        Return the last file modification date as a datetime object.
        """
        t = os.path.getmtime(filename)
        return datetime.datetime.fromtimestamp(t)

    def login(self, forceLogin=False, **kwargs):
        """
        Log in to a session. Try to read the last saved session from the cache file.
        If this fails, do a proper login. If the last cache access was too long ago,
        also perform a proper login. Always updates the session cache file.
        """
        wasReadFromCache = False
        if self.debug:
            print('loading or generating session...')
        if os.path.exists(self.sessionFile) and not forceLogin:
            lastModified = self.modification_date(self.sessionFile)

            # only load if the cache file is younger than maxSessionTime;
            # total_seconds() also counts whole days, unlike .seconds
            lastModification = (datetime.datetime.now() - lastModified).total_seconds()
            if lastModification < self.maxSessionTime:
                with open(self.sessionFile, "rb") as f:
                    self.session = pickle.load(f)
                    wasReadFromCache = True
                    if self.debug:
                        print("loaded session from cache (last access %ds ago) "
                              % lastModification)
        if not wasReadFromCache:
            self.session = requests.Session()
            self.session.headers.update({'user-agent': self.userAgent})
            self.session.post(self.loginUrl, data=self.loginData,
                              proxies=self.proxies, **kwargs)

            if self.debug:
                print('created new session with login')
            self.saveSessionToCache()

        # test login (through the same proxies as every other call)
        res = self.session.get(self.loginTestUrl, proxies=self.proxies)
        if res.text.lower().find(self.loginTestString.lower()) < 0:
            raise Exception("could not log into provided site '%s'"
                            " (did not find successful login string)"
                            % self.loginUrl)

    def saveSessionToCache(self):
        """
        Save the session to a cache file.
        """
        # always save (to update the timeout)
        with open(self.sessionFile, "wb") as f:
            pickle.dump(self.session, f)
            if self.debug:
                print('updated session cache-file %s' % self.sessionFile)

    def retrieveContent(self, url, method="get", postData=None, **kwargs):
        """
        Return the content of the URL with respect to the session.

        If 'method' is not 'get', the URL will be called with 'postData'
        as a POST request.
        """
        if method == 'get':
            res = self.session.get(url, proxies=self.proxies, **kwargs)
        else:
            res = self.session.post(url, data=postData, proxies=self.proxies, **kwargs)

        # the session has been updated on the server, so also update it in the cache
        self.saveSessionToCache()

        return res
Using the above class could look like this:

if __name__ == "__main__": 
    # proxies = {'https' : 'https://user:[email protected]:port', 
    #   'http' : 'http://user:[email protected]:port'} 

    loginData = {'user' : 'usr', 
       'password' : 'pwd'} 

    loginUrl = 'https://...' 
    loginTestUrl = 'https://...' 
    successStr = 'Hello Tom' 
    s = MyLoginSession(loginUrl, loginData, loginTestUrl, successStr, 
         #proxies = proxies 
         ) 

    res = s.retrieveContent('https://....') 
    print(res.text) 

    # if, for instance, login via JSON values required try this: 
    s = MyLoginSession(loginUrl, None, loginTestUrl, successStr, 
         #proxies = proxies, 
         json = loginData) 
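The class decides whether to reuse the pickled session by comparing the cache file's age with maxSessionTimeSeconds. A standalone sketch of that check follows (cache_age_seconds and cache_is_fresh are hypothetical helpers, not part of requests). One detail worth knowing: timedelta.seconds is only the seconds component (0 to 86399) and ignores whole days, so total_seconds() is the value to compare against a timeout.

```python
import datetime
import os
import tempfile


def cache_age_seconds(path, now=None):
    """Age of the file at `path` in seconds, based on its modification time."""
    if now is None:
        now = datetime.datetime.now()
    modified = datetime.datetime.fromtimestamp(os.path.getmtime(path))
    # total_seconds() includes whole days; timedelta.seconds alone would drop them
    return (now - modified).total_seconds()


def cache_is_fresh(path, max_age_seconds=30 * 60, now=None):
    """True if the cache file exists and is younger than max_age_seconds."""
    return os.path.exists(path) and cache_age_seconds(path, now) < max_age_seconds


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        print(cache_is_fresh(f.name))  # True: the file was just created

    stale = datetime.timedelta(days=2, seconds=60)
    print(stale.seconds)          # 60: would look fresh although it is two days old
    print(stale.total_seconds())  # 172860.0
```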
+1

This is an excellent answer; it's also strangely hard to find this solution by searching. – duality

0

After trying all of the answers above, I found that using a RequestsCookieJar instead of the regular CookieJar for subsequent requests solved my problem.

import requests

authUrl = 'https://whatever.com/login'

# The subsequent URL
testUrl = 'https://whatever.com/someEndpoint'

# Whatever you are posting
login_data = {'formPosted': '1', 'login_email': '[email protected]', 'password': 'pw'}

# The auth token or any other data that we will receive from the auth request
token = ''

# Post the login request
loginRequest = requests.post(authUrl, data=login_data)
print(loginRequest.text)

# Verify a successful login
print(loginRequest.status_code)

# Save the response content to your variable. In this case I needed a field called token.
token = str(loginRequest.json()['token'])
print(token)

# Create a RequestsCookieJar for your subsequent requests and add the cookie
jar = requests.cookies.RequestsCookieJar()
jar.set('LWSSO_COOKIE_KEY', token)

# Execute your next request(s) with the RequestsCookieJar set
r = requests.get(testUrl, cookies=jar)
print(r.text)
print(r.status_code)
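A small offline sketch of the same RequestsCookieJar idea, assuming requests 2.x: passing domain and path to jar.set() limits which URLs the cookie is attached to, which matters if the same jar is reused for other hosts. Nothing is sent over the network; Request(...).prepare() only builds the request so its Cookie header can be inspected. build_auth_jar, cookie_header_for, the token value and other.example are illustrative names.

```python
import requests


def build_auth_jar(token, domain):
    """Jar holding the auth cookie, scoped to one (hypothetical) domain."""
    jar = requests.cookies.RequestsCookieJar()
    # Without domain/path the cookie is unscoped; setting them limits where it is sent
    jar.set("LWSSO_COOKIE_KEY", token, domain=domain, path="/")
    return jar


def cookie_header_for(url, jar):
    """Prepare (but do not send) a GET and return the Cookie header it would carry."""
    prepared = requests.Request("GET", url, cookies=jar).prepare()
    return prepared.headers.get("Cookie")


if __name__ == "__main__":
    jar = build_auth_jar("abc123", "whatever.com")
    print(cookie_header_for("https://whatever.com/someEndpoint", jar))  # LWSSO_COOKIE_KEY=abc123
    print(cookie_header_for("https://other.example/", jar))              # None
```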
0

A snippet for retrieving password-protected JSON data:

import requests 

username = "my_user_name" 
password = "my_super_secret" 
url = "https://www.my_base_url.com" 
the_page_i_want = "/my_json_data_page" 

session = requests.Session() 
# retrieve cookie value 
resp = session.get(url+'/login') 
csrf_token = resp.cookies['csrftoken'] 
# login, add referer 
resp = session.post(url+"/login", 
        data={ 
         'username': username, 
         'password': password, 
         'csrfmiddlewaretoken': csrf_token, 
         'next': the_page_i_want, 
        }, 
        headers=dict(Referer=url+"/login")) 
print(resp.json())
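To try the CSRF flow above without a real site, here is a sketch against a throwaway local server that only imitates a Django-style login (ToyCsrfSite, the token value and the sessionid cookie are hypothetical; real Django performs more checks than this). It reads the token with session.cookies.get("csrftoken"), which sees every cookie the session has stored, whereas resp.cookies only contains cookies set by that single response. After the POST, requests follows the redirect to `next` and sends the new session cookie along.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

import requests

CSRF = "demo-csrf-token"  # hypothetical token issued by the toy server


class ToyCsrfSite(BaseHTTPRequestHandler):
    """Hypothetical Django-like site: GET /login sets csrftoken, POST /login checks it."""

    def _send(self, code, body=b"", headers=()):
        self.send_response(code)
        for name, value in headers:
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/login":
            self._send(200, b"login form", [("Set-Cookie", "csrftoken=%s; Path=/" % CSRF)])
        elif "sessionid=ok" in self.headers.get("Cookie", ""):
            self._send(200, json.dumps({"data": [1, 2, 3]}).encode(),
                       [("Content-Type", "application/json")])
        else:
            self._send(403)

    def do_POST(self):
        form = parse_qs(self.rfile.read(int(self.headers["Content-Length"])).decode())
        token_ok = form.get("csrfmiddlewaretoken") == [CSRF]
        if token_ok and ("csrftoken=" + CSRF) in self.headers.get("Cookie", ""):
            # Successful login: set the session cookie and redirect to `next`
            self._send(302, headers=[("Set-Cookie", "sessionid=ok; Path=/"),
                                     ("Location", form["next"][0])])
        else:
            self._send(403)

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_protected_json(url):
    with requests.Session() as session:
        session.trust_env = False  # ignore proxy environment variables for the local demo
        session.get(url + "/login")  # stores the csrftoken cookie in session.cookies
        csrf_token = session.cookies.get("csrftoken")
        resp = session.post(url + "/login",
                            data={"username": "usr", "password": "pwd",
                                  "csrfmiddlewaretoken": csrf_token,
                                  "next": "/my_json_data_page"},
                            headers={"Referer": url + "/login"})
        return resp.json()  # requests followed the redirect to `next`


def demo():
    server = HTTPServer(("127.0.0.1", 0), ToyCsrfSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_protected_json("http://127.0.0.1:%d" % server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())  # {'data': [1, 2, 3]}
```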