2014-09-25 36 views

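I log in to my account with the Python requests module and want to stay logged in, keeping the session cookies, while I do other things, like this: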

import os
import sys  # needed for sys.exit() in the KeyboardInterrupt handler
import requests
from lxml import html

def GetContent(url):
    response = requests.get(url)
    return response.content

def Parser(content):
    tree = html.fromstring(content)
    # Text of every link inside div.group > div.groupinfo
    return [e.text_content() for e in tree.xpath('//div[@class="group"]/div[@class="groupinfo"]/a')]


def Func():
    try:
        s = requests.Session()
        email = 'user'
        password = '123456'
        post_data = {'email': email, 'password': password}
        post_response = s.post(url='http://site.ir/signin/', data=post_data)
        resultfile = open("result.txt", "w+")
        page = 1
        while page <= 750:
            print
            print 'Checking page number: ', page
            url2 = 'http://site.ir/' + str(page)
            print "URL: " + url2
            content = GetContent(url2)
            results = Parser(content)
            for i in results:
                print i
                resultfile.write(i + '\n')
                resultfile.flush()
            page += 1
        resultfile.close()
    except (KeyboardInterrupt, SystemExit):
        print "\nKeyboardInterruption with Ctrl+c signal"
        sys.exit(1)

if __name__ == "__main__":
    Func()

I want to stay logged in and do stuff. As you can see in the code, I created a session.
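For reference, a requests.Session stores the cookies it receives and sends them only with requests made through that same session object. The sketch below shows this offline by preparing requests instead of sending them; the cookie name sessionid and its value are made-up placeholders, since the cookie that site.ir actually sets is not known.

```python
import requests

# Hypothetical cookie: stands in for whatever the login POST to
# http://site.ir/signin/ would store on the session.
s = requests.Session()
s.cookies.set('sessionid', 'abc123')

url = 'http://site.ir/2'

# Roughly what requests.get(url) inside GetContent() sends: a request
# built without the session, so no stored cookies are attached.
standalone = requests.Request('GET', url).prepare()

# What s.get(url) sends: the session merges its cookie jar into the request.
through_session = s.prepare_request(requests.Request('GET', url))

print(standalone.headers.get('Cookie'))       # None
print(through_session.headers.get('Cookie'))  # sessionid=abc123
```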

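As the page number increases, I want to stay logged in, fetch the content of the next page, and do other things with it, but the script only returns the content of page 1 even though the page number is increasing.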

Answer


You are ignoring your session in GetContent(): the request made there does not carry the cookies tracked by your requests.Session object. Use s.get() instead:

print "URL: " + url2 
content = s.get(url2).content 
results = Parser(content) 
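If you prefer to keep a GetContent-style helper, one option is to pass the session into it so that every page request reuses the login cookies. Below is a minimal Python 3 sketch; get_content and scrape_pages are illustrative names (not part of requests), and the login URL and form fields in the usage comment are copied from the question's code.

```python
def get_content(session, url):
    """Fetch url through the logged-in session and return the raw body."""
    response = session.get(url)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.content      # bytes, not the Response object itself


def scrape_pages(session, base_url, last_page, parse):
    """Yield (page, parse(body)) for pages 1..last_page, reusing one session."""
    for page in range(1, last_page + 1):
        yield page, parse(get_content(session, base_url + str(page)))


# Usage (needs network access, so it is left as a comment):
#   import requests
#   s = requests.Session()
#   s.post('http://site.ir/signin/', data={'email': 'user', 'password': '123456'})
#   for page, results in scrape_pages(s, 'http://site.ir/', 750, Parser):
#       print(page, results)
```

Because the same Session object is threaded through every call, cookies set by the login POST (and any refreshed by later responses) stay attached to each page request.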

Thanks... but now the error 'TypeError: expected string or buffer' appears. – MLSC 2014-09-25 18:27:18


The output of 'print content' is '<Response [200]>'. – MLSC 2014-09-25 18:29:16


Do you know how to fix it? – MLSC 2014-09-25 18:34:12
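The '<Response [200]>' output suggests that content is being assigned the Response object returned by s.get(url2) rather than its .content attribute. Printing a Response shows exactly that string, and a Response object is not the string or bytes that html.fromstring() expects, which is consistent with the TypeError. The offline Python 3 sketch below builds a Response by hand to show the difference; setting the private _content attribute is a demonstration shortcut only, not something the real script should do.

```python
import requests

# Build a Response by hand so this runs without a network connection.
# _content is a private attribute; it is set here only for demonstration.
response = requests.Response()
response.status_code = 200
response._content = b'<div class="group"><div class="groupinfo"><a>Example</a></div></div>'

# What 'print content' showed: the Response object's repr, not the page.
print(repr(response))    # <Response [200]>

# What Parser() needs: the body of the response as bytes.
body = response.content
print(type(body))        # <class 'bytes'>
```

In the script itself, the fix is the line from the answer: content = s.get(url2).content.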