18
我试图登录到网站http://www.magickartenmarkt.de并在成员区域(https://www.magickartenmarkt.de/?mainPage=showWants)做一些分析。我看到了其他的例子,但我不明白为什么我的方法不起作用。我确定了第一种方法的正确形式,但尚不清楚它是否有效。 在第二种方法中,returing网页显示我无法访问会员区。如何使用python和机械化登录网站
我会很高兴的任何帮助。
import urllib2
import cookielib
import urllib
import requests
import mechanize
from mechanize._opener import urlopen
from mechanize._form import ParseResponse
USERNAME = 'Test'
PASSWORD = 'bla123'
URL = "http://www.magickartenmarkt.de"
# first approach
request = mechanize.Request(URL)
response = mechanize.urlopen(request)
forms = mechanize.ParseResponse(response, backwards_compat=False)
# I don't want to close?!
#response.close()
# Username and Password are stored in this form
form = forms[1]
form["username"] = USERNAME
form["userPassword"] = PASSWORD
#proof entering data has worked
user = form["username"] # a string, NOT a Control instance
print user
pw = form["userPassword"] # a string, NOT a Control instance
print pw
#is this the page where I will redirected after login?
print urlopen(form.click()).read()
#second approach
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : USERNAME, 'userPassword': PASSWORD})
#login
response_web = opener.open(URL, login_data)
#did it work? for me not....
resp = opener.open('https://www.magickartenmarkt.de/?mainPage=showWants')
print resp.read()
感谢您的建议!它像一个魅力。我需要担心这个实现中的cookie吗?还发现 'browser.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(),max_time = 1)' this。这是否意味着浏览器(对象?)每秒都会刷新网页? – Rappel
浏览器会将cookie保存在自己的会话中(一旦脚本终止或者您不再使用该特定实例,它们将丢失)。但是,如果您希望这些cookie可用于将来的会话(例如,在cookie未过期的情况下将来调用脚本),则必须使用cookielib,http://docs.python.org /2/library/cookielib.html – Ford
我相信它是'browser = mechanize.Browser()' –