Logging in to a website with urllib2 - Python 2.7

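Okay, so I'm using this for a reddit bot, but I want to be able to figure out how to log in to any website, if that makes sense.

I realise that different websites use different login forms and so on. So how do I work out how to adapt it for each website? I'm assuming I need to look for something in the HTML file, but I don't know what.

I do not want to use Mechanize or any other library (which is what all the other answers on here are about, and they don't really help me understand what is happening), because I want to learn for myself exactly how it all works.

The urllib2 documentation really isn't helping me.

Thanks.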

2012-12-18 tommo

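I'll preface this by saying I haven't logged in this way in a while, so I may be missing some of the more "accepted" ways to do it.

I'm not sure if this is what you're after, but without a library like mechanize or a more robust framework like selenium, in the basic case you just look at the form itself and seek out the inputs. For instance, go to www.reddit.com and view the source of the rendered page, and you will find this form: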

<form method="post" action="https://ssl.reddit.com/post/login" id="login_login-main" 
    class="login-form login-form-side"> 
    <input type="hidden" name="op" value="login-main" /> 
    <input name="user" placeholder="username" type="text" maxlength="20" tabindex="1" /> 
    <input name="passwd" placeholder="password" type="password" tabindex="1" /> 

    <div class="status"></div> 

    <div id="remember-me"> 
     <input type="checkbox" name="rem" id="rem-login-main" tabindex="1" /> 
     <label for="rem-login-main">remember me</label> 
     <a class="recover-password" href="/password">reset password</a> 
    </div> 

    <div class="submit"> 
     <button class="btn" type="submit" tabindex="1">login</button> 
    </div> 

    <div class="clear"></div> 
</form>

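Here we see a few inputs: op, user, passwd and rem. Also notice the action parameter: that is the URL the form will be posted to, and it will therefore be our target. So the last step is to pack the parameters into a payload and send it as a POST request to the action URL. Also below, we create a new opener, add the ability to handle cookies, and add headers as well, giving us a slightly more robust opener to execute the requests: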

import cookielib 
import urllib 
import urllib2 


# Store the cookies and create an opener that will hold them 
cj = cookielib.CookieJar() 
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)) 

# Add our headers 
opener.addheaders = [('User-agent', 'RedditTesting')] 

# Install our opener (note that this changes the global opener to the one 
# we just made, but you can also just call opener.open() if you want) 
urllib2.install_opener(opener) 

# The action/ target from the form 
authentication_url = 'https://ssl.reddit.com/post/login' 

# Input parameters we are going to send 
payload = { 
    'op': 'login-main', 
    'user': '<username>', 
    'passwd': '<password>' 
    } 

# Use urllib to encode the payload 
data = urllib.urlencode(payload) 

# Build our Request object (supplying 'data' makes it a POST) 
req = urllib2.Request(authentication_url, data) 

# Make the request and read the response 
resp = urllib2.urlopen(req) 
contents = resp.read()
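
The manual step of reading the form source and picking out the action and the inputs can also be scripted. What follows is only a rough sketch using the standard-library HTMLParser, not something from the original answer: the class name LoginFormParser and the trimmed-down sample_html are made up for illustration, and the second import is there only so the snippet also runs on Python 3.

```python
try:
    from HTMLParser import HTMLParser      # Python 2.7
except ImportError:
    from html.parser import HTMLParser     # Python 3 fallback


class LoginFormParser(HTMLParser):
    """Hypothetical helper: record the first form's action, method and inputs."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.action = None
        self.method = None
        self.inputs = {}        # input name -> default value ('' if none)
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and not self._done:
            self._in_form = True
            self.action = attrs.get('action')
            self.method = (attrs.get('method') or 'get').lower()
        elif tag == 'input' and self._in_form and 'name' in attrs:
            self.inputs[attrs['name']] = attrs.get('value') or ''

    def handle_endtag(self, tag):
        # Stop collecting once the first form is closed
        if tag == 'form' and self._in_form:
            self._in_form = False
            self._done = True


# A shortened copy of the reddit form shown earlier
sample_html = '''
<form method="post" action="https://ssl.reddit.com/post/login" id="login_login-main">
    <input type="hidden" name="op" value="login-main" />
    <input name="user" placeholder="username" type="text" />
    <input name="passwd" placeholder="password" type="password" />
    <input type="checkbox" name="rem" id="rem-login-main" />
    <button class="btn" type="submit">login</button>
</form>
'''

parser = LoginFormParser()
parser.feed(sample_html)
parser.close()

form_action = parser.action     # the URL to POST to
form_method = parser.method
form_fields = parser.inputs     # the names to put in the payload
```

The names in form_fields are the keys the payload needs, and form_action is the value to use for authentication_url. Fields that already carry a value (like op) can usually be sent back unchanged.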

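Note that this can get much more complicated. You can also do this with GMail, for example, but you need to pull in parameters that change every time (such as the GALX parameter). Again, not sure if this is what you wanted, but I hope it helps.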
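
For the "parameters that change every time" case, one possible approach is to fetch the login page first, copy every hidden input into the payload, and only then add the credentials. The sketch below makes several assumptions: the page markup, the example.com URL, the Email/Passwd field names and the abc123 value are all invented for illustration and are not GMail's real form; in practice login_page_html would come from something like opener.open(login_url).read(), using the same cookie-aware opener.

```python
try:
    from HTMLParser import HTMLParser       # Python 2.7
    from urllib import urlencode
except ImportError:
    from html.parser import HTMLParser      # Python 3 fallback
    from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Hypothetical helper: collect every <input type="hidden"> name/value."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get('type') or '').lower() == 'hidden'
        if tag == 'input' and is_hidden and attrs.get('name'):
            self.hidden[attrs['name']] = attrs.get('value') or ''


def build_payload(login_page_html, credentials):
    """Start from the page's hidden fields, then add the user's credentials."""
    parser = HiddenInputParser()
    parser.feed(login_page_html)
    parser.close()
    payload = dict(parser.hidden)
    payload.update(credentials)
    return payload


# Made-up login page; the token value would differ on every real request
login_page_html = '''
<form method="post" action="https://example.com/login">
  <input type="hidden" name="GALX" value="abc123">
  <input type="text" name="Email">
  <input type="password" name="Passwd">
</form>
'''

payload = build_payload(login_page_html,
                        {'Email': '<username>', 'Passwd': '<password>'})

# Sorted only so the encoded string is the same on every run
data = urlencode(sorted(payload.items()))
```

The resulting data string can be passed to urllib2.Request exactly as in the reddit example. Because the token is usually tied to a cookie set when the page was fetched, the GET and the POST should go through the same opener.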

2012-12-19 15:23:17 RocketDonkey

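This is /amazing/, thanks! Almost exactly what I wanted, and now I know what else I need to read up on. Perfect! – tommo

@tommo No problem my friend - I remember going through the exact same line of questions when I was trying to sort that stuff out :) Good luck! – RocketDonkey

Thanks mate :) I actually have one more question that I can't find an answer to, if you don't mind answering: why did you use [()] brackets in "[('User-agent', 'RedditTesting')]"? In the documentation there are only normal brackets. – tommo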
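
On the bracket question: as far as I know, addheaders on an opener is an ordinary Python list of (name, value) tuples, so the square brackets make the list and the inner parentheses make one header tuple. A minimal sketch, assuming nothing beyond the standard library (the extra Accept header is just an illustration, and the fallback import is only there so it also runs on Python 3):

```python
try:
    import urllib2                          # Python 2.7
except ImportError:
    import urllib.request as urllib2        # Python 3 fallback

opener = urllib2.build_opener()

# A list (square brackets) containing one 2-tuple (parentheses) per header
opener.addheaders = [('User-agent', 'RedditTesting')]

# Further headers are simply more tuples in the same list
opener.addheaders.append(('Accept', 'text/html'))

header_names = [name for name, value in opener.addheaders]
```

With a single header the list has just one tuple in it, which is why the two kinds of brackets appear right next to each other.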
