2014-03-06 177 views

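Logging in to a page with NTLM authentication using Python mechanize

I want to use mechanize to log in to a page and retrieve some information. However, whatever I try, authentication fails with HTTP error 401, as you can see below: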

r = br.open('http://intra') 
File "bui...e\_mechanize.py", line 203, in open 
File "bui...g\mechanize\_mechanize.py", line 255, in _mech_open
mechanize._response.httperror_seek_wrapper: HTTP Error 401: Unauthorized

Here is my code so far:

import mechanize 
import cookielib 

# Browser 
br = mechanize.Browser() 

# Cookie Jar 
cj = cookielib.LWPCookieJar() 
br.set_cookiejar(cj) 

# Browser options 
br.set_handle_equiv(True) 
# br.set_handle_gzip(True) 
br.set_handle_redirect(True) 
br.set_handle_referer(True) 
br.set_handle_robots(False) 

# Follow refresh 0 but don't hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# If the protected site didn't receive the authentication data you would
# end up with a 401 error in your face
br.add_password('http://intra', 'myusername', 'mypassword')

# User-Agent (this is cheating, ok?) 
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')] 
# Open some site, let's pick a random one, the first that pops in mind: 
# r = br.open('http://google.com') 
r = br.open('http://intra') 
html = r.read() 

# Show the source 
print html 

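What am I doing wrong? When I open, for example, http://intra (an internal page) in Chrome, a window pops up asking for my username/password once, and after that everything works fine.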


Answers


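After a ton of research, I managed to find out the reason behind this:

The dialog that pops up looks like this.

It turns out the site uses so-called NTLM authentication, which mechanize does not support. This command can help you find out a site's authentication mechanism (it is named in the WWW-Authenticate response header):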

wget -O /dev/null -S http://www.the-site.com/ 
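To run the same check from Python instead of wget, the following Python 3 sketch requests a URL without credentials and lists the schemes named in the WWW-Authenticate headers of the 401 response. The function names auth_schemes and probe are hypothetical. It assumes each challenge arrives on its own header line (IIS typically sends Negotiate and NTLM that way); a single header carrying several comma-separated challenges would only report the first.

```python
import urllib.error
import urllib.request


def auth_schemes(header_values):
    """Return the auth scheme names (e.g. 'NTLM', 'Negotiate', 'Basic')
    found in a list of WWW-Authenticate header values, without duplicates.

    Assumption: one challenge per header line; only the first token
    of each value is treated as the scheme name."""
    schemes = []
    for value in header_values:
        # The scheme is the first token; the rest is parameters such as
        # realm="..." or a base64 NTLM/Negotiate token.
        parts = value.split(None, 1)
        if parts and parts[0] not in schemes:
            schemes.append(parts[0])
    return schemes


def probe(url):
    """Request `url` without credentials and return the schemes offered
    in a 401 response (an empty list if the server did not answer 401)."""
    try:
        urllib.request.urlopen(url)
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return auth_schemes(err.headers.get_all("WWW-Authenticate") or [])
        raise
    return []
```

For example, print(probe("http://intra")) against a server like the one in the question would be expected to list NTLM among the offered schemes.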

So I modified the code a little:

import sys
import urllib2
import mechanize
from ntlm import HTTPNtlmAuthHandler  # from the python-ntlm package

print("LOGIN...")
user = sys.argv[1]      # python-ntlm expects the user as DOMAIN\username
password = sys.argv[2]
url = sys.argv[3]

# Register the credentials for this URL (None = any realm)
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
# create the NTLM authentication handler
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)

browser = mechanize.Browser()

# Keep every default handler except the robots.txt processor
handlersToKeep = []
for handler in browser.handlers:
    if not isinstance(handler, mechanize._http.HTTPRobotRulesProcessor):
        handlersToKeep.append(handler)

browser.handlers = handlersToKeep
# Add NTLM support, which mechanize lacks out of the box
browser.add_handler(auth_NTLM)

# Open the authenticated URL, then another page on the same site
response = browser.open(url)
response = browser.open("http://www.the-site.com")
print(response.read())

Finally, mechanize needs to be patched, as described here:

--- _response.py.old 2013-02-06 11:14:33.208385467 +0100 
+++ _response.py 2013-02-06 11:21:41.884081708 +0100 
@@ -350,8 +350,13 @@ 
      self.fileno = self.fp.fileno 
     else: 
      self.fileno = lambda: None 
-  self.__iter__ = self.fp.__iter__ 
-  self.next = self.fp.next 
+ 
+  if hasattr(self.fp, "__iter__"): 
+   self.__iter__ = self.fp.__iter__ 
+   self.next = self.fp.next 
+  else: 
+   self.__iter__ = lambda self: self 
+   self.next = lambda self: self.fp.readline() 

    def __repr__(self): 
     return '<%s at %s whose fp = %r>' % (
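The patch above makes the response wrapper fall back to readline() when the wrapped object (fp) is not iterable. The Python 3 sketch below shows the same fallback using a hypothetical, simplified ResponseWrapper class (it is not mechanize's actual class), plus a hypothetical ReadlineOnly object that, like the fp in question, offers readline() but no __iter__. Because Python 3 looks up special methods such as __iter__ on the class rather than on the instance, the sketch defines __iter__ as a method instead of assigning it per instance the way the Python 2 patch does.

```python
import io


class ResponseWrapper:
    """Hypothetical stand-in for a response wrapper around a file-like fp."""

    def __init__(self, fp):
        self.fp = fp

    def __iter__(self):
        if hasattr(self.fp, "__iter__"):
            # The wrapped object is iterable: delegate to it directly.
            return iter(self.fp)
        # Otherwise fall back to reading line by line.
        return self._readlines()

    def _readlines(self):
        while True:
            line = self.fp.readline()
            if not line:  # b"" (or "") signals end of input
                return
            yield line


class ReadlineOnly:
    """Hypothetical file-like object with readline() but no __iter__."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def readline(self):
        return self._buf.readline()


for line in ResponseWrapper(ReadlineOnly(b"first\nsecond\n")):
    print(line)
```

The generator stops on any empty result rather than comparing against a fixed sentinel, so it works whether fp returns bytes (as HTTP responses do) or str.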

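@theAlse: Do you need to handle the session cookies separately? I used your approach to authenticate against an SSO server, but when I access the main site (ServiceNow) with the second browser.open call, I still get a 401: Unauthorized error.

I added a debug message to mechanize's _response.py file to show the URL being visited, and I was surprised to find that there is a secondary SSO server.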

$ python s3.py 
LOGIN... 
[_DEBUG] Visiting https://sso.intra.client.com 
[_DEBUG] Got past the first open statement. 
[_DEBUG] Visiting https://clienteleitsm.service-now.com 
[_DEBUG] Visiting <Request for https://ssointra.web.ipc.us.client.com/ssofedi/public/saml2sso?SAMLRequest=lVLB--snipped--&RelayState=https%3a%2f%2fclienteleitsm.service-now.com%2fnavpage.do> 
[_DEBUG] Visiting <Request for https://ssointra.web.ipc.us.client.com/ssofedi/redirectjsp/FederationRedirectWDA.jsp?SAMLRequest=lVLBb--snipped--&SMPORTALURL=https%3A%2F%2Fssointra.web.ipc.us.client.com%2Fssofedi%2Fpublic%2Fsaml2sso> 
[_DEBUG] Visiting <Request for https://ssointra.web.ipc.us.client.com/SSOI/ntlm/RedirectToWDA.jsp?TYPE=33554433&REALMOID=--snipped--%3D%26RelayState%3dhttps$%3a$%2f$%2fclienteleitsm%2eservice-now%2ecom$%2fnavpage%2edo%26SMPORTALURL%3dhttps$%3A$%2F$%2Fssointra%2eweb%2eipc%2eus%2eclient%2ecom$%2Fssofedi$%2Fpublic$%2Fsaml2sso> 
[_DEBUG] Visiting <Request for https://ssointra.web.ipc.us.client.com/SSOI/ntlm/WDAProtectedPage.jsp?Target=HTTPS://ssointra.--snipped--&RelayState=https%3A%2F%2Fclienteleitsm.service-now.com%2Fnavpage.do&SMPORTALURL=https%3A%2F%2Fssointra.web.ipc.us.client.com%2Fssofedi%2Fpublic%2Fsaml2sso> 
[_DEBUG] Visiting <Request for https://sso.intra.client.com/siteminderagent/ntlm/creds.ntc?CHALLENGE=&SMAGENTNAME=--snipped--https$%3A$%2F$%2Fssointra%2eweb%2eipc%2eus%2eclient%2ecom$%2Fssofedi$%2Fpublic$%2Fsaml2sso> 

[Client-specific page about invalid username and password credential combination follows] 
<HTML> 
... 
</HTML> 

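I've snipped a lot of the redirect URLs after the third debug line. The random strings really are unique: when I paste them into a browser, I get an error page. However, if I do this in Internet Explorer, I don't even see the redirect pages.

Thanks.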
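The debug messages above came from patching mechanize's _response.py. mechanize's Browser also has set_debug_redirects(True), which may make the patch unnecessary (its output goes through the logging module, so a logging handler has to be configured). As a swapped-in alternative, the following sketch uses Python 3's standard urllib.request rather than mechanize: a hypothetical LoggingRedirectHandler records every redirect target, and an HTTPCookieProcessor keeps session cookies across the hops. It only illustrates redirect logging and cookie persistence; it does not add NTLM support.

```python
import http.cookiejar
import urllib.request


class LoggingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Record and print each redirect target instead of patching the library."""

    def __init__(self):
        super().__init__()
        self.visited = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.visited.append(newurl)
        print("[redirect %d] %s" % (code, newurl))
        # Let the standard handler build the follow-up request.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def build_logging_opener():
    """Hypothetical helper: an opener that shares one cookie jar across
    redirects and logs every hop."""
    jar = http.cookiejar.CookieJar()
    redirects = LoggingRedirectHandler()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar), redirects)
    return opener, jar, redirects
```

After opener.open(...) on a site that bounces through several SSO servers, redirects.visited holds the chain of URLs and jar holds whatever cookies were set along the way, which makes it easier to see where a session cookie goes missing.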