2016-07-05 47 views
1

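Can't get past the login page using urllib2

I'm trying to access a protected page on Twitter (for example, my own likes list) via urllib2 in Python, but this code always sends me back to the login page. Any idea why?

(I know I could use Twitter's API and so on, but I'd like to learn how to do this in general.)

Thanks, Roy


Code: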

import cookielib
import urllib
import urllib2

from bs4 import BeautifulSoup

url = "https://twitter.com/login"
protectedUrl = "https://twitter.com/username/likes"

USER = "myTwitterUser"
PASS = "myTwitterPassword"

# Opener that stores cookies in cj
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.addheaders = [('User-Agent', 'Mozilla/5.0'), ("Referer", "https://twitter.com")]

# Fetch the login page to read the hidden authenticity_token field
hdr = {'User-Agent': 'Mozilla/5.0', "Referer": "https://twitter.com"}
req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req)

html = page.read()
s = BeautifulSoup(html, "lxml")
AUTH_TOKEN = s.find(attrs={"name": "authenticity_token"})["value"]

login_details = {"session[username_or_email]": USER,
                 "session[password]": PASS,
                 "remember_me": 1,
                 "return_to_ssl": "true",
                 "scribe_log": "",
                 "redirect_after_login": "/",
                 "authenticity_token": AUTH_TOKEN
                 }

# Submit the login form, then request the protected page
login_data = urllib.urlencode(login_details)
opener.open(url, login_data)
resp = opener.open(protectedUrl)
print resp.read()

Answers

0

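You need to post to the correct URL, "https://twitter.com/sessions". It is also essential to use the opener for the initial request that fetches the authenticity_token: replace page = urllib2.urlopen(req) with page = opener.open(req), so that the cookies set by that first response end up in the cookie jar and are sent back with the login POST.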
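
The two fixes described above can be sketched roughly as follows. This is only a sketch, not the answerer's original code: it assumes Python 3 (urllib.request and http.cookiejar in place of urllib2 and cookielib) and uses the standard-library HTML parser in place of BeautifulSoup. The helper names (TokenParser, extract_token, build_opener_with_cookies, encode_login_form, login) are made up for illustration, and the URLs and form field names are simply the ones quoted in this thread.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser
from http.cookiejar import CookieJar

LOGIN_PAGE = "https://twitter.com/login"       # page that carries the token
SESSIONS_URL = "https://twitter.com/sessions"  # URL the form is posted to


class TokenParser(HTMLParser):
    """Remember the value of the first <input name="authenticity_token">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (self.token is None and tag == "input"
                and attrs.get("name") == "authenticity_token"):
            self.token = attrs.get("value")


def extract_token(html):
    """Return the authenticity_token found in html, or None."""
    parser = TokenParser()
    parser.feed(html)
    return parser.token


def build_opener_with_cookies():
    """Return (opener, jar); every request made through opener shares jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", "Mozilla/5.0"),
                         ("Referer", "https://twitter.com")]
    return opener, jar


def encode_login_form(user, password, token):
    """URL-encode the login fields; Python 3 expects POST data as bytes."""
    fields = {
        "session[username_or_email]": user,
        "session[password]": password,
        "remember_me": 1,
        "redirect_after_login": "/",
        "authenticity_token": token,
    }
    return urllib.parse.urlencode(fields).encode("ascii")


def login(user, password):
    """Fetch the token and post the form, both through the same opener."""
    opener, _jar = build_opener_with_cookies()
    # opener.open (not urlopen) so cookies from this response are kept
    html = opener.open(LOGIN_PAGE).read().decode("utf-8", "replace")
    token = extract_token(html)
    opener.open(SESSIONS_URL, encode_login_form(user, password, token))
    return opener
```

The opener returned by login(USER, PASS) would then be used for the protected page as well, e.g. opener.open("https://twitter.com/username/likes"), so that the session cookies are sent along.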


If we run the code with my Twitter account, which has no likes:

In [72]: login_details = {"session[username_or_email]": USER, 
    ....:     "session[password]": PASS, 
    ....:     "remember_me": 1, 
    ....:     "redirect_after_login": "/", 
    ....:     "authenticity_token": AUTH_TOKEN 
    ....:     } 

In [73]: # encode form data 

In [74]: login_data = urllib.urlencode(login_details) 

In [75]: r = opener.open("https://twitter.com/sessions", login_data) 

In [76]: # get likes now we have logged in 

In [77]: resp = opener.open(likes.format(USER)) 

In [78]: soup = BeautifulSoup(resp.read(),"lxml") 

In [79]: print(soup.select_one("p.empty-text")) 
<p class="empty-text"> 
     You haven't liked any Tweets yet. 

    </p> 

As you can see, we successfully reach the page we wanted.
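
A quick way to tell the two outcomes apart without reading the HTML is to look at the URL the response finally came from, since a failed login bounces back to the login page. The sketch below assumes Python 3 and that the login page lives under a /login path, as in the question; bounced_to_login is a made-up helper name.

```python
from urllib.parse import urlsplit


def bounced_to_login(final_url):
    """Return True if final_url (the URL after redirects) is a login page.

    Assumption: login pages live at /login or below it, as in the question.
    """
    path = urlsplit(final_url).path.rstrip("/")
    return path == "/login" or path.startswith("/login/")
```

With urllib the final URL is available as resp.geturl(), and with requests as r.url, so a check such as bounced_to_login(resp.geturl()) can be run right after requesting the protected page.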

Doing the same with a requests.Session() object takes a lot less code:

import requests
from bs4 import BeautifulSoup

USER = "username"
PASS = "pass"
post = "https://twitter.com/sessions"
likes = "https://twitter.com/{}/likes"
url = "https://twitter.com"

data = {"session[username_or_email]": USER,
        "session[password]": PASS,
        "scribe_log": "",
        "redirect_after_login": "/",
        "remember_me": "1"}

with requests.Session() as s:
    # The session keeps cookies across all of the requests below
    r = s.get(url)
    soup = BeautifulSoup(r.content, "lxml")
    AUTH_TOKEN = soup.select_one("input[name=authenticity_token]")["value"]
    data["authenticity_token"] = AUTH_TOKEN
    r = s.post(post, data=data)
    print(s.get(likes.format(USER)).content)

-1

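In my experience, sites like this require the full set of HTTP headers, including:

  • Accept
  • Accept-Encoding
  • Accept-Language
  • Referer
  • Upgrade-Insecure-Requests
  • ...
  • User-Agent

Copy the headers your browser sends and remove only the Cookie header.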
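
As a rough illustration of that advice, the sketch below takes a set of headers as they might be copied from a browser's network panel, drops only Cookie, and attaches the rest to a request. It assumes Python 3 and the standard library; the header values are examples rather than anything Twitter is known to require, and without_cookie and make_request are made-up helper names.

```python
import urllib.request

# Headers as copied from a browser (values here are illustrative only).
BROWSER_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    # Note: urllib does not decompress gzip responses automatically.
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "en-US,en;q=0.5",
    "Referer": "https://twitter.com/",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0",
    "Cookie": "copied=from-browser",
}


def without_cookie(headers):
    """Return a copy of headers with only the Cookie header removed."""
    return {name: value for name, value in headers.items()
            if name.lower() != "cookie"}


def make_request(url, headers=BROWSER_HEADERS):
    """Build a Request that carries every copied header except Cookie."""
    return urllib.request.Request(url, headers=without_cookie(headers))
```

The Cookie header is left out because the cookies should come from the session (or cookie jar) created by the script itself, not from the browser's own session.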

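You also need to create a session and handle cookies, since Twitter presumably works the way Facebook does. Personally I prefer "requests", because it lets you create a session and work with cookies easily.

You can do something like this: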

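import requests
from time import sleep

USER = "username"
PASS = "pass"

# Placeholder headers; replace with the real browser headers listed above
hd = {'h11': 'h12', 'h21': 'h22', 'h31': 'h32'}
# Placeholder form fields; use the field names of the real login form
usrdata = {'user': USER, 'pass': PASS}

sess = requests.Session()
req = sess.get('http://www.twitter.com')  # start the session and collect cookies
sleep(1)
req = sess.post('https://twitter.com/sessions', data=usrdata, headers=hd)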

Hope this helps.