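Scraping a password-protected website in R
I'm trying to scrape data from a password-protected website in R. From reading around, it seems that the httr and RCurl packages are the best options for scraping with password authentication (I've also looked into the XML package).
The website I'm trying to scrape is below (you need a free account to access the full page): http://subscribers.footballguys.com/myfbg/myviewprojections.php?projector=2
Here are my two attempts (with "username" replaced by my username and "password" replaced by my password):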
#This returns "Status: 200" without the data from the page:
library(httr)
GET("http://subscribers.footballguys.com/myfbg/myviewprojections.php?projector=2", authenticate("username", "password"))
#This returns the non-password protected preview (i.e., not the full page):
library(XML)
library(RCurl)
readHTMLTable(getURL("http://subscribers.footballguys.com/myfbg/myviewprojections.php?projector=2", userpwd = "username:password"))
I've looked at other related posts (links below), but I can't figure out how to apply their answers to my case.
How to use R to download a zipped file from a SSL page that requires cookies
How to webscrape secured pages in R (https links) (using readHTMLTable from XML package)?
Reading information from a password protected site
R - RCurl scrape data from a password-protected site
http://www.inside-r.org/questions/how-scrape-data-password-protected-https-website-using-r-hold
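This works for me. I edited the content output. – jdharrison
Cool! I don't think it gets any easier... – Stefan
I tested both answers, and they both work well. I chose this one for its simplicity. – dadrivr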