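How to log in to a website using HttpWebRequest
Can someone help me figure out how to use HttpWebRequest to log in to a page and then scrape another page? The code I am using does nothing more than write out the markup of the login page; it never actually logs in. The site I am trying to log in to is PHP-based.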
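using System;
using System.IO;
using System.Net;

// have a cookie container ready before the first request, so that any session
// cookie set by the login page (e.g. a PHP session id) is sent back with the POST
CookieContainer cookies = new CookieContainer();
// first, request the login form (a PHP site has no viewstate, but may set cookies or hidden fields)
HttpWebRequest webRequest = WebRequest.Create("loginPageUrl") as HttpWebRequest;
webRequest.CookieContainer = cookies;
StreamReader responseReader = new StreamReader(
webRequest.GetResponse().GetResponseStream()
);
string responseData = responseReader.ReadToEnd();
responseReader.Close();
// the field names must match the name attributes of the inputs in the login form
string postData = String.Format("Username={0}&Password={1}", Uri.EscapeDataString("user"), Uri.EscapeDataString("pwd"));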
// now post to the login form
webRequest = WebRequest.Create("loginPostUrl") as HttpWebRequest;
webRequest.Method = "POST";
webRequest.ContentType = "application/x-www-form-urlencoded";
webRequest.CookieContainer = cookies;
// write the form values into the request message
StreamWriter requestWriter = new StreamWriter(webRequest.GetRequestStream());
requestWriter.Write(postData);
requestWriter.Close();
// we don't need the contents of the response, just the cookie it issues
webRequest.GetResponse().Close();
// now we can send our cookie along with a request for the protected page
webRequest = WebRequest.Create("PageToScrapeUrl") as HttpWebRequest;
webRequest.CookieContainer = cookies;
responseReader = new StreamReader(webRequest.GetResponse().GetResponseStream());
// and read the response
responseData = responseReader.ReadToEnd();
responseReader.Close();
Console.WriteLine(responseData);
Console.ReadKey();
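Very few pages behind auth *want* to be scraped, and scraping them is often against the ToS. More commonly, if the data is *meant* to be used this way, there will be a programmatic API. Use the API.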
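For the case where you are allowed to scrape: have you inspected the traffic with Fiddler? You have to analyze a successful browser login against the original page and mimic that web request. Maybe there are other fields being posted to the server? – Jan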
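To illustrate the point about other posted fields: many login forms include hidden inputs (for example an anti-forgery token) that the server expects back with the credentials. The following is only a minimal sketch of one way to copy them from the login page markup. `LoginFormHelper`, the sample HTML, and the field name `token` are hypothetical and not taken from the site in question. A regular expression is used purely to keep the example dependency-free; a real HTML parser would be more robust.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

static class LoginFormHelper
{
    // Collects name/value pairs from <input type="hidden"> tags in the given HTML.
    public static Dictionary<string, string> ExtractHiddenFields(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match tag in Regex.Matches(html, @"<input\b[^>]*>", RegexOptions.IgnoreCase))
        {
            string type = GetAttribute(tag.Value, "type");
            string name = GetAttribute(tag.Value, "name");
            if (name != null && string.Equals(type, "hidden", StringComparison.OrdinalIgnoreCase))
            {
                // Attribute values are HTML-encoded in the markup, so decode them first.
                fields[name] = WebUtility.HtmlDecode(GetAttribute(tag.Value, "value") ?? "");
            }
        }
        return fields;
    }

    // Returns the quoted value of one attribute inside a single tag, or null if absent.
    // Simplification: unquoted attribute values are not handled.
    static string GetAttribute(string tag, string attribute)
    {
        Match m = Regex.Match(tag, @"\b" + attribute + @"\s*=\s*[""']([^""']*)[""']",
                              RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Builds an application/x-www-form-urlencoded body from the collected fields.
    public static string BuildPostData(IDictionary<string, string> fields)
    {
        var parts = new List<string>();
        foreach (var pair in fields)
        {
            parts.Add(Uri.EscapeDataString(pair.Key) + "=" + Uri.EscapeDataString(pair.Value));
        }
        return string.Join("&", parts);
    }
}

static class Program
{
    static void Main()
    {
        // Hypothetical login page markup; in the question this would be responseData.
        string html = "<form><input type=\"hidden\" name=\"token\" value=\"abc123\">" +
                      "<input type=\"text\" name=\"Username\"></form>";

        var fields = LoginFormHelper.ExtractHiddenFields(html);
        fields["Username"] = "user";   // must match the form's real input names
        fields["Password"] = "pwd";
        Console.WriteLine(LoginFormHelper.BuildPostData(fields));
    }
}
```

In the question's code, the resulting string would take the place of `postData`. Comparing it with the request body that Fiddler shows for a successful browser login is a quick way to spot any field that is still missing.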
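Can you give us the URL of the site? There is no silver bullet for logging in to websites (and a site can change how its login works whenever it is modified), so having the URL would make it easier to see where you are going wrong.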