2013-12-12

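Correct HTTP GET request

Apart from knowing that it asks a server for a web page, I'm having some trouble understanding the concept of an HTTP GET request. Today I wrote a class that tries to use an HTTP GET request to fetch the HTML content of a web page. Here is the class, followed by an explanation of my confusion: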

import java.io.*; 
import java.net.*; 

public class HTMLFetcher 
{ 
    private static final int PORT = 80; 
    private URL url; 


    public HTMLFetcher(String url) throws Exception // url = http://www.-----.com/birds.html 
    { 
     this.url = new URL(url); 
     fetch(this.url.getHost()); 
    } 

    private String createRequest(URL url) { // Is there a problem with this request? 
     String request = "GET" + "/index.html" + "HTTP/1.1\n"; 
     request += "Host: www.cs.usfca.edu\n"; 
     request += "Connection: close"; 
     request += "\r\n"; 
     return request; 
     } 

    public void fetch(String urlDomain) throws Exception { 

     System.out.println(urlDomain + ":" + PORT); 

     // TODO: create a new socket here for a given urlDomain and a given PORT 
     Socket socket = new Socket(urlDomain, PORT); 

     // TODO: create PrintWriter for the socket's output stream 
     PrintWriter writer = 
       new PrintWriter(new OutputStreamWriter(socket.getOutputStream())); 

     BufferedReader reader = 
       new BufferedReader(new InputStreamReader(socket.getInputStream())); 

     String request = createRequest(urlDomain); // createRequest is complaining  that it is a string and not a URL 
     System.out.println(request); 
     writer.write(request); 
     writer.flush(); 

     StringBuilder string = new StringBuilder(); 
     boolean htmlFound = false; 
     String line; 
     while ((line = reader.readLine()) != null) { 
      if (!htmlFound) { 
       if (line.toLowerCase().startsWith("<html>")) { 
        htmlFound = true; 
       } else { 
        continue; 
       } 
      } 
      System.out.println("This is each line: " + line); 
      string.append(line + "\n"); 
     } 

     reader.close(); 
     writer.close(); 
     socket.close(); 

     //System.out.println(string.toString()); 
     System.out.println("[done]"); 
    } 
    } 

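So basically I'm confused: how can I pass the String urlDomain to the createRequest method when it expects a URL? Does createRequest even need a parameter to build the HTTP request? Have I set up the request correctly?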

Right now it prints the following:

www.cs.usfca.edu:80 
GET/index.htmlHTTP/1.1 
Host: www.cs.usfca.edu 
Connection: close 

This is each line: <html><head> 
This is each line: <title>501 Method Not Implemented</title> 
This is each line: </head><body> 
This is each line: <h1>Method Not Implemented</h1> 
This is each line: <p>GET/index.htmlHTTP/1.1 to /index.html not supported.<br /> 
This is each line: </p> 
This is each line: <hr> 
This is each line: <address>Apache/2.2.15 (CentOS) Server at www.cs.usfca.edu Port 80</address> 
This is each line: </body></html> 
[done] 

Thanks for your help. Please let me know if I can be more specific.


Why aren't you using HttpURLConnection? http://download.java.net/jdk7/archive/b123/docs/api/java/net/HttpURLConnection.html

Answer


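As far as I know, the Host header is used when a site lives on a shared hosting server: several domains map to the same IP address, and the server needs the header to decide which virtual server the request should be routed to. Beyond that, HTTP/1.1 requires a Host header in every request, and a server must answer 400 Bad Request when it is missing, so always include it.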
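To make the virtual-hosting point concrete, here is a hedged, self-contained sketch (not from the original answer). It assumes the JDK's built-in com.sun.net.httpserver.HttpServer is available; the class VirtualHostDemo, the methods startServer and get, and the host names birds.example and other.example are all hypothetical. One local server on a single IP and port returns a different "site" depending only on the Host header, and a raw socket client sends the same request line with two different Host values.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: one server (one IP and port) acting as two virtual hosts.
final class VirtualHostDemo {

    // Starts a server on 127.0.0.1 with an OS-chosen port; the reply depends only on the Host header.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String host = exchange.getRequestHeaders().getFirst("Host");
            // Drop an optional ":port" suffix before choosing the site
            String name = (host == null) ? "" : host.split(":")[0];
            String body = name.equals("birds.example") ? "birds site" : "default site";
            byte[] bytes = body.getBytes(StandardCharsets.US_ASCII);
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }

    // Sends a raw GET to 127.0.0.1:port with the given Host header and returns the response body.
    static String get(int port, String hostHeader) throws IOException {
        try (Socket socket = new Socket("127.0.0.1", port)) {
            Writer writer = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.US_ASCII);
            writer.write("GET / HTTP/1.1\r\n"
                    + "Host: " + hostHeader + "\r\n"
                    + "Connection: close\r\n"
                    + "\r\n");
            writer.flush();

            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            int contentLength = 0;
            String line;
            // The status line and headers end at the first empty line
            while ((line = reader.readLine()) != null && !line.isEmpty()) {
                if (line.toLowerCase().startsWith("content-length:")) {
                    contentLength = Integer.parseInt(line.substring("content-length:".length()).trim());
                }
            }
            // Read exactly Content-Length characters (the body here is plain ASCII)
            char[] body = new char[contentLength];
            int read = 0;
            while (read < contentLength) {
                int n = reader.read(body, read, contentLength - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            return new String(body, 0, read);
        }
    }
}
```

With the server started, get(port, "birds.example") should return "birds site" while get(port, "other.example") returns "default site", even though both requests go to the same address.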

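By the way, your current code puts no spaces in the request line: it sends GET/index.htmlHTTP/1.1 instead of GET /index.html HTTP/1.1. The server cannot parse the method from that, which is why you get the 501 Method Not Implemented page as the response. Each line should also end with \r\n, and the headers must be followed by an empty line: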

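private String createRequest(String host) {
    // Request line: method, path and version separated by single spaces, ended by CRLF
    String request = "GET / HTTP/1.1\r\n";
    // HTTP/1.1 requires a Host header naming the site on this server
    request += "Host: " + host + "\r\n";
    // Ask the server to close the connection, so readLine() eventually returns null
    request += "Connection: close\r\n";
    // An empty line (CRLF) ends the header section
    request += "\r\n";
    return request;
}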
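The question also asks how createRequest should get its input when fetch only has the host name. One option, sketched here under stated assumptions, is to keep the java.net.URL and derive both the path and the Host value from it. The class GetRequestBuilder and method buildGetRequest are hypothetical names; URL.getFile() returns the path plus any query string, and the port is appended to Host only when it differs from the scheme's default.

```java
import java.net.URL;

// Hypothetical helper: builds a raw HTTP/1.1 GET request for a java.net.URL.
final class GetRequestBuilder {

    static String buildGetRequest(URL url) {
        // Path plus optional query; an empty path means the root "/"
        String path = url.getFile().isEmpty() ? "/" : url.getFile();
        // Include the port in Host only when it differs from the scheme's default
        int port = url.getPort();
        String host = (port == -1 || port == url.getDefaultPort())
                ? url.getHost()
                : url.getHost() + ":" + port;
        return "GET " + path + " HTTP/1.1\r\n"  // method SP path SP version CRLF
                + "Host: " + host + "\r\n"        // mandatory in HTTP/1.1
                + "Connection: close\r\n"         // server closes after the response
                + "\r\n";                         // empty line ends the headers
    }
}
```

In the question's class, fetch could then take the URL itself, open new Socket(url.getHost(), PORT), and write buildGetRequest(url), so no String-to-URL conversion is needed.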

Also, don't check for the tag like this:

if (line.toLowerCase().startsWith("<html>")) 

Instead, use this, since the opening tag may carry attributes (for example <html lang="en">):

if (line.toLowerCase().startsWith("<html")) 
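A different approach (swapped in here, not what the answer proposes) avoids guessing at the markup altogether: in an HTTP response the status line and headers end at the first empty line, and everything after it is the body. The sketch below assumes the body is sent with a Content-Length or by closing the connection, not with chunked transfer encoding, whose chunk-size lines would otherwise appear in the output; ResponseBody and readBody are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical helper: returns the response body by skipping the status line and headers.
final class ResponseBody {

    static String readBody(BufferedReader reader) throws IOException {
        String line;
        // readLine() strips "\r\n", so the CRLF that ends the headers shows up as ""
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            // status line or header: ignored here
        }
        StringBuilder body = new StringBuilder();
        while ((line = reader.readLine()) != null) {
            body.append(line).append('\n');
        }
        return body.toString();
    }
}
```

This keeps a leading <!DOCTYPE html> line and works regardless of how the <html> tag is written.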

By the way, why are you doing this the hard way? Use HttpURLConnection instead.
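For comparison, here is a minimal sketch of the same fetch with HttpURLConnection, which writes the request line and Host header itself and parses the status line and headers for you. UrlConnectionFetcher and fetch are hypothetical names, and decoding the body as UTF-8 is an assumption; a real client should honor the charset in the Content-Type header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch: fetch a page with HttpURLConnection instead of a raw Socket.
final class UrlConnectionFetcher {

    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET"); // GET is also the default
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + address);
            }
            // getInputStream() yields only the body; status line and headers are already consumed
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                StringBuilder body = new StringBuilder();
                String line;
                while ((line = reader.readLine()) != null) {
                    body.append(line).append('\n');
                }
                return body.toString();
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling fetch with the full page address, as passed to the HTMLFetcher constructor, returns just the HTML, so no "<html" check is needed.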


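Thanks for noticing that. I think I've fixed it. For some reason, it now prints something different:
This is each line:
This is each line: 400 Bad Request
This is each line:
This is each line: Bad Request
This is each line: Your browser sent a request that this server could not understand.
This is each line:
This is each line:
This is each line: Apache/2.2.15 (CentOS) Server at www.cs.usfca.edu Port 80
This is each line: [done] – Brandon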


This might help: http://code.joejag.com/2012/how-to-send-a-raw-http-request-via-java/


I edited the answer. This works for me.