Java web crawler libraries

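I want to build a Java-based web crawler for an experiment. I have heard that Java is a reasonable choice if this is your first time writing a web crawler. However, I have two important questions.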

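How will my program "visit" or "connect to" web pages? Please give a brief explanation. (I understand the basics of the abstraction layers from hardware up to software; here I am interested in the Java abstractions.)
Which libraries should I use? I assume I need a library for connecting to web pages, a library for the HTTP/HTTPS protocol, and a library for HTML parsing.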

2012-07-01 CodeKingPlusPlus

This is how your program can "visit" or "connect to" a web page.

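    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class PageDownloader {
        public static void main(String[] args) {
            try {
                URL url = new URL("http://stackoverflow.com/");
                // try-with-resources closes the stream even if reading fails,
                // and avoids calling close() on a null stream.
                // UTF-8 is assumed here; a real crawler should honor the page's declared charset.
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) { // openStream() throws an IOException
                    String line;
                    // BufferedReader.readLine() replaces the deprecated DataInputStream.readLine()
                    while ((line = reader.readLine()) != null) {
                        System.out.println(line);
                    }
                }
            } catch (MalformedURLException mue) {
                mue.printStackTrace();
            } catch (IOException ioe) {
                ioe.printStackTrace();
            }
        }
    }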

This downloads the HTML source of the page.
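To make the "connecting" step a little more visible, here is a minimal sketch that uses the JDK's built-in `java.net.http.HttpClient` (Java 11 or later) rather than `URL.openStream()`, so the status code and response headers can be inspected along with the body. The class name `PageFetcher`, the user-agent string `ExampleCrawler/0.1`, and the 10-second timeouts are assumptions made up for this example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper class; the name and settings are illustrative only.
class PageFetcher {

    // Builds a GET request that identifies the crawler and times out after 10 seconds.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "ExampleCrawler/0.1") // assumed user-agent string
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    // Sends the request and returns the response, following ordinary redirects.
    static HttpResponse<String> fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        return client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
    }

    public static void main(String[] args) {
        try {
            HttpResponse<String> response = fetch("http://stackoverflow.com/");
            System.out.println("Status: " + response.statusCode());
            System.out.println("Content-Type: "
                    + response.headers().firstValue("Content-Type").orElse("(none)"));
            System.out.println("Body length: " + response.body().length());
        } catch (IOException e) {
            System.err.println("Request failed: " + e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
    }
}
```

Checking the status code before parsing is usually worthwhile in a crawler, since error pages also come back with an HTML body.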

For HTML parsing, see this.

Also take a look at jSpider and jsoup.
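To give a feel for what the parsing step has to do, here is a rough sketch that pulls the `href` values out of a downloaded page and resolves them against the page's URL, which is the core of following links in a crawler. It uses only the JDK and a regular expression as a stand-in, not jsoup or jSpider; a regex will miss unquoted attributes and other irregular markup, so a real HTML parser is the more robust choice. The class name `LinkExtractor` and the sample HTML are invented for the example.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; a regex stand-in for a real HTML parser.
class LinkExtractor {

    // Matches <a ... href="..."> or <a ... href='...'>, case-insensitively.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s+(?:[^>]*?\\s+)?href\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns every link found in the HTML as an absolute URL.
    static List<String> extractLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            try {
                // resolve() turns relative links like "/questions" into absolute ones
                links.add(base.resolve(matcher.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // skip href values that are not valid URIs
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/questions\">Questions</a> "
                + "<A class='x' HREF='http://example.com/a'>Elsewhere</A></p>";
        for (String link : extractLinks(html, "http://stackoverflow.com/")) {
            System.out.println(link);
        }
    }
}
```

With jsoup, the equivalent step is typically a selector query such as `a[href]` on the parsed document.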

2012-07-01 13:51:35

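So does this extract information from a page, or does it simply go to the page? I am trying to write a scraper that takes information entered by the user, goes to maps.google.com, plugs in the addresses, gets the route time and route length, and returns them to the program. Is this possible? – Ungeheuer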

@Adrian Take a look at the Google Maps API: https://developers.google.com/maps/documentation/distance-matrix/start –
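As a rough sketch of what calling that API instead of scraping might look like, the snippet below only builds the request URL; it does not send it. The endpoint path and the `origins`, `destinations`, and `key` parameter names are given here as I understand them from the documentation linked above and should be checked against it. The class name `DistanceMatrixUrl` and the placeholder key are made up for the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; verify the endpoint and parameters against the linked docs.
class DistanceMatrixUrl {

    // Builds a Distance Matrix request URL with URL-encoded user input.
    static String build(String origin, String destination, String apiKey) {
        return "https://maps.googleapis.com/maps/api/distancematrix/json"
                + "?origins=" + URLEncoder.encode(origin, StandardCharsets.UTF_8)
                + "&destinations=" + URLEncoder.encode(destination, StandardCharsets.UTF_8)
                + "&key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "YOUR_API_KEY" is a placeholder, not a working key.
        System.out.println(build("New York", "Boston", "YOUR_API_KEY"));
    }
}
```

The resulting URL can be fetched like any other page, and the JSON response carries the travel time and distance, so no HTML parsing of maps.google.com is needed.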