2012-07-17

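Downloading multiple files from a website using Java

I log in to a secure site through a proxy and want to download all of its files and folders to my local disk. Here is what I have so far.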

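**Edit:** The code below now starts at the given root directory and downloads every file in every subdirectory, which is great :) However, it does not reproduce the folder structure, and I need it to. Can anyone help?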

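The program takes 4 arguments:

1) the URL of the directory I want to download
2) the username for the secure login
3) the password
4) the directory on my local disk where I want to save the files

All of them can be passed on the Linux command line.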

    import java.io.BufferedInputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.Authenticator;
    import java.net.HttpURLConnection;
    import java.net.InetSocketAddress;
    import java.net.PasswordAuthentication;
    import java.net.Proxy;
    import java.net.URL;
    import java.net.URLConnection;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    public class ApacheUrl4 {

        // Entry point: validate the arguments first, then start downloading from the root URL
        public static void main(String[] args) throws Exception {
            checkArguments(args);

            String url = args[0];
            String username = args[1];
            String password1 = args[2];
            String directory = args[3];

            ApacheUrl4 max = new ApacheUrl4();
            max.process(url, username, password1, directory);
        }

        public void process(String url, final String username, String password1, String directory) throws Exception {
            final char[] password = password1.toCharArray();
            // Answer authentication challenges with the user's credentials
            Authenticator.setDefault(new Authenticator() {
                @Override
                protected PasswordAuthentication getPasswordAuthentication() {
                    return new PasswordAuthentication(username, password);
                }
            });

            // proxy
            String proxyip = "000.000.000";
            int proxyport = 8080;
            Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyip, proxyport));

            // URL connection to the directory listing
            URL file = new URL(url);
            HttpURLConnection connection = (HttpURLConnection) file.openConnection(proxy);
            int responseCode = connection.getResponseCode();
            System.out.println("response code " + responseCode);

            if (responseCode == HttpURLConnection.HTTP_FORBIDDEN) {
                System.out.println("Invalid username or password");
                return;
            }
            if (responseCode != HttpURLConnection.HTTP_OK) {
                System.out.println("Unable to find response");
                return;
            }

            // Parse the listing page as an XML (XHTML) document
            Document doc;
            try (InputStream in = new BufferedInputStream(connection.getInputStream())) {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                DocumentBuilder docBuilder = factory.newDocumentBuilder();
                doc = docBuilder.parse(in);
            }

            // Each <li> of the listing is one entry: a file or a sub-directory
            Element root = doc.getDocumentElement();
            NodeList nodeList = root.getElementsByTagName("li");

            for (int i = 0; i < nodeList.getLength(); i++) {
                Node childNode = nodeList.item(i);
                String name = childNode.getTextContent().trim();

                if (name.contains("/") && !name.contains("..")) {
                    // Sub-directory: recurse into it. Note that 'directory' is passed on unchanged,
                    // so every file ends up in the same local folder.
                    process(url + name, username, password1, directory);
                } else if (name.contains(".") && !name.contains("..")) {
                    // File (has an extension and is not ".."): download it into 'directory'
                    URL file2 = new URL((url + name).replace(" ", "%20"));

                    File f = new File(directory, name);
                    if (f.isDirectory()) {
                        continue;
                    }

                    URLConnection connection2 = file2.openConnection(proxy);
                    try (InputStream in2 = new BufferedInputStream(connection2.getInputStream());
                         OutputStream out = new FileOutputStream(f)) {
                        byte[] buf = new byte[1024];
                        int len;
                        while ((len = in2.read(buf)) > 0) {
                            out.write(buf, 0, len);
                        }
                    }
                }
            }
        }

        // Validate the arguments provided by the user
        private static void checkArguments(String[] args) {
            if (args.length != 4 || args[0].isEmpty()) {
                System.out.println("Please specify four arguments in the following format:\n" +
                    "  URL USERNAME PASSWORD DIRECTORY\n" +
                    "EG: \"java ApacheUrl4 http://www.google.com user_name password C:\\path/dir/\"");
                System.exit(1);
            }
        }
    }

Does the server you are reading from allow directory browsing? That is, if you visit it with a browser, do you see a directory listing? – 2012-07-17 11:26:49

Answer


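To download the files in a directory, you first need the directory listing. The server generates this listing automatically, if directory browsing is allowed. First, use a browser to check whether that is the case for this server.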
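A browser is the simplest check. To do the same check from Java, one rough heuristic (an assumption, not a standard) is to look at the page title: Apache's auto-generated listings are titled `Index of /path`. The class and method names below are hypothetical, and other servers (IIS, for example) title their listings differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListingCheck {
    // Matches the <title> element, case-insensitively, even across line breaks
    private static final Pattern TITLE =
        Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Heuristic only: Apache's auto-generated listings are titled "Index of /some/path"
    static boolean looksLikeApacheListing(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() && m.group(1).trim().startsWith("Index of ");
    }

    public static void main(String[] args) {
        String listing = "<html><head><title>Index of /files</title></head><body></body></html>";
        String page = "<html><head><title>Welcome</title></head><body></body></html>";
        System.out.println(looksLikeApacheListing(listing));
        System.out.println(looksLikeApacheListing(page));
    }
}
```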

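Then you need to parse the listing page and download each URL. The bad news is that there is no standard for these pages. The good news is that most of the internet is hosted on Apache or IIS, so if you can handle those two, you have covered a large part of it.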
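A minimal sketch of the "parse the listing, then download each URL" step, assuming an Apache-style listing in which every entry is an `<a href="...">` link relative to the listing URL and column-sorting links start with `?` (as in Apache's `?C=N;O=D`). It uses a regular expression rather than a real HTML parser, so treat it as a heuristic; `ListingLinks`, `entries` and `isDirectory` are hypothetical names. Links that resolve outside the listed directory (the parent directory, links to other sites) are dropped, and a trailing `/` marks a sub-directory.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListingLinks {
    // Captures the href value of every <a ...> tag (double-quoted attributes only)
    private static final Pattern HREF =
        Pattern.compile("<a\\s[^>]*href=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Returns absolute URIs of the entries below baseUrl; baseUrl must end with "/"
    static List<URI> entries(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        List<URI> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1);
            if (href.startsWith("?") || href.startsWith("#")) {
                continue; // column-sorting or in-page links, not entries
            }
            URI resolved = base.resolve(href);
            // relativize() returns its argument unchanged when it is not below base
            URI relative = base.relativize(resolved);
            if (relative.isAbsolute() || relative.getPath().isEmpty()) {
                continue; // parent directory, other sites, or the listing itself
            }
            result.add(resolved);
        }
        return result;
    }

    // By convention, directory entries in a listing end with "/"
    static boolean isDirectory(URI entry) {
        return entry.getPath().endsWith("/");
    }

    public static void main(String[] args) {
        String html = "<a href=\"?C=N;O=D\">Name</a>"
            + "<a href=\"/\">Parent Directory</a>"
            + "<a href=\"docs/\">docs/</a>"
            + "<a href=\"a%20b.txt\">a b.txt</a>";
        for (URI entry : entries("http://host/files/", html)) {
            System.out.println((isDirectory(entry) ? "dir  " : "file ") + entry);
        }
    }
}
```

Directories can then be listed again in the same way, and files downloaded via `entry.toURL().openConnection(proxy)`.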

You might simply parse the page as XML (XHTML) and use XPath to extract all the URLs.
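A sketch of the XML/XPath route using only the JDK (`javax.xml.parsers` and `javax.xml.xpath`), assuming the listing is well-formed XHTML; for a page that is not well-formed XML (many auto-generated listings are plain HTML with unclosed tags such as `<hr>`), `parse()` throws a `SAXException`. The class and method names are hypothetical. The parser is left namespace-unaware so the unprefixed expression `//a/@href` also matches pages that declare the XHTML namespace, and the external DTD is not fetched.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathLinks {
    // Returns the value of every <a href="..."> in a well-formed (X)HTML page, in document order
    static List<String> hrefs(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace-unaware, so "//a" is not tied to the XHTML namespace
        factory.setNamespaceAware(false);
        // Do not download the XHTML DTD referenced by a DOCTYPE
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("//a/@href", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><ul>"
            + "<li><a href=\"docs/\">docs/</a></li>"
            + "<li><a href=\"a.txt\">a.txt</a></li>"
            + "</ul></body></html>";
        for (String href : hrefs(page)) {
            System.out.println(href);
        }
    }
}
```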


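Thanks Joeri, this put me on the right track. The files now download; I just need a little help creating the folders. Currently the code in the question starts at the given root directory and downloads all files in all subdirectories, which is very cool :) but it does not reproduce the folder structure I need. Any help? – 2012-08-08 15:44:21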


Just write a method like `downloadContent(URL source, File target)`. If the directory listing contains a subfolder, make a recursive call to `downloadContent(source + "/" + folderName, new File(target, folderName))`. – 2012-08-09 08:43:49
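A minimal sketch of that recursion, assuming two hypothetical hooks that are not part of the question's code: `Lister` returns the entry names of one listing (sub-directories ending in `/`), for example from the `<li>` parsing in the question, and `Opener` opens one remote file, for example `url.openConnection(proxy).getInputStream()`. Instead of the string concatenation `source + "/" + folderName`, it builds the child URL with `new URL(source, name)`, which resolves the name against the directory URL. The key point is that each recursive call goes one level deeper on both sides, the remote URL and the local `File`.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Mirror {
    // Hypothetical hook: entry names of one directory listing; sub-directories end with "/"
    interface Lister {
        List<String> list(URL directory) throws IOException;
    }

    // Hypothetical hook: opens one remote file, e.g. url.openConnection(proxy).getInputStream()
    interface Opener {
        InputStream open(URL file) throws IOException;
    }

    // Copies the remote directory 'source' (a URL ending in "/") into the local folder 'target',
    // creating one local folder per remote sub-directory
    static void downloadContent(URL source, File target, Lister lister, Opener opener) throws IOException {
        if (!target.isDirectory() && !target.mkdirs()) {
            throw new IOException("Cannot create " + target);
        }
        for (String name : lister.list(source)) {
            String bare = name.endsWith("/") ? name.substring(0, name.length() - 1) : name;
            if (bare.isEmpty() || bare.equals(".") || bare.equals("..") || bare.contains("/")) {
                continue; // skip self/parent links and nested paths
            }
            if (name.endsWith("/")) {
                // Recurse one level deeper on BOTH sides: remote URL and local folder
                downloadContent(new URL(source, name), new File(target, bare), lister, opener);
            } else {
                try (InputStream in = opener.open(new URL(source, name))) {
                    Files.copy(in, new File(target, name).toPath(), StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the server: listing URL -> entry names
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("http://host/root/", Arrays.asList("a.txt", "sub/"));
        tree.put("http://host/root/sub/", Arrays.asList("b.txt"));

        File out = Files.createTempDirectory("mirror").toFile();
        downloadContent(new URL("http://host/root/"), out,
            url -> tree.get(url.toString()),
            url -> new ByteArrayInputStream(url.toString().getBytes(StandardCharsets.UTF_8)));

        System.out.println(new File(out, "sub/b.txt").isFile());
    }
}
```

With the question's code, the same idea amounts to passing `new File(directory, folderName)` (created with `mkdirs()`) instead of the unchanged `directory` in the recursive call.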