2011-01-31

Upload a zip file from Elastic MapReduce to S3

I want to upload a directory from the EMR local file system to S3 as a zip file.

Is there a better way to do this than the approach I'm currently using?

Is it possible to return a ZipOutputStream as the Reducer output?

Thanks

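import java.io.*;
import java.net.URI;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Usage:
zipFolderAndUpload("target", "target.zip", "s3n://bucketpath/");


static public void zipFolderAndUpload(String srcFolder, String zipFile, String dst) throws Exception {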

    //Zips a directory 
    FileOutputStream fileWriter = new FileOutputStream(zipFile); 
    ZipOutputStream zip = new ZipOutputStream(fileWriter); 
    addFolderToZip("", srcFolder, zip); 
    zip.flush(); 
    zip.close(); 

    // Copies the zipped file to the S3 file system
    InputStream in = new BufferedInputStream(new FileInputStream(zipFile));
    Configuration conf = new Configuration();
    // Destination path: the bucket prefix followed by the zip file's name
    FileSystem fs = FileSystem.get(URI.create(dst + zipFile), conf);
    OutputStream out = fs.create(new Path(dst + zipFile));
    IOUtils.copyBytes(in, out, 4096, true); 

} 

static private void addFileToZip(String path, String srcFile, ZipOutputStream zip) throws Exception {

    File file = new File(srcFile);
    if (file.isDirectory()) {
        addFolderToZip(path, srcFile, zip);
    } else {
        byte[] buf = new byte[1024];
        int len;
        FileInputStream in = new FileInputStream(srcFile);
        try {
            zip.putNextEntry(new ZipEntry(path + "/" + file.getName()));
            while ((len = in.read(buf)) != -1) {
                zip.write(buf, 0, len);
            }
            zip.closeEntry();
        } finally {
            in.close(); // release the file handle of every zipped file
        }
    }
}

static private void addFolderToZip(String path, String srcFolder, ZipOutputStream zip) throws Exception {
    File folder = new File(srcFolder);

    for (String fileName : folder.list()) {
        if (path.equals("")) {
            addFileToZip(folder.getName(), srcFolder + "/" + fileName, zip);
        } else {
            addFileToZip(path + "/" + folder.getName(), srcFolder + "/" + fileName, zip);
        }
    }
}
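One possible variation on the code in the question, sketched under the assumption that the local temporary zip file isn't needed: wrap the destination stream itself in a ZipOutputStream, so the archive is written directly into whatever `FileSystem.create(...)` returns. Depending on the FileSystem implementation, the data may still be buffered locally before the upload happens. `DirectoryZipper` and `zipDirectory` are hypothetical names, and the sketch uses only the JDK so it can be tried without a cluster.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: zips a local directory straight into any OutputStream.
final class DirectoryZipper {
    private DirectoryZipper() {}

    // Writes every regular file under srcDir as a ZIP entry on out.
    // Entry names start with srcDir's own name and use '/' separators,
    // e.g. "target/sub/b.txt". Closing the ZipOutputStream also closes out.
    static void zipDirectory(File srcDir, OutputStream out) throws IOException {
        ZipOutputStream zip = new ZipOutputStream(out);
        try {
            addEntries(srcDir, srcDir.getName(), zip);
        } finally {
            zip.close(); // writes the central directory, then closes out
        }
    }

    // Recurses into directories; empty directories produce no entries.
    private static void addEntries(File file, String entryName, ZipOutputStream zip) throws IOException {
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children == null) {
                throw new IOException("Cannot list " + file);
            }
            for (File child : children) {
                addEntries(child, entryName + "/" + child.getName(), zip);
            }
            return;
        }
        zip.putNextEntry(new ZipEntry(entryName));
        InputStream in = new FileInputStream(file);
        try {
            byte[] buf = new byte[4096];
            int len;
            while ((len = in.read(buf)) != -1) {
                zip.write(buf, 0, len);
            }
        } finally {
            in.close();
        }
        zip.closeEntry();
    }
}
```

With Hadoop on the classpath, the call would look something like `DirectoryZipper.zipDirectory(new File("target"), fs.create(new Path(dst + "target.zip")))`, with `fs` obtained via `FileSystem.get(...)` as in the question.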

Answer


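The approach you're taking looks fine. If you find it too slow because it runs in a single thread, you could write your own Hadoop OutputFormat implementation that writes zip files.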
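A custom OutputFormat like the one suggested could be structured roughly as follows. In Hadoop's `org.apache.hadoop.mapreduce` API one would subclass `FileOutputFormat`, implement `getRecordWriter(TaskAttemptContext)`, open the output path from `getDefaultWorkFile(context, ".zip")` on that path's `FileSystem`, and return a `RecordWriter` whose `write` and `close` delegate to logic like the sketch below. To keep it runnable without Hadoop, `ZipRecordWriter` is a hypothetical plain-Java class that only mirrors the `write`/`close` shape; the record types (a `String` entry name and a `byte[]` body) are assumptions as well. Because each reduce task writes its own output file, this design would produce one zip archive per reducer rather than a single archive for the whole job.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical, Hadoop-free stand-in for the RecordWriter of a custom "zip" OutputFormat:
// every (entry name, bytes) record becomes one entry in a single ZIP archive.
final class ZipRecordWriter {
    private final ZipOutputStream zip;

    // In Hadoop, out would be the stream opened on the task's output file.
    ZipRecordWriter(OutputStream out) {
        this.zip = new ZipOutputStream(out);
    }

    // Corresponds to RecordWriter.write(key, value): the key names the entry,
    // the value is its content. ZipOutputStream rejects a repeated name with a ZipException.
    void write(String entryName, byte[] contents) throws IOException {
        zip.putNextEntry(new ZipEntry(entryName));
        zip.write(contents, 0, contents.length);
        zip.closeEntry();
    }

    // Corresponds to RecordWriter.close(context): writes the central directory and closes out.
    void close() throws IOException {
        zip.close();
    }
}
```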

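One thing to be aware of is that Java SE's ZipOutputStream implementation does not support Zip64 (this applies to Java 6; JDK 7 later added Zip64 support to java.util.zip), which means it cannot create ZIP files larger than 4GB. There are other ZIP implementations, such as TrueZIP.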
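On a runtime without Zip64 support, a cheap pre-flight check could flag directories that are likely to exceed the classic format's 4GB (2^32 - 1 byte) limit before any zipping starts. This is a rough heuristic that assumes total uncompressed input size is a reasonable proxy for archive size: deflate usually shrinks data, but incompressible files plus headers can come out slightly larger. `ZipSizeCheck` and its methods are hypothetical names.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical pre-flight check for runtimes whose java.util.zip lacks Zip64.
final class ZipSizeCheck {
    // Largest size/offset a classic (non-Zip64) ZIP header can record: 2^32 - 1 bytes.
    static final long CLASSIC_ZIP_LIMIT = 0xFFFFFFFFL;

    private ZipSizeCheck() {}

    // Sums the lengths of all regular files under f (f itself may be a file).
    static long totalSize(File f) throws IOException {
        if (f.isFile()) {
            return f.length();
        }
        File[] children = f.listFiles();
        if (children == null) {
            throw new IOException("Cannot list " + f);
        }
        long sum = 0;
        for (File child : children) {
            sum += totalSize(child);
        }
        return sum;
    }

    // Heuristic only: compares uncompressed input size, ignoring compression and header overhead.
    static boolean mayExceedClassicZip(File dir) throws IOException {
        return totalSize(dir) > CLASSIC_ZIP_LIMIT;
    }
}
```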


That's great, thanks for the tip. – patrickandroid 2011-02-10 08:39:52