To expand on @jtravaglini's answer, the preferred way to use DistributedCache on YARN/MapReduce 2 is the following:

In your driver, use Job.addCacheFile():
public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    Job job = Job.getInstance(conf, "MyJob");
    job.setMapperClass(MyMapper.class);
    // ...
    // Mind the # sign after the absolute file location.
    // You will be using the name after the # sign as your
    // file name in your Mapper/Reducer.
    job.addCacheFile(new URI("/user/yourname/cache/some_file.json#some"));
    job.addCacheFile(new URI("/user/yourname/cache/other_file.json#other"));
    return job.waitForCompletion(true) ? 0 : 1;
}
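To illustrate what the # sign does, here is a minimal, self-contained sketch using plain java.net.URI (the same class the driver passes to addCacheFile). The part before # is the HDFS location; the fragment after # becomes the symlink name created in the task's working directory:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class CacheUriDemo {
    public static void main(String[] args) throws URISyntaxException {
        // Same URI shape as in the driver above.
        URI cacheUri = new URI("/user/yourname/cache/some_file.json#some");
        // The path is where the file lives on HDFS.
        System.out.println(cacheUri.getPath());     // /user/yourname/cache/some_file.json
        // The fragment is the local symlink name your tasks will see.
        System.out.println(cacheUri.getFragment()); // some
    }
}
```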
And in your Mapper/Reducer, override the setup(Context context) method:
@Override
protected void setup(
        Mapper<LongWritable, Text, Text, Text>.Context context)
        throws IOException, InterruptedException {
    if (context.getCacheFiles() != null
            && context.getCacheFiles().length > 0) {
        File some_file = new File("./some");
        File other_file = new File("./other");
        // Do things to these two files, like read them
        // or parse as JSON or whatever.
    }
    super.setup(context);
}
Thanks - I assume I need to use the newer 'mapreduce' API rather than 'mapred', otherwise the 'JobContext' object is not available to the mapper. – DNA

Yes, you are right. – user2371156
I think 'getLocalCacheFiles()' is deprecated, but 'getCacheFiles()' is OK - though the returned URIs are not Paths. – DNA