2013-10-15

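Hadoop error when processing files with brackets in their names

I have many different files: *.doc, *.pdf, and so on. I want to process them with MapReduce.

I put them in HDFS and then launch a Java MapReduce program from Hue.

If the file names are well-formed and contain none of the bracket characters "(){}[]", everything works.

But if there is a file named OPN_last_[age.PDF

I get this error: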

Failing Oozie Launcher, Main class [distr.fors.ru.Index], main() threw exception, Illegal file pattern: Unclosed character class near index 17 
    OPN_last_[age.PDF 
    ^
    java.io.IOException: Illegal file pattern: Unclosed character class near index 17 
    OPN_last_[age.PDF 
    ^
    at org.apache.hadoop.fs.GlobFilter.init(GlobFilter.java:70) 
    at org.apache.hadoop.fs.GlobFilter.<init>(GlobFilter.java:49) 
    at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1670) 
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1627) 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:211) 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248) 
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063) 
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080) 
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174) 
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992) 
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:396) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) 
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) 
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) 
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) 
    at distr.fors.ru.Index.run(Index.java:78) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
    at distr.fors.ru.Index.main(Index.java:39) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
    at java.lang.reflect.Method.invoke(Method.java:597) 
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:495) 
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:396) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) 
    at org.apache.hadoop.mapred.Child.main(Child.java:262) 
    Caused by: java.util.regex.PatternSyntaxException: Unclosed character class near index 17 
    OPN_last_[age.PDF 
    ^
    at org.apache.hadoop.fs.GlobPattern.error(GlobPattern.java:167) 
    at org.apache.hadoop.fs.GlobPattern.set(GlobPattern.java:151) 
    at org.apache.hadoop.fs.GlobPattern.<init>(GlobPattern.java:42) 
    at org.apache.hadoop.fs.GlobFilter.init(GlobFilter.java:66) 
    ... 32 more 

And if there is a file named like this: {2011-01-27} (3769330).PDF

I get this error:

Input Pattern hdfs://fd-bigdata.distr.fors.ru:8020/{2011-01-27} (3769330).pdf matches 0 files 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231) 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248) 
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063) 
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080) 
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174) 
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992) 
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:396) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) 
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) 
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) 
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) 
    at distr.fors.ru.Index.run(Index.java:76) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
    at distr.fors.ru.Index.main(Index.java:37) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
    at java.lang.reflect.Method.invoke(Method.java:597) 
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:495) 
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:396) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) 
    at org.apache.hadoop.mapred.Child.main(Child.java:262) 

I really need to process these files. What can I do to solve these problems?

P.S. I am using the latest CDH, 4.4.0.

Answers

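To solve this in Java, escape the special characters with a backslash, which has to be written as a double backslash "\\" inside a Java string literal: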

'[' => '\\[' 
'}' => '\\}' 

This works for me in Java, in Pig, and in Oozie. I hope it solves your problem too.
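
The escaping described in this answer could be applied to each file name before it is handed to the job as an input path. The following is only a minimal sketch: `GlobEscaper` and `escapeGlob` are hypothetical names, not part of Hadoop, and the set of characters treated as special (`\ * ? [ ] { }`) is an assumption about which characters the glob matcher interprets. Parentheses are left alone here because the stack traces in the question only point at `[` and `{}`.

```java
// Hypothetical helper (not a Hadoop class): puts a backslash in front of
// characters that are assumed to be glob metacharacters in an input path.
final class GlobEscaper {

    // Assumption: these are the characters the glob matcher treats specially.
    private static final String SPECIAL = "\\*?[]{}";

    static String escapeGlob(String name) {
        StringBuilder sb = new StringBuilder(name.length() + 8);
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // one literal backslash before the special character
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: OPN_last_\[age.PDF
        System.out.println(escapeGlob("OPN_last_[age.PDF"));
        // prints: \{2011-01-27\} (3769330).pdf
        System.out.println(escapeGlob("{2011-01-27} (3769330).pdf"));
    }
}
```

The escaped string would then be used wherever the job currently builds its input path from the raw file name. Whether the backslash survives every layer between the driver and the glob matcher may depend on the Hadoop version and platform, so it is worth checking against one problem file first.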

Thanks, it helped. – Andrey