2013-04-13 191 views

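Pig ORDER command fails

I want to analyze an Apache log; the goal is to find all user agents and their share of usage. The program below works correctly when the result contains each user agent, its count, and its percentage. When I try to sort the result by most used, the program fails on the last line. Can anyone help?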

logs = LOAD '$LOGS' USING ApacheCombinedLogLoader AS (remoteHost, hyphen, user, time, method, uri, protocol, statusCode, responseSize, referer, userAgent); 

uarows = FOREACH logs GENERATE userAgent; 
total = FOREACH (GROUP uarows ALL) GENERATE COUNT(uarows) as count; 
dump total; 

gpuarows = GROUP uarows BY userAgent; 
result = FOREACH gpuarows { 
     subtotal = COUNT(uarows); 
     GENERATE flatten(group) as ua, subtotal AS SUB_TOTAL, 100*(double)subtotal/(double)total.count AS percentage; 
     }; 
orderresult = ORDER result BY SUB_TOTAL DESC; 
dump orderresult; 

What is odd is that "dump result" works fine, so it is the ORDER line that causes the trouble.

Error:

013-04-13 11:33:09,976 [Thread-48] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720 
2013-04-13 11:33:09,976 [Thread-48] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680 
2013-04-13 11:33:09,995 [Thread-48] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0005 
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_1573648613_1365823989735 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157) 
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) 
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) 
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) 
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_1573648613_1365823989735 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37) 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252) 
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:177) 
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:124) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131) 
    ... 6 more 
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0005 
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases orderresult 
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: orderresult[16,14] C: R: 
2013-04-13 11:33:15,286 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure. 
2013-04-13 11:33:15,286 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0005 has failed! Stop running all dependent jobs 
2013-04-13 11:33:15,287 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete 
2013-04-13 11:33:15,287 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed! 
2013-04-13 11:33:15,288 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion PigVersion UserId StartedAt FinishedAt Features 
1.0.4 0.11.0 dliu 2013-04-13 11:32:27 2013-04-13 11:33:15 GROUP_BY,ORDER_BY 

Some jobs have failed! Stop running all dependent jobs 

Job Stats (time in seconds): 
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs 
job_local_0002 1 1 n/a n/a n/a n/a n/a n/a 1-18,logs,total,uarows MULTI_QUERY,COMBINER  
job_local_0003 1 1 n/a n/a n/a n/a n/a n/a gpuarows,result GROUP_BY,COMBINER 
job_local_0004 1 1 n/a n/a n/a n/a n/a n/a orderresult SAMPLER 

Failed Jobs: 
JobId Alias Feature Message Outputs 
job_local_0005 orderresult ORDER_BY Message: Job failed! Error - NA file:/tmp/temp265162785/tmp896004388, 

Input(s): 
Successfully read 0 records from: "file:///home/dliu/ApacheLogAnalysisWithPig/access.log" 

Output(s): 
Failed to produce result in "file:/tmp/temp265162785/tmp896004388" 

Counters: 
Total records written : 0 
Total bytes written : 0 
Spillable Memory Manager spill count : 0 
Total bags proactively spilled: 0 
Total records proactively spilled: 0 

Job DAG: 
job_local_0002 -> job_local_0003, 
job_local_0003 -> job_local_0004, 
job_local_0004 -> job_local_0005, 
job_local_0005 


2013-04-13 11:33:15,291 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs 
2013-04-13 11:33:15,297 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias orderresult 
Details at logfile: /home/dliu/ApacheLogAnalysisWithPig/pig_1365823931459.log 

How did you start Pig? – Frederic


For anyone who found this post while searching for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution), here is a [generic solution](http://stackoverflow.com/a/34495086/983722).

Answers

1

Please check that /tmp/temp265162785/tmp896004388 does not already exist. You may be using the same file or directory for different tasks.
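One way to run that check from a shell is sketched below. The path is copied from the "Failed to produce result" line in the log; the helper name `path_status` is made up for this illustration.

```shell
# Hypothetical helper: report whether a path is already present on the local filesystem.
path_status() {
    if [ -e "$1" ]; then
        echo "exists"
    else
        echo "missing"
    fi
}

# Temporary output location named in the failed job's log.
path_status "/tmp/temp265162785/tmp896004388"
```

If this prints "exists", a leftover from an earlier run may be in the way, and moving or deleting it before rerunning the script is worth trying.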

2

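Make sure of two things:

1) Run Pig in local mode: pig -x local
2) Set either the PIG_HOME or the PIG_INSTALL environment variable to point to Pig's installation directory.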
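The two steps in this answer could be scripted roughly as follows. This is only a sketch: the install location /opt/pig, the function name `run_local`, and the script name useragents.pig are assumptions, not details from the question. The `-param LOGS=...` option fills in the `$LOGS` placeholder that the script's LOAD statement uses.

```shell
# Assumed install location; change it to wherever Pig is actually unpacked.
export PIG_HOME="${PIG_HOME:-/opt/pig}"
# Make the pig launcher in $PIG_HOME/bin reachable from the shell.
export PATH="$PIG_HOME/bin:$PATH"

# Hypothetical wrapper: run a script in local mode against a given log file.
run_local() {
    # -x local      : use the local filesystem and local job runner
    # -param LOGS=… : substitute the $LOGS parameter referenced in the script
    pig -x local -param LOGS="$1" "$2"
}

# Example call (file names are placeholders):
# run_local access.log useragents.pig
```

Wrapping the call in a function keeps the mode and the parameter together, so the script is not accidentally started without -x local.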


I ran into the same problem on Ubuntu... running pig -x local fixed it. – hba