2013-04-28

Sort and then insert into a new table using Hive

I have a large data file that looks like this:

1 6
1 6
2 7
3 2
3 6
1 7
1 9
2 9
1 5
3 9
3 1
2 8

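I want to group the data by the first column, compute the average of the second column for each first-column value, and then sort the groups by that average in descending order. So the output should be: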

2 8
1 6.6
3 4.5

My code currently looks like this, and it does not work:

CREATE EXTERNAL TABLE as (a STRING, b INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://myfolder/hive';

CREATE EXTERNAL TABLE output(a STRING, avgb DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://myfolder/hive';

load data inpath "s3n://myfolder/file.txt" into TABLE as;
insert overwrite output select a, avg(b) from as group by a order by avg(b) DESC limit 1000;

I should note that the following works on its own, but I cannot get it to work together with the ORDER BY and INSERT steps:

select a, avg(b) from as group by a; 

When I try:

select a, avg(b) from as group by a order by avg(b); 

I get: "FAILED: Error in semantic analysis: Line 1:66 Invalid table alias or column reference 'b': (possible column names are: _col0, _col1)".

Answer

Just move it out into a subquery:

select a 
from (select a, avg(b) as avgb from as group by a) as t 
order by avgb;
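
The query above returns only column a. As a minimal sketch, the same subquery approach might be extended to produce the full result the question asks for: both columns, sorted by the average in descending order, and written to the output table. The table name input_data is a hypothetical stand-in for the question's input table, and the sketch assumes output is stored at a different location than the input. Giving the aggregate a name (avgb) inside the subquery lets the outer ORDER BY refer to an actual column instead of avg(b), which appears to be what the error message is complaining about.

```sql
-- Sketch only: input_data is a hypothetical name for the question's input table,
-- and output is assumed to live at a different LOCATION than the input.
INSERT OVERWRITE TABLE output
SELECT t.a, t.avgb
FROM (
  -- Aggregate first and name the result so the outer query can sort on it.
  SELECT a, AVG(b) AS avgb
  FROM input_data
  GROUP BY a
) t
ORDER BY t.avgb DESC
LIMIT 1000;
```

With the sample data from the question, the per-group averages are 8 for group 2, 6.6 for group 1, and 4.5 for group 3, so the rows should come out in that order.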