2017-07-15

SQL - partition by one column, for certain field values

My table is very large, but a small snippet looks like this:

+--------+---+----------+-------+------------+---
|distance|qtt|deliver_by|store  |deliver_time| ...
+--------+---+----------+-------+------------+---
|      11|  1|pa        |store_a|        1111|
|     123|  2|pa        |store_a|        1112|
|      33|  3|pb        |store_a|        1113|
|      33|  2|pa        |store_b|        2221|
|      44|  2|pb        |store_b|        2222|
|       5|  2|pc        |store_b|        2223|
|       5|  2|pc        |store_b|        2224|
|       6|  5|pb        |store_c|        3331|
|       7|  5|pb        |store_c|        3332|
+--------+---+----------+-------+------------+---

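There are many stores but only 3 possible deliverers (deliver_by: pa, pb, pc), each of which delivers products at a specific time. Treat deliver_time as a timestamp.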

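I want to select the whole table and add 6 new columns: the min and max deliver_time per store for each deliver_by. A store can be served by any of the 3 deliverers (pa, pb, pc), but it does not have to be served by all of them.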

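I can get almost all of the results right with the query below. The problem is that when a deliver_by value pX does not exist for a store, I do not get a null; instead I get the min/max of the other deliveries in that store.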

I would really like to use partition by, so I wrote this to add the new min/max columns:

select 
    min(deliver_time) over (partition by store, deliver_by='pa') as min_time_sd_pa 
  , max(deliver_time) over (partition by store, deliver_by='pa') as max_time_sd_pa 

  , min(deliver_time) over (partition by store, deliver_by='pb') as min_time_sd_pb 
  , max(deliver_time) over (partition by store, deliver_by='pb') as max_time_sd_pb 

  , min(deliver_time) over (partition by store, deliver_by='pc') as min_time_sd_pc 
  , max(deliver_time) over (partition by store, deliver_by='pc') as max_time_sd_pc 

  , distance, qtt, .... 
from mytable 
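
One way to see where the unwanted values come from: `partition by store, deliver_by='pa'` partitions on the pair (store, true/false), so within each store the rows that are not 'pa' form a partition of their own and receive that partition's min/max rather than a null. The following is only a minimal sketch of that behaviour. It assumes SQLite (through Python's `sqlite3`, with SQLite 3.25 or newer for window functions) as a stand-in for Hive, and it uses a cut-down copy of the sample rows.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Hive so that the
# sketch is runnable; only three of the sample columns are needed here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (deliver_by text, store text, deliver_time integer)"
)
conn.executemany(
    "insert into mytable values (?, ?, ?)",
    [
        ("pa", "store_a", 1111),
        ("pa", "store_a", 1112),
        ("pb", "store_a", 1113),
        ("pb", "store_c", 3331),
        ("pb", "store_c", 3332),
    ],
)

# Same shape as the query in the question, reduced to the 'pa' minimum.
rows = conn.execute(
    """
    select store, deliver_by, deliver_time,
           min(deliver_time) over (partition by store, deliver_by = 'pa')
               as min_time_sd_pa
    from mytable
    order by store, deliver_time
    """
).fetchall()

for row in rows:
    print(row)
```

In this sketch the 'pb' row of store_a gets 1113 and the store_c rows get 3331 in min_time_sd_pa, although neither value belongs to a 'pa' delivery.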

The correct output would be:

min_time_sd_pa|max_time_sd_pa|min_time_sd_pb|max_time_sd_pb|min_time_sd_pc|max_time_sd_pc|distance|qtt|deliver_by|store  |deliver_time
--------------+--------------+--------------+--------------+--------------+--------------+--------+---+----------+-------+------------
          1111|          1112|          1113|          1113|          null|          null|      11|  1|pa        |store_a|        1111
          1111|          1112|          1113|          1113|          null|          null|     123|  2|pa        |store_a|        1112
          1111|          1112|          1113|          1113|          null|          null|      33|  3|pb        |store_a|        1113
          2221|          2221|          2222|          2222|          2223|          2224|      33|  2|pa        |store_b|        2221
          2221|          2221|          2222|          2222|          2223|          2224|      44|  2|pb        |store_b|        2222
          2221|          2221|          2222|          2222|          2223|          2224|       5|  2|pc        |store_b|        2223
          2221|          2221|          2222|          2222|          2223|          2224|       5|  2|pc        |store_b|        2224
          null|          null|          3331|          3332|          null|          null|       6|  5|pb        |store_c|        3331
          null|          null|          3331|          3332|          null|          null|       7|  5|pb        |store_c|        3332
--------------+--------------+--------------+--------------+--------------+--------------+--------+---+----------+-------+------------

What is missing from my select min(..) over .. statement, or what is the simplest way to get this result? I am using Hive QL, but I guess this applies to most SQL DBMSs.

Thanks

Answer

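You can do this with a case expression inside min and max, partitioning by store only. The case expression yields null for rows of other deliverers, and min/max ignore nulls, so a store with no rows for a given deliverer gets null.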

select 
min(case when deliver_by='pa' then deliver_time end) over (partition by store) as min_time_sd_pa 
,max(case when deliver_by='pa' then deliver_time end) over (partition by store) as max_time_sd_pa 
,min(case when deliver_by='pb' then deliver_time end) over (partition by store) as min_time_sd_pb 
,max(case when deliver_by='pb' then deliver_time end) over (partition by store) as max_time_sd_pb 
,min(case when deliver_by='pc' then deliver_time end) over (partition by store) as min_time_sd_pc 
,max(case when deliver_by='pc' then deliver_time end) over (partition by store) as max_time_sd_pc 
,m.* 
from mytable m 
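
To check this query against the sample rows without a Hive cluster, it can be run on an in-memory database. This is only a sketch under assumptions: SQLite (through Python's `sqlite3`, SQLite 3.25 or newer) stands in for Hive, the column types are guessed from the sample, and an `order by` is added just to make the output order predictable.

```python
import sqlite3

# Assumption: SQLite replaces Hive here purely so the query can be executed.
conn = sqlite3.connect(":memory:")
conn.execute(
    """create table mytable (
           distance integer, qtt integer, deliver_by text,
           store text, deliver_time integer)"""
)
# The nine sample rows from the question, as listed there.
sample = [
    (11, 1, "pa", "store_a", 1111),
    (123, 2, "pa", "store_a", 1112),
    (33, 3, "pb", "store_a", 1113),
    (33, 2, "pa", "store_b", 2221),
    (44, 2, "pb", "store_b", 2222),
    (5, 2, "pc", "store_b", 2223),
    (5, 2, "pc", "store_b", 2224),
    (6, 5, "pb", "store_c", 3331),
    (7, 5, "pb", "store_c", 3332),
]
conn.executemany("insert into mytable values (?, ?, ?, ?, ?)", sample)

# The query from the answer; only the final "order by" is an addition.
query = """
select
 min(case when deliver_by='pa' then deliver_time end) over (partition by store) as min_time_sd_pa
,max(case when deliver_by='pa' then deliver_time end) over (partition by store) as max_time_sd_pa
,min(case when deliver_by='pb' then deliver_time end) over (partition by store) as min_time_sd_pb
,max(case when deliver_by='pb' then deliver_time end) over (partition by store) as max_time_sd_pb
,min(case when deliver_by='pc' then deliver_time end) over (partition by store) as min_time_sd_pc
,max(case when deliver_by='pc' then deliver_time end) over (partition by store) as max_time_sd_pc
,m.*
from mytable m
order by m.deliver_time
"""
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)  # SQL nulls are shown as None
```

With the sample rows as listed in the question, store_a has no 'pc' delivery and store_c has only 'pb' deliveries, so those are the stores where the sketch prints None in the columns of the missing deliverers.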

Yes, it will...
