2015-12-21 48 views

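I am inserting into a Hive table using data selected from two other tables, combined with UNION ALL. My question is about optimizing this Hive insert query.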

First query:

insert overwrite table table1
select uniod.col1, uniod.col2 from (
select col1, col2 from table2
UNION ALL
select col1, col2 from table3
) uniod;

Second query:

insert overwrite table table1
select col1, col2 from table2
UNION ALL
select col1, col2 from table3;

My question: are these two queries equivalent in terms of performance, or is one better than the other?


The columns of table1 are "col1" and "col2".


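These queries are the same. You can make the subqueries run in parallel, which can improve performance: set hive.exec.parallel=true; and set hive.exec.parallel.thread.number=8; (the maximum number of parallel threads allowed) – leftjoin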
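
The settings from the comment above could be applied at session level before running the insert. This is only a sketch: the value 8 is the one given in the comment, and the table and column names are those from the question. Hive uses these properties to run independent jobs of one query concurrently, so the benefit depends on whether the plan actually contains more than one independent stage.

```sql
-- Session-level settings, as suggested in the comment above.
SET hive.exec.parallel=true;
-- Upper bound on how many jobs may run at the same time.
SET hive.exec.parallel.thread.number=8;

-- Subquery form of the insert (first query in the question).
INSERT OVERWRITE TABLE table1
SELECT uniod.col1, uniod.col2 FROM (
  SELECT col1, col2 FROM table2
  UNION ALL
  SELECT col1, col2 FROM table3
) uniod;
```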

Answer


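The best approach is to check the explain plan. Both queries produce the same explain plan, as the two outputs here show, and the insert statements also run in a similar way. This may differ in earlier versions of Hive.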

explain select * from (
select * from departments 
union all 
select * from departments 
) q; 

STAGE DEPENDENCIES: 
    Stage-1 is a root stage 
    Stage-0 depends on stages: Stage-1 

STAGE PLANS: 
    Stage: Stage-1 
    Map Reduce 
     Map Operator Tree: 
      TableScan 
      alias: departments 
      Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE 
      Select Operator 
       expressions: department_id (type: int), department_name (type: string) 
       outputColumnNames: _col0, _col1 
       Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE 
       Union 
       Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
       Select Operator 
        expressions: _col0 (type: int), _col1 (type: string) 
        outputColumnNames: _col0, _col1 
        Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
        File Output Operator 
        compressed: false 
        Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
        table: 
         input format: org.apache.hadoop.mapred.TextInputFormat 
         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
      TableScan 
      alias: departments 
      Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE 
      Select Operator 
       expressions: department_id (type: int), department_name (type: string) 
       outputColumnNames: _col0, _col1 
       Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE 
       Union 
       Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
       Select Operator 
        expressions: _col0 (type: int), _col1 (type: string) 
        outputColumnNames: _col0, _col1 
        Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
        File Output Operator 
        compressed: false 
        Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
        table: 
         input format: org.apache.hadoop.mapred.TextInputFormat 
         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 

    Stage: Stage-0 
    Fetch Operator 
     limit: -1 
     Processor Tree: 
     ListSink 

Time taken: 0.124 seconds, Fetched: 55 row(s) 

explain 
select * from departments 
union all 
select * from departments 
; 

STAGE DEPENDENCIES: 
    Stage-1 is a root stage 
    Stage-0 depends on stages: Stage-1 

STAGE PLANS: 
    Stage: Stage-1 
    Map Reduce 
     Map Operator Tree: 
      TableScan 
      alias: departments 
      Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE 
      Select Operator 
       expressions: department_id (type: int), department_name (type: string) 
       outputColumnNames: _col0, _col1 
       Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE 
       Union 
       Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
       Select Operator 
        expressions: _col0 (type: int), _col1 (type: string) 
        outputColumnNames: _col0, _col1 
        Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
        File Output Operator 
        compressed: false 
        Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
        table: 
         input format: org.apache.hadoop.mapred.TextInputFormat 
         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
      TableScan 
      alias: departments 
      Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE 
      Select Operator 
       expressions: department_id (type: int), department_name (type: string) 
       outputColumnNames: _col0, _col1 
       Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE 
       Union 
       Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
       Select Operator 
        expressions: _col0 (type: int), _col1 (type: string) 
        outputColumnNames: _col0, _col1 
        Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
        File Output Operator 
        compressed: false 
        Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
        table: 
         input format: org.apache.hadoop.mapred.TextInputFormat 
         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 

    Stage: Stage-0 
    Fetch Operator 
     limit: -1 
     Processor Tree: 
     ListSink 

Time taken: 0.064 seconds, Fetched: 55 row(s) 
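
The same comparison can be made on the actual insert statements from the question by prefixing each with EXPLAIN. This is a sketch that assumes table1, table2 and table3 exist with columns col1 and col2 as described in the question. No output is shown because it depends on the Hive version and the table statistics.

```sql
-- Plan for the subquery form (first query in the question).
EXPLAIN
INSERT OVERWRITE TABLE table1
SELECT uniod.col1, uniod.col2 FROM (
  SELECT col1, col2 FROM table2
  UNION ALL
  SELECT col1, col2 FROM table3
) uniod;

-- Plan for the top-level UNION ALL form (second query in the question).
EXPLAIN
INSERT OVERWRITE TABLE table1
SELECT col1, col2 FROM table2
UNION ALL
SELECT col1, col2 FROM table3;
```

If both outputs list the same stages and operator trees, the two statements should perform the same. On Hive releases before 0.13.0 the second form is likely to be rejected, since those releases accept UNION ALL only inside a subquery.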

Great, thanks.