How do I limit the number of rows per field value in SQL?
For example, I have a table like this in Hive:
1 1
1 4
1 8
2 1
2 5
3 1
3 2
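I would like to return only the first two rows for each unique value of the first column. The goal is to limit the amount of data transferred from Hive into MySQL for reporting purposes. I would like a single HiveQL query that gives me this: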
1 1
1 4
2 1
2 5
3 1
3 2
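Unfortunately, MySQL has no analytic (window) functions before version 8.0, so you have to play with user variables. Assuming you have an auto-increment field: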
mysql> create table mytab (
-> id int not null auto_increment primary key,
-> first_column int,
-> second_column int
->) engine = myisam;
Query OK, 0 rows affected (0.05 sec)
mysql> insert into mytab (first_column,second_column)
-> values
-> (1,1),(1,4),(2,10),(3,4),(1,4),(2,5),(1,6);
Query OK, 7 rows affected (0.00 sec)
Records: 7 Duplicates: 0 Warnings: 0
mysql> select * from mytab order by id;
+----+--------------+---------------+
| id | first_column | second_column |
+----+--------------+---------------+
| 1 | 1 | 1 |
| 2 | 1 | 4 |
| 3 | 2 | 10 |
| 4 | 3 | 4 |
| 5 | 1 | 4 |
| 6 | 2 | 5 |
| 7 | 1 | 6 |
+----+--------------+---------------+
7 rows in set (0.00 sec)
mysql> select
-> id,
-> first_column,
-> second_column,
-> row_num
-> from (
-> select *,
-> @num := if(@first_column = first_column, @num:= @num + 1, 1) as row_num,
-> @first_column:=first_column as c
-> from mytab order by first_column,id) as t,(select @first_column:='',@num:=0) as r;
+----+--------------+---------------+---------+
| id | first_column | second_column | row_num |
+----+--------------+---------------+---------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 4 | 2 |
| 5 | 1 | 4 | 3 |
| 7 | 1 | 6 | 4 |
| 3 | 2 | 10 | 1 |
| 6 | 2 | 5 | 2 |
| 4 | 3 | 4 | 1 |
+----+--------------+---------------+---------+
7 rows in set (0.00 sec)
mysql> select
-> id,
-> first_column,
-> second_column,
-> row_num
-> from (
-> select *,
-> @num := if(@first_column = first_column, @num:= @num + 1, 1) as row_num,
-> @first_column:=first_column as c
-> from mytab order by first_column,id) as t,(select @first_column:='',@num:=0) as r
-> having row_num<=2;
+----+--------------+---------------+---------+
| id | first_column | second_column | row_num |
+----+--------------+---------------+---------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 4 | 2 |
| 3 | 2 | 10 | 1 |
| 6 | 2 | 5 | 2 |
| 4 | 3 | 4 | 1 |
+----+--------------+---------------+---------+
5 rows in set (0.02 sec)
1) It does not work without the 'ORDER BY' clause. 2) The 'ORDER BY' column must be the column you are counting on; otherwise it will not work. – Green
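The variable trick above amounts to a single pass over the rows sorted by (first_column, id), with a counter that resets whenever first_column changes. Here is a minimal Python sketch of that logic on the same seven rows. The function name top_n_per_group is hypothetical and only there for illustration; it is not part of MySQL or of the original answer.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, int, int]  # (id, first_column, second_column)

# Same rows as inserted into mytab above.
ROWS: List[Row] = [
    (1, 1, 1), (2, 1, 4), (3, 2, 10), (4, 3, 4),
    (5, 1, 4), (6, 2, 5), (7, 1, 6),
]


def top_n_per_group(rows: List[Row], n: int) -> List[Tuple[int, int, int, int]]:
    """Return (id, first_column, second_column, row_num) where row_num <= n."""
    result = []
    prev_group: Optional[int] = None  # plays the role of @first_column
    row_num = 0                       # plays the role of @num
    # The sort is essential: the counter only works when rows of the same
    # group are adjacent, which is what ORDER BY first_column, id ensures.
    for row_id, first, second in sorted(rows, key=lambda r: (r[1], r[0])):
        row_num = row_num + 1 if first == prev_group else 1
        prev_group = first
        if row_num <= n:
            result.append((row_id, first, second, row_num))
    return result


if __name__ == "__main__":
    for row in top_n_per_group(ROWS, 2):
        print(row)
```

Dropping the sort makes the counter reset every time the group value changes between neighbouring rows, which is why the query gives wrong numbers without ORDER BY.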
The solution for Hive is
SELECT S.col1, S.col2
FROM
(SELECT col1, col2, row_number() over (partition by col1) as r FROM mytable) S
WHERE S.r < 3
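The window in this query has no ORDER BY, so which two rows of each group receive numbers 1 and 2 is not pinned down. Below is a sketch of the same query with an explicit ordering, run through Python's sqlite3 module as a stand-in for Hive. Assumptions: the bundled SQLite is 3.25 or later (the first version with window functions), the table is the seven-row example from the question, and "first" means smallest col2, which the question does not actually state.

```python
import sqlite3
from typing import List, Tuple


def top_two_per_group() -> List[Tuple[int, int]]:
    """Keep at most two rows per col1 value, ordered by col2 within a group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?)",
        [(1, 1), (1, 4), (1, 8), (2, 1), (2, 5), (3, 1), (3, 2)],
    )
    rows = conn.execute(
        """
        SELECT S.col1, S.col2
        FROM (
            SELECT col1, col2,
                   ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS r
            FROM mytable
        ) AS S
        WHERE S.r < 3
        ORDER BY S.col1, S.col2
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in top_two_per_group():
        print(row)
```

The inner SELECT is meant to carry over to HiveQL unchanged apart from the ORDER BY column, which should be whatever defines "first" in your data.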
Ordered by what? – Matthew
Do these tables and columns have names? –
Try searching this site for the ['greatest-n-per-group' + 'mysql'](http://stackoverflow.com/questions/tagged/greatest-n-per-group+mysql?sort=votes&pagesize=50) tag combination and see whether you can find a solution that fits your case. –