2015-12-09 51 views
0

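SQL query to get, for each item, the one with the maximum occurrence

The problem is hard to summarize in a one-line title, so here is a more detailed example.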

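I have a huge data set with several dozen measurements for each of thousands of different objects. Most of them have an associated type, but that type is not unambiguous.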

So a select like

SELECT oid, type, count(type) FROM data GROUP BY oid, type; 

produces something like:

oid type count(type) 
0  0 22 
1  0 22 
2  1 61 
2  2 104 
3  2 63 
4  0 34 
6  0 1 
8  2 76 
9  0 1 
11  3 33 
12  0 55 
13  4 1 
13  5 28 
13  1 2 
13  2 255 
14  4 148 
14  1 4 
14  2 3 
15  3 10 
16  0 13 
18  4 137 
18  1 5 

How can I get a result with only one row per object, where that row is the one with the most occurrences?

Bonus question: also get, in each object's row, a percentage that represents the occurrence rate of this type.

The result should look like this:

oid type P(type) 
0  0 1.0 
1  0 1.0 
2  2 0.63 
3  2 1.0 
4  0 1.0 
6  0 1.0 
8  2 1.0 
9  0 1.0 
11  3 1.0 
12  0 1.0 
13  2 0.89 
14  4 0.95 
15  3 1.0 
16  0 1.0 
18  4 0.96 

Edit:

Some test data and the almost-correct output of one solution:

http://pastebin.com/jVvHErJ2

+2

Which database management system are you using? – pedram

+0

Please don't tag every product, only the ones you actually need... –

+0

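Hmm, I'm a bit puzzled that some users who have no clue about the intent of my question randomly edit in tags that have nothing to do with the actual problem. -> Removed the tags again. The approval of those edits also makes me wonder –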

Answers

1

This query solves both of your questions

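-- v: number of occurrences of each type per object 
-- s: rank the types within each object and add the per-object total 
-- outer query: keep the top-ranked type and compute its share 
-- Note: ROW_NUMBER() breaks ties between equal counts arbitrarily 
SELECT s.oid, 
     s.type, 
     s.total_per_oid_per_type, 
     (s.total_per_oid_per_type + 0.0)/s.total_per_oid AS percentage 
FROM (SELECT v.oid, 
      v.type, 
      v.total_per_oid_per_type, 
      ROW_NUMBER() OVER (PARTITION BY v.oid ORDER BY v.total_per_oid_per_type DESC) AS object_number, 
      SUM(v.total_per_oid_per_type) OVER (PARTITION BY v.oid) AS total_per_oid 
     FROM (SELECT t.oid, t.type, count(1) AS total_per_oid_per_type 
      FROM data t 
      GROUP BY t.oid, t.type) v) s 
WHERE s.object_number = 1 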
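
The logic of the query above (count per oid and type, keep the largest count, divide by the per-object total) can be sketched in plain Python to make the three steps easier to follow. This is only an illustration under assumptions: the function name `dominant_types` and the tiny input list are hypothetical, and the input is one `(oid, type)` tuple per measurement.

```python
from collections import Counter, defaultdict


def dominant_types(rows):
    """Return (oid, type, count, share) for the most frequent type of each oid.

    rows: iterable of (oid, type) tuples, one per measurement (assumed layout).
    """
    counts = defaultdict(Counter)  # oid -> Counter of type occurrences
    for oid, typ in rows:
        counts[oid][typ] += 1

    result = []
    for oid in sorted(counts):
        # most_common(1) keeps a single winner; as with ROW_NUMBER() without
        # a tie-breaker, which of several tied types wins should not be relied on.
        typ, n = counts[oid].most_common(1)[0]
        total = sum(counts[oid].values())
        result.append((oid, typ, n, n / total))
    return result


# Hypothetical measurements: oid 2 has type 1 twice and type 2 three times.
rows = [(2, 1), (2, 2), (2, 2), (2, 1), (2, 2), (3, 2)]
print(dominant_types(rows))
```

For real data volumes the SQL version is the better fit; the sketch is only meant to show what each nesting level computes.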

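SQLite3-specific solution (equivalent to the above, except that a tie for the highest count yields one row per tied type)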

WITH v AS (
    SELECT oid, 
      type, 
      COUNT(1) AS total_per_oid_per_type 
    FROM data 
    GROUP BY oid, type 
), 
s AS (
    SELECT oid, 
      MAX(total_per_oid_per_type) AS max_total_per_oid 
    FROM v 
    GROUP BY oid 
), 
totals AS (
    SELECT oid, 
      SUM(total_per_oid_per_type) AS total_per_oid 
    FROM v 
    GROUP BY oid 
) 
SELECT v.oid, 
     v.type, 
     v.total_per_oid_per_type, 
     (v.total_per_oid_per_type + 0.0)/totals.total_per_oid AS percentage 
FROM v 
    INNER JOIN s ON v.oid = s.oid AND v.total_per_oid_per_type = s.max_total_per_oid 
    INNER JOIN totals ON v.oid = totals.oid 
ORDER BY v.oid, v.type 
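
To try the SQLite3 query without a database file, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The names `SAMPLE_COUNTS` and `run_sample` are made up for this example, and the data is a subset of the counts from the question, expanded back into one row per measurement (an assumption about how the `data` table is laid out).

```python
import sqlite3

# Per-(oid, type) counts copied from the question's sample output (subset).
SAMPLE_COUNTS = [(0, 0, 22), (2, 1, 61), (2, 2, 104),
                 (13, 4, 1), (13, 5, 28), (13, 1, 2), (13, 2, 255)]

QUERY = """
WITH v AS (
    SELECT oid, type, COUNT(1) AS total_per_oid_per_type
    FROM data GROUP BY oid, type
),
s AS (
    SELECT oid, MAX(total_per_oid_per_type) AS max_total_per_oid
    FROM v GROUP BY oid
),
totals AS (
    SELECT oid, SUM(total_per_oid_per_type) AS total_per_oid
    FROM v GROUP BY oid
)
SELECT v.oid, v.type, v.total_per_oid_per_type,
       (v.total_per_oid_per_type + 0.0) / totals.total_per_oid AS percentage
FROM v
    INNER JOIN s ON v.oid = s.oid
                AND v.total_per_oid_per_type = s.max_total_per_oid
    INNER JOIN totals ON v.oid = totals.oid
ORDER BY v.oid, v.type
"""


def run_sample():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (oid INTEGER, type INTEGER)")
    # Expand each aggregated count back into individual measurement rows.
    for oid, typ, n in SAMPLE_COUNTS:
        conn.executemany("INSERT INTO data VALUES (?, ?)", [(oid, typ)] * n)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for oid, typ, n, share in run_sample():
    print(oid, typ, n, round(share, 2))
```

Each oid in this subset has a unique most frequent type, so the query returns exactly one row per oid.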
+0

The OP has now clarified that this is being implemented in sqlite3 (which does not support windowed analytic functions such as row_number). –

+1

That's not good news... but thanks! –

+0

I have added a SQLite3 solution –

0

Try this, it should work

create table ##TBL (oid INT, [type] INT, [count(type)] INT) 
INSERT INTO ##TBL VALUES 
(0,0,22), 
(1,0,22), 
(2,1,61), 
(2,2,104), 
(3,2,63), 
(4,0,34), 
(6,0,1), 
(8,2,76), 
(9,0,1), 
(11,3,33), 
(12,0,55), 
(13,4,1), 
(13,5,28), 
(13,1,2), 
(13,2,255), 
(14,4,148), 
(14,1,4), 
(14,2,3), 
(15,3,10), 
(16,0,13), 
(18,4,137), 
(18,1,5) 
-------------------------------- 

SELECT oid 
     -- Caution: MAX([type]) returns the highest type value per oid, not the most frequent type 
     ,max([type]) as x 
     --,Max([count(type)]) AS [count(type)] 
     ,CAST(CAST(MAX([count(type)]) AS DECIMAL(10,2))/CAST(SUM([count(type)]) AS DECIMAL(10,2)) AS DECIMAL(10,2)) AS 'Percent %' 
from ##TBL 
group by oid 
+0

This does not seem to select the most frequently occurring type, but rather the highest type value –
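
The difference pointed out in this comment can be checked on the rows for oid 13 and 14 from the question. The following is a rough sketch, not the answerer's code: it uses SQLite through Python's `sqlite3` module instead of SQL Server, and the table name `tbl`, the column name `cnt`, and the helper names are assumptions made for the example. The second query joins each row to the per-oid maximum count, which is one possible way to pick the most frequent type.

```python
import sqlite3

# Pre-aggregated (oid, type, count) rows for oid 13 and 14 from the question.
AGGREGATED = [(13, 4, 1), (13, 5, 28), (13, 1, 2), (13, 2, 255),
              (14, 4, 148), (14, 1, 4), (14, 2, 3)]


def _connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (oid INTEGER, type INTEGER, cnt INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", AGGREGATED)
    return conn


def highest_type_per_oid():
    """MAX(type) per oid: the largest type value, regardless of its count."""
    conn = _connect()
    rows = conn.execute(
        "SELECT oid, MAX(type) FROM tbl GROUP BY oid ORDER BY oid").fetchall()
    conn.close()
    return rows


def most_frequent_type_per_oid():
    """Type whose count equals the per-oid maximum, plus its share of the total."""
    conn = _connect()
    rows = conn.execute("""
        SELECT t.oid, t.type, CAST(t.cnt AS REAL) / m.total AS share
        FROM tbl t
            JOIN (SELECT oid, MAX(cnt) AS max_cnt, SUM(cnt) AS total
                  FROM tbl GROUP BY oid) m
              ON t.oid = m.oid AND t.cnt = m.max_cnt
        ORDER BY t.oid
    """).fetchall()
    conn.close()
    return rows


print(highest_type_per_oid())        # oid 13 is reported with type 5
print(most_frequent_type_per_oid())  # oid 13 is reported with type 2
```

For oid 14 both queries agree on type 4, but only because its highest type value also happens to be its most frequent one.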