2015-04-24 60 views
1

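Count rows for each unique combination of columns in SQL

I want to return a set of unique records from a table, based on two columns, along with the most recent posted time and the total number of times each combination of those two columns appeared before (in time) the record in its output row.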

So what I'm trying to get is something along these lines:

-- "count" is a placeholder for the value I don't know how to compute
select T.col1, T.col2, h.max_posted, count
from T
join (
    select col1, col2, max(posted) as max_posted
    from T
    where groupid = 'XXX'
    group by col1, col2) h
on (T.col1 = h.col1 and
    T.col2 = h.col2 and
    T.posted = h.max_posted)
where T.groupid = 'XXX'

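The count needs to be the number of times each combination of col1 and col2 occurred before the max_posted of each record in the output. (I hope I explained that correctly :)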

Edit: after trying the suggestion below:

select dx.*, 
    count(*) over (partition by dx.cicd9, dx.cdesc order by dx.tposted) as cnt 
from dx 
join (
select cicd9, cdesc, max(tposted) as tposted from dx where groupid = 'XXX'
group by cicd9, cdesc) h 
on (dx.cicd9 = h.cicd9 and 
    dx.cdesc = h.cdesc and 
    dx.tposted = h.tposted) 
where groupid = 'XXX'; 

The count always returns "1". Also, how would you count only the records that occurred before tposted?

This also fails, but I hope you can see where I'm heading:

WITH H AS (
    SELECT cicd9, cdesc, max(tposted) as tposted from dx where groupid = 'XXX' 
    group by cicd9, cdesc), 
    J AS (
    SELECT count(*) as cnt 
    FROM dx, h 
    WHERE dx.cicd9 = h.cicd9 
     and dx.cdesc = h.cdesc 
     and dx.tposted <= h.tposted 
     and dx.groupid = 'XXX' 
) 
SELECT H.*,J.cnt 
FROM H,J 

Can anyone help?

+2

Sample data and the desired result would help clarify the question. –

Answers

1

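How about:

SELECT DISTINCT ON (cicd9, cdesc) cicd9, cdesc,
    max(tposted) OVER w AS last_post,
    count(*) OVER w AS num_posts
FROM dx
WHERE groupid = 'XXX'
WINDOW w AS (
    PARTITION BY cicd9, cdesc
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);

Lacking the PG version, table definition, data, and desired output, this is just shooting from the hip, but the principle should work: for the rows where groupid = 'XXX', partition on the two columns, then take the maximum of the tposted column and the total number of rows in the window frame (hence the RANGE ... clause in the window definition).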
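To make the principle concrete, here is a minimal sketch run against made-up data. The temp table, its column types, and the sample rows are assumptions for illustration only; the timestamp column is assumed to be named tposted, as in the question's edit.

```sql
-- Hypothetical sample data; a temp table shadows any real dx for this session only.
CREATE TEMP TABLE dx (
    groupid text,
    cicd9   text,
    cdesc   text,
    tposted timestamp
);

INSERT INTO dx (groupid, cicd9, cdesc, tposted) VALUES
    ('XXX', '250.00', 'Diabetes',     '2015-01-05 09:00'),
    ('XXX', '250.00', 'Diabetes',     '2015-02-10 09:00'),
    ('XXX', '250.00', 'Diabetes',     '2015-03-15 09:00'),
    ('XXX', '401.9',  'Hypertension', '2015-02-01 09:00'),
    ('YYY', '250.00', 'Diabetes',     '2015-04-01 09:00');

-- Each window covers the whole (cicd9, cdesc) partition, so every row in a
-- partition carries the same max and count; DISTINCT ON keeps one row per pair.
SELECT DISTINCT ON (cicd9, cdesc) cicd9, cdesc,
    max(tposted) OVER w AS last_post,
    count(*) OVER w AS num_posts
FROM dx
WHERE groupid = 'XXX'
WINDOW w AS (
    PARTITION BY cicd9, cdesc
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);
```

With this data the query should return two rows: ('250.00', 'Diabetes') with last_post 2015-03-15 09:00 and num_posts 3, and ('401.9', 'Hypertension') with last_post 2015-02-01 09:00 and num_posts 1. The 'YYY' row is excluded by the WHERE clause before the windows are evaluated.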

+0

Lots to learn here. If this can count all the rows, it might work. I need to study it when I'm not brain dead. Thanks. (PG version 9.3) –

+0

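How do I change the query to produce only one row per cicd9, cdesc group? (It currently repeats the same output row for every row found in dx.) Thanks (these are all new concepts to me). –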

+0

@AlanWayne: Added a 'DISTINCT ON' clause, see the updated answer. – Patrick

0

Do you just want a cumulative count?

select T.*,
       count(*) over (partition by col1, col2 order by posted) as cnt
from T
where groupid = 'xxx'; 
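
A likely reason the count in the question's edit is always 1: window functions are evaluated after the join and the WHERE filter, so by then each partition holds only the single latest row. One possible workaround is to compute the running count in a subquery first and filter to the latest row afterwards. This is a sketch under assumptions: the temp table, column types, and sample rows are made up, with names taken from the question's first query.

```sql
-- Hypothetical sample data; names follow the question's sketch (T, col1, col2, posted).
CREATE TEMP TABLE T (
    groupid text,
    col1    text,
    col2    text,
    posted  timestamp
);

INSERT INTO T (groupid, col1, col2, posted) VALUES
    ('xxx', 'a', 'x', '2015-01-05 09:00'),
    ('xxx', 'a', 'x', '2015-02-10 09:00'),
    ('xxx', 'a', 'x', '2015-03-15 09:00'),
    ('xxx', 'b', 'y', '2015-02-01 09:00'),
    ('yyy', 'a', 'x', '2015-04-01 09:00');

SELECT col1, col2, posted, cnt
FROM (
    SELECT col1, col2, posted,
           -- running count up to and including the current row (and its peers)
           count(*) OVER (PARTITION BY col1, col2 ORDER BY posted) AS cnt,
           -- latest timestamp of the combination, repeated on every row
           max(posted) OVER (PARTITION BY col1, col2) AS last_posted
    FROM T
    WHERE groupid = 'xxx'
) s
WHERE posted = last_posted   -- filter only after the windows were computed
ORDER BY col1, col2;
```

With this data the result should be ('a', 'x') with cnt 3 and ('b', 'y') with cnt 1. If several rows share the latest timestamp, each of them is returned, so a DISTINCT may be needed in that case.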
+0

Yes. A cumulative count of only the records that occurred before posted. Please see my edit above. Thanks. –

+0

I might add that I'm looking for the total number of rows (including duplicates) that match the output row. –

+0

Your question is still ambiguous. You should add sample data and the desired result. –

0

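This is the best I could come up with. Better suggestions are welcome!

This produces the result I need, with the understanding that the count will always be at least 1 (from the join):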

SELECT dx.cicd9, dx.cdesc, max(dx.tposted), count(*) 
from dx 
join (
SELECT cicd9, cdesc, max(tposted) as tposted from dx where groupid = 'XXX' 
    group by cicd9, cdesc) h 
on 
    (dx.cicd9 = h.cicd9 and dx.cdesc = h.cdesc and dx.tposted <= h.tposted 
    and dx.groupid = 'XXX') 
group by dx.cicd9, dx.cdesc 
order by dx.cdesc; 

WITH H AS (
    SELECT cicd9, cdesc, max(tposted) as tposted from dx where groupid = 'XXX' 
    group by cicd9, cdesc) 
SELECT dx.cicd9, dx.cdesc, max(dx.tposted), count(*) 
from dx, H 
where dx.cicd9 = h.cicd9 and dx.cdesc = h.cdesc and dx.tposted <= h.tposted 
    and dx.groupid = 'XXX' 
group by dx.cicd9, dx.cdesc 
order by cdesc; 
+0

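Does it return what you need or not? If not, how does it fail? If you cannot clarify the task in words, you should add meaningful sample values and the desired result *to the question*. –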

+0

@ErwinBrandstetter Yes (almost). It would be better if the count could be 0, since I want the number of rows with earlier dates. Otherwise, it works well. –

0

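This is confusing:

The count needs to be the number of times each combination of col1 and col2 occurred before the max_posted of each record in the output.

Since, by definition, every record is "before" (or at the same time as) the latest post, this basically means the total count per combination (ignoring the presumed off-by-one error in the sentence).

So this boils down to a simple GROUP BY: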

SELECT cicd9, cdesc 
    , max(tposted) AS last_posted 
    , count(*) AS ct 
FROM dx 
WHERE groupid = 'XXX' 
GROUP BY 1, 2 
ORDER BY 1, 2; 

This does exactly the same as the currently accepted answer. Just faster and simpler.
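
If the off-by-one reading is what is actually wanted, that is, counting only rows strictly earlier than the latest post (so a combination seen once gets 0), one possible variant is sketched below. The temp table, column types, and sample rows are assumptions for illustration. It uses a CASE expression inside count() rather than the FILTER clause, since FILTER is not available in PostgreSQL 9.3.

```sql
-- Hypothetical sample data; a temp table shadows any real dx for this session only.
CREATE TEMP TABLE dx (
    groupid text,
    cicd9   text,
    cdesc   text,
    tposted timestamp
);

INSERT INTO dx (groupid, cicd9, cdesc, tposted) VALUES
    ('XXX', '250.00', 'Diabetes',     '2015-01-05 09:00'),
    ('XXX', '250.00', 'Diabetes',     '2015-02-10 09:00'),
    ('XXX', '250.00', 'Diabetes',     '2015-03-15 09:00'),
    ('XXX', '401.9',  'Hypertension', '2015-02-01 09:00'),
    ('YYY', '250.00', 'Diabetes',     '2015-04-01 09:00');

SELECT cicd9, cdesc, last_posted
     -- count() skips NULLs, so only rows strictly before the latest post are counted
     , count(CASE WHEN tposted < last_posted THEN 1 END) AS earlier_ct
FROM (
    SELECT cicd9, cdesc, tposted
         , max(tposted) OVER (PARTITION BY cicd9, cdesc) AS last_posted
    FROM dx
    WHERE groupid = 'XXX'
) s
GROUP BY cicd9, cdesc, last_posted
ORDER BY cicd9, cdesc;
```

With this data, ('250.00', 'Diabetes') should get earlier_ct 2 and ('401.9', 'Hypertension') should get 0. Rows that tie with the latest timestamp are not counted as earlier.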