2017-02-17 11 views
3

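Remove rows whose weight exceeds the limit, until no such rows remain

My problem is easy to state in natural language, but I cannot find a solution in (Oracle) SQL.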

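The data has the columns VALUE (a positive number) and LIMIT (a value between 0 and 1, representing a percentage). The task is to delete (or identify) the rows whose VALUE is greater than LIMIT % of the sum of the VALUEs of the remaining rows. Put another way: remove the rows whose weight (defined by VALUE) is greater than LIMIT times the total.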

Note that after some rows are removed the sum decreases, so another row may then fail the check and need to be removed as well.
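The removal rule described above can be modelled outside the database as a simple fixed-point loop. The following is only a minimal reference sketch in Python, not a SQL solution. It assumes that the total includes the row being tested and that a row is removed only when its value is strictly greater than limit * total. The function name `prune` is hypothetical.

```python
def prune(rows):
    """Repeatedly drop rows whose value exceeds limit * current total.

    rows: list of (key, value, limit) tuples. Returns the surviving rows.
    Assumption: the total includes the row being tested.
    """
    current = list(rows)
    while True:
        total = sum(value for _, value, _ in current)
        kept = [(k, v, lim) for k, v, lim in current if v <= lim * total]
        if len(kept) == len(current):  # nothing was removed: fixed point
            return kept
        current = kept  # the total shrinks, so test the survivors again


# Sample data from the question: three heavy rows plus twenty rows of weight 1.
rows = [(23, 100, 0.15), (22, 101, 0.05), (21, 10, 0.05)]
rows += [(key, 1, 0.15) for key in range(20, 0, -1)]

survivors = prune(rows)
print(sorted(k for k, _, _ in survivors))
```

On the sample data the first pass removes keys 23 and 22 (total drops from 231 to 30), and only the second pass removes key 21, which is why a single-pass check is not enough.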

So far I have tried analytic functions (see the example below) and recursion with CONNECT BY. Nothing has worked.

The solution should be in plain SQL, without PL/SQL. If the consensus is that no such solution exists, a procedure is fine.

My incorrect attempt using analytic functions is as follows:

WITH 
DATA as (
    SELECT 23 as KEY, 100 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 22 as KEY, 101 as VALUE, 0.05 as LIMIT from DUAL 
    UNION ALL 
    SELECT 21 as KEY, 10 as VALUE, 0.05 as LIMIT from DUAL 
    UNION ALL 
    SELECT 20 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 19 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 18 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 17 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 16 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 15 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 14 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 13 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 12 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 11 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 10 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 9 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 8 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 7 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 6 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 5 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 4 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 3 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 2 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
    UNION ALL 
    SELECT 1 as KEY, 1 as VALUE, 0.15 as LIMIT from DUAL 
), 
REMOVED AS (
    SELECT 
    d.*, 
    CASE 
     WHEN d.VALUE >= d.LIMIT * (SUM(d.VALUE) OVER()) THEN 'N' 
     ELSE 'Y' 
    END as FLAG, 
    d.VALUE/d.LIMIT * (SUM(d.VALUE) OVER()) AS ACT_WEIGHT, 
    SUM(d.VALUE) OVER() AS TOTAL_SUM 
    FROM DATA d 
) 
SELECT r.KEY 
FROM REMOVED r 
WHERE FLAG='N'; 
-- Wrong: KEY=21 is missing! 
+0

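Hi, in what order should the rows be processed? You need to specify that. Also try the LAG and LEAD window functions, which can scan the remaining rows. – PeterRing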

Answers

2

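This uses a recursive subquery factoring clause (a.k.a. a recursive CTE) to generate two alternating sets of rows. The first set eliminates the values that are above the percentage limit of the previous total and recalculates the new total. The second set regenerates the same rows and updates the again column, which records whether any row is above the limit for the new total and therefore whether another round of elimination is needed.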

Query

WITH cte (key, value, limit, lvl, total, again) AS (
    SELECT key, value, limit, 1, SUM(value) OVER(), 1 
    FROM data 
UNION ALL 
    SELECT key, 
     value, 
     limit, 
     lvl + 1, 
     CASE MOD(lvl, 2) 
      WHEN 1 
      THEN SUM(value) OVER() 
      ELSE total 
     END, 
     CASE MOD(lvl, 2) 
      WHEN 0 
      THEN MAX(CASE WHEN value > limit * total THEN 1 ELSE 0 END) OVER() 
      ELSE 1 
     END 
    FROM cte 
    WHERE (MOD(lvl, 2) = 0 OR value <= limit * total) 
    AND again = 1 
) 
SELECT * 
FROM cte 
WHERE again = 0; 

Output

 KEY  VALUE  LIMIT  LVL  TOTAL  AGAIN 
---------- ---------- ---------- ---------- ---------- ---------- 
     20   1  .15   5   20   0 
     19   1  .15   5   20   0 
     18   1  .15   5   20   0 
     17   1  .15   5   20   0 
     16   1  .15   5   20   0 
     15   1  .15   5   20   0 
     14   1  .15   5   20   0 
     13   1  .15   5   20   0 
     12   1  .15   5   20   0 
     11   1  .15   5   20   0 
     10   1  .15   5   20   0 
     9   1  .15   5   20   0 
     8   1  .15   5   20   0 
     7   1  .15   5   20   0 
     6   1  .15   5   20   0 
     5   1  .15   5   20   0 
     4   1  .15   5   20   0 
     3   1  .15   5   20   0 
     2   1  .15   5   20   0 
     1   1  .15   5   20   0 
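To see why the output above ends at LVL 5 with TOTAL 20, the alternating levels of the query can be traced with a small model. This is a rough Python sketch of the query's logic as I read it, not Oracle's execution. The function name `seesaw` and the shape of the trace tuples are assumptions made for illustration.

```python
def seesaw(rows):
    """Model the alternating levels of the recursive query.

    rows: list of (key, value, limit). Returns (final_rows, trace), where
    trace holds one (lvl, row_count, total, again) tuple per level.
    """
    current = list(rows)
    lvl, again = 1, 1
    total = sum(v for _, v, _ in current)
    trace = [(lvl, len(current), total, again)]
    while again == 1:
        if lvl % 2 == 1:
            # Odd level: drop rows above the old total, then recompute it.
            current = [(k, v, lim) for k, v, lim in current if v <= lim * total]
            total = sum(v for _, v, _ in current)
        else:
            # Even level: keep all rows, flag whether any exceeds the new total.
            again = int(any(v > lim * total for _, v, lim in current))
        if not current:
            break  # no rows produced, so the recursion stops
        lvl += 1
        trace.append((lvl, len(current), total, again))
    return (current if again == 0 else []), trace


rows = [(23, 100, 0.15), (22, 101, 0.05), (21, 10, 0.05)]
rows += [(key, 1, 0.15) for key in range(20, 0, -1)]

final_rows, trace = seesaw(rows)
for step in trace:
    print(step)
```

Each level either filters or checks, never both, which is why the model needs two levels per elimination round.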
+0

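Very nice. A few months ago I came up with a similar solution to a different problem, and I have used the technique several times since; I wrote about it on OTN. I don't know whether it has a name; for lack of a better term I call it the "seesaw recursive query" technique. I would be interested to know whether it is a well-known technique; it did not seem familiar on OTN. A couple of links: https://community.oracle.com/thread/3966882 and https://community.oracle.com/thread/3974508 – mathguy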

+1

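That said, now that I understand the mathematical problem, I see that this particular problem does not need the seesaw technique. There is no need to "consider the order of the rows": if a row violates the condition initially, it must be discarded anyway, because it would violate the condition in any subset of rows it could be part of. I am still interested in the technique, though! – mathguy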

+0

Thanks, MT0 and mathguy. Your solutions use a nice trick. –

1

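This can be solved with a single recursive CTE, as MT0 showed. MT0's solution can be simplified: this problem does not need the "seesaw recursive query" I mentioned in my comment on MT0's answer.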

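In the recursive query, I start with the given data but add three columns: the sum of the values (over the "current" rows), the row count, and the number 1 as a placeholder. (Any positive number would do there.)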

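Then, in the recursive step, I keep the rows from the previous level that satisfy the limit condition (measured against the "old" sum, which still includes the rows that must be discarded). I compute sum(value) again for use in the next step, count how many rows are left, and compute the difference in row count between "this" level and the previous one.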

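If at some point the count difference becomes zero, no rows had to be discarded at that level. The rows at that level are the answer to the problem, and I select them in the outer query.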

with 
    data (key, value, limit) as (
     select ........... 
    ), 
    r (key, value, limit, tot, cnt, diff) as (
     select key, value, limit, sum(value) over(), count(*) over(), 1 
     from data 
     union all 
     select key, value, limit, sum(value) over(), count(*) over(), 
       cnt - count(*) over() 
     from r 
     where value <= limit * tot and diff > 0 
) 
select key, value, limit 
from r 
where diff = 0 
;
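The levels this query would produce on the question's sample data can be traced with a small model. The following is only a Python sketch of the tot, cnt and diff bookkeeping as described above, not something Oracle runs. The function name `count_diff` and the trace layout are assumptions made for illustration.

```python
def count_diff(rows):
    """Model the tot / cnt / diff columns level by level.

    rows: list of (key, value, limit). Returns (answer_rows, levels), where
    levels holds one (cnt, tot, diff) tuple per recursion level.
    """
    current = list(rows)
    tot = sum(v for _, v, _ in current)
    cnt = len(current)
    diff = 1  # placeholder for the anchor level; any positive number works
    levels = [(cnt, tot, diff)]
    while diff > 0:
        # Keep rows that satisfy the limit against the previous level's sum.
        kept = [(k, v, lim) for k, v, lim in current if v <= lim * tot]
        if not kept:
            return [], levels  # no rows left, so no level with diff = 0
        diff = cnt - len(kept)  # how many rows this level discarded
        tot = sum(v for _, v, _ in kept)
        cnt = len(kept)
        current = kept
        levels.append((cnt, tot, diff))
    return current, levels  # rows of the level where diff = 0


rows = [(23, 100, 0.15), (22, 101, 0.05), (21, 10, 0.05)]
rows += [(key, 1, 0.15) for key in range(20, 0, -1)]

answer, levels = count_diff(rows)
for level in levels:
    print(level)
```

On the sample data the model reaches diff = 0 at the fourth level, one level fewer than the alternating approach, because each level both filters and recomputes the sum.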