2016-04-25 12 views

Oracle SQL: how to get random records for each group with a predefined distribution

This refers to an earlier question described here: Oracle SQL: How to get Random Records by each group

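Question:

Is it possible to draw a random sample with a different proportion for each category? For example: if I need a random sample of 132 records spread over 3 categories (approved, denied, canceled), how do I get the sample in the following proportions?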

total sample = 132 

category  sample %  sample size 
approved  50%  66 
denied  30%  40 
canceled  20%  26 

Note: I need the raw data, not the counts.


Answer


Let's set up some sample data first. I created 132 records in the approved category so that a 50% sample yields 66 rows.

create table task as 
select 'approved' category, rownum task_id from dual connect by level <= 132 union all 
select 'denied' category, rownum task_id from dual connect by level <= 134 union all 
select 'canceled' category, rownum task_id from dual connect by level <= 130 
; 
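As a quick sanity check on the sample data, the query below is a minimal sketch that lists the row count per category next to the number of rows each target percentage would keep. It assumes the percentages from the question (50%, 30%, 20%); the column aliases total_rows and expected_sample are illustrative names. A filter that keeps a fraction p of n randomly numbered rows keeps floor(n * p) of them, which for 132, 134 and 130 rows works out to 66, 40 and 26.

```sql
-- Rows per category, and how many rows the target fraction would keep.
-- Assumption: the fractions .5 / .3 / .2 are the percentages from the question.
select category,
       count(*) as total_rows,
       floor(count(*) *
             case category
               when 'approved' then .5
               when 'denied'   then .3
               when 'canceled' then .2
             end) as expected_sample
from task
group by category
order by category;
```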

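The key step is to define a column RAND_PERC that holds, for each category, a value between 0 and 1. If you want a sample of, say, 50% of a category, select all rows whose value is less than or equal to 0.5.

The column is computed by first assigning row numbers in random order (independently for each category) and then dividing by the number of rows in that category.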

select CATEGORY, TASK_ID, 
(row_number() over (partition by task.category order by dbms_random.value))/
(count(*) over (partition by task.category)) as rand_perc 
from task 
order by 1,3; 

CATEGORY TASK_ID RAND_PERC 
-------- ---------- ---------- 
approved   56 ,00757575758 
approved  129 ,0151515152 
approved   61 ,0227272727 
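Because the ordering relies on dbms_random.value, the values above will differ on every run. If a repeatable sample is wanted, one option is to seed the generator in the same session before running the query. This is only a sketch: the seed 42 is an arbitrary illustrative value, and repeatability is not guaranteed if the execution plan changes (for example with parallel execution).

```sql
-- Assumption: 42 is an arbitrary seed chosen for illustration.
begin
  dbms_random.seed(42);
end;
/

-- Same computation as above; with a fixed seed in the same session
-- the random order should normally be reproduced.
select category, task_id,
       row_number() over (partition by category order by dbms_random.value)
         / count(*) over (partition by category) as rand_perc
from task
order by 1, 3;
```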

To draw the sample, simply define the WHERE condition as required - see the example below.

with rnd as (
select CATEGORY, TASK_ID, 
(row_number() over (partition by task.category order by dbms_random.value))/
(count(*) over (partition by task.category)) as rand_perc 
from task 
) 
select CATEGORY, count(*) cnt 
from rnd 
where 
(category = 'approved' and rand_perc <= .5) or /* take 50% from approved */ 
(category = 'denied' and rand_perc <= .3) or /* take 30% from denied */ 
(category = 'canceled' and rand_perc <= .2) /* take 20% from canceled */ 
group by CATEGORY 
; 

This gives the sample sizes as required:

CATEGORY  CNT 
-------- ---------- 
canceled   26 
denied   40 
approved   66 
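
Since the question asks for the raw rows rather than counts, the same idea can return the sampled records themselves. The following is a minimal sketch, not the only way to write it: the targets subquery and the aliases rn, cnt and pct are hypothetical names introduced here to keep the percentages in one place.

```sql
with rnd as (
  -- Random row number and row count within each category.
  select category, task_id,
         row_number() over (partition by category order by dbms_random.value) as rn,
         count(*)     over (partition by category) as cnt
  from task
),
targets as (
  -- Assumption: target fractions taken from the question.
  select 'approved' as category, .5 as pct from dual union all
  select 'denied'   as category, .3 as pct from dual union all
  select 'canceled' as category, .2 as pct from dual
)
select r.category, r.task_id
from rnd r
join targets t on t.category = r.category
where r.rn <= r.cnt * t.pct   -- same test as rn / cnt <= pct
order by r.category, r.task_id;
```

Note that this takes a fraction of each category's own row count. If fixed counts (66, 40, 26) are needed regardless of how many rows each category holds, one variation would be to store a target row count in targets and compare r.rn against it directly.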

Awesome. As we all know, you are the best, @Marmite Bomber. Thank you very much for the quick response.


@Badri If it solved the problem, you can also accept the answer :)