Let's first set up some sample data. I created 132 rows for the category approved, so that a 50% sample contains exactly 66 rows.
create table task as
select 'approved' category, rownum task_id from dual connect by level <= 132 union all
select 'denied' category, rownum task_id from dual connect by level <= 134 union all
select 'canceled' category, rownum task_id from dual connect by level <= 130
;
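The key step is to define a column RAND_PERC that holds a value greater than 0 and up to 1 within each category. If you want a sample of, say, 50% of a category, select all rows of that category whose value is less than or equal to .5.
The column is computed by first assigning row numbers in random order (independently for each category) and then dividing by the number of rows in that category.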
select CATEGORY, TASK_ID,
(row_number() over (partition by task.category order by dbms_random.value))/
(count(*) over (partition by task.category)) as rand_perc
from task
order by 1,3;
CATEGORY TASK_ID RAND_PERC
-------- ---------- ----------
approved 56 ,00757575758
approved 129 ,0151515152
approved 61 ,0227272727
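A short worked check of why a threshold on RAND_PERC yields a predictable sample size. The symbols $n_c$ (rows in category $c$), $r_i$ (random row number of row $i$) and $p_c$ (desired fraction) are notation introduced here for illustration; they do not appear in the query.

$$
\text{rand\_perc}_i = \frac{r_i}{n_c}, \qquad r_i \in \{1, 2, \dots, n_c\}
$$

$$
\#\{\, i : r_i / n_c \le p_c \,\} = \lfloor p_c \, n_c \rfloor
$$

For approved, $n_c = 132$, so the first values are $1/132 \approx 0.00758$ and $2/132 \approx 0.01515$, matching the output above (which is displayed with a comma as the decimal separator). With $p_c = 0.5$ this gives $\lfloor 0.5 \cdot 132 \rfloor = 66$ rows. When $p_c \, n_c$ is not a whole number the count rounds down: for example, 30% of the 134 denied rows gives $\lfloor 40.2 \rfloor = 40$.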
To draw the sample, simply define the WHERE condition as needed; see the example below.
with rnd as (
select CATEGORY, TASK_ID,
(row_number() over (partition by task.category order by dbms_random.value))/
(count(*) over (partition by task.category)) as rand_perc
from task
)
select CATEGORY, count(*) cnt
from rnd
where
(category = 'approved' and rand_perc <= .5) or /* take 50% from approved */
(category = 'denied'   and rand_perc <= .3) or /* take 30% from denied */
(category = 'canceled' and rand_perc <= .2)    /* take 20% from canceled */
group by CATEGORY
;
This gives the sample sizes as required:
CATEGORY CNT
-------- ----------
canceled 26
denied 40
approved 66
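If the percentages change often, one possible variation is to keep them in a small lookup and join against it instead of hard-coding them in the WHERE clause. This is only a sketch: the CTE name sample_rate and its column pct are hypothetical names chosen for illustration and are not part of the original answer; the rest reuses the task table and the rand_perc expression from above.

```sql
with rnd as (
  select category, task_id,
         (row_number() over (partition by task.category order by dbms_random.value)) /
         (count(*) over (partition by task.category)) as rand_perc
  from task
),
sample_rate as (  -- hypothetical lookup: desired fraction per category
  select 'approved' category, .5 pct from dual union all
  select 'denied'   category, .3 pct from dual union all
  select 'canceled' category, .2 pct from dual
)
select r.category, count(*) cnt
from rnd r
join sample_rate s on s.category = r.category  -- categories without a rate are excluded
where r.rand_perc <= s.pct
group by r.category;
```

With the sample data above this should return the same counts (66, 40 and 26). The design choice is that adding or changing a category's rate becomes a data change rather than an edit to the predicate.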