2013-01-24 47 views
1

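How can I use proc sql in SAS to find missing cases?

I want to use proc sql in SAS to determine whether cases or records are missing certain information. I have two datasets. The first is a record of the entire data collection, showing which forms were collected at each visit. The second is a specification of which forms should be collected at each visit. I have tried many approaches, including data steps and SQL code using `not in`, to no avail.

Sample data is below.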


***** dataset crf is a listing of all forms that have been filled out at each visit ; 
***** cid is an identifier for a study center ; 
***** pid is an identifier for a participant ; 

data crf; 
    input visit cid pid form ; 
cards; 
1 10 101 10 
1 10 101 11 
1 10 101 12 
1 10 102 10 
1 10 102 11 
2 10 101 11 
2 10 101 13 
2 10 102 11 
2 10 102 12 
2 10 102 13 
; 
run; 


***** dataset crfrule is a listing of all forms that should be filled out at each visit ; 
***** so, visit 1 needs to have forms 10, 11, and 12 filled out ; 
***** likewise, visit 2 needs to have forms 11 - 14 filled out ; 

data crfrule; 
    input visit form ; 
cards; 
1 10 
1 11 
1 12 
2 11 
2 12 
2 13 
2 14 
; 
run; 


***** We can see from the two tables that participant 101 has a complete set of records for visit 1 ; 
***** However, participant 102 is missing form 12 for visit 1 ; 
***** For visit 2, 101 is missing forms 12 and 14, whereas 102 is missing form 14 ; 


***** I want to be able to know which forms were **NOT** filled out by each person at each visit (i.e., which forms are missing for each visit) ; 


***** extracting unique cases from crf ; 
proc sql; 
    create table visit_rec as 
    select distinct cid, pid, visit 
     from crf; 
quit; 



***** building the list of expected forms by visit number ; 
***** columns are listed explicitly so that visit is selected only once ; 
proc sql; 
    create table expected as 
    select x.cid, x.pid, y.visit, y.form 
    from visit_rec as x right join crfrule as y 
     on x.visit = y.visit 
    order by y.visit, x.cid, x.pid, y.form; 
quit; 


***** so now I have a list of which forms that **SHOULD** have been filled out by each person ; 

***** now, I just need to know if they were filled out or not... ; 

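The strategy I have been attempting is to merge expected back onto the crf table, with some indicator showing which forms are missing at each visit.

Ideally, I would like to produce a table with the columns: visit, cid, pid, missing_form

Any guidance is greatly appreciated.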

I have tried many versions of [this answer](http://stackoverflow.com/questions/8946593/how-can-i-use-proc-sql-to-find-all-the-records) in my attempts so far.

These are all great answers!

Answers

0

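You can use a left join from expected to crf and a where clause that keeps only the rows with no matching record in the right-hand table.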

proc sql; 
    /* rows of expected with no matching row in crf are the missing forms */ 
    select 
        e.* 
    from 
        expected e left join 
        crf c on 
        e.visit = c.visit and 
        e.cid = c.cid and 
        e.pid = c.pid and 
        e.form = c.form 
    where c.visit is missing 
    ; 
quit; 
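
To get the table layout asked for in the question (visit, cid, pid, missing_form), the same left join can be written with `create table` and a renamed form column. This is only a sketch: the output table name `missing_forms` is an assumption, and it relies on the `expected` and `crf` tables built in the question.

```sas
proc sql;
    /* missing_forms is a hypothetical output table name */
    create table missing_forms as
    select e.visit,
           e.cid,
           e.pid,
           e.form as missing_form
    from expected as e
    left join crf as c
      on  e.visit = c.visit
      and e.cid   = c.cid
      and e.pid   = c.pid
      and e.form  = c.form
    /* no match in crf means the expected form was never filled out */
    where c.form is missing
    order by e.visit, e.cid, e.pid, missing_form;
quit;
```

With the sample data, this should list the four gaps described in the question: form 12 for participant 102 at visit 1, forms 12 and 14 for participant 101 at visit 2, and form 14 for participant 102 at visit 2.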
2

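EXCEPT will do what you want. I don't know that it is the most efficient solution in general (if you are doing this in SAS, it almost certainly isn't), but given what you have done so far, it does work:

proc sql; 
    create table want as 
        select cid,pid,visit,form from expected 
        except select cid,pid,visit,form from crf 
    ; 
quit; 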

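Just be careful with EXCEPT: it is very picky (note that select * does not work here, because the columns of your two tables are in a different order).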
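
The column-order problem arises because PROC SQL set operators line columns up by position by default. One possible way around it, sketched here under the assumption that both tables use the same column names (cid, pid, visit, form), is the `CORR` (`CORRESPONDING`) keyword, which matches columns by name instead:

```sas
proc sql;
    create table want as
    select * from expected
    /* corr aligns columns by name; columns not present in both tables are dropped */
    except corr
    select * from crf;
quit;
```

Listing the columns explicitly, as in the answer above, stays the more transparent option when the two tables do not share exactly the same set of columns.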

2

I would suggest a nested query, although it could also be done in two steps. How about this:

proc sql; 
    create table temp as 
    select distinct c.* 
     , (d.visit is null and d.form is null and d.pid is null) as missing_form 
    from (
     select distinct a.pid, b.* from 
     crf a, crfrule b 
    ) c 
    left join crf d 
    on  c.pid = d.pid 
     and c.form = d.form 
     and c.visit = d.visit 
    order by c.pid, c.visit, c.form 
    ; 
quit; 

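It gives you a list of all possible (i.e., expected) combinations of pid, form, and visit, plus a boolean missing_form that is 1 when the combination is absent from crf and 0 when it is present.

+1 Made a slight edit to add aliases to the `ORDER BY` columns, but otherwise a very nice answer! – BellevueBob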
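
If only the gaps are of interest, the flag can be used as a filter afterwards. A minimal sketch, assuming the `temp` table created by the query above; the output table name `missing_only` is made up for illustration:

```sas
proc sql;
    /* missing_only is a hypothetical table name; temp comes from the query above */
    create table missing_only as
    select pid, visit, form as missing_form
    from temp
    /* keep only the expected combinations that have no record in crf */
    where missing_form = 1
    order by pid, visit, form;
quit;
```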