2011-10-19 32 views

I have a large data frame from which I want to extract duplicate rows. The first few rows are as follows:

  Assay Genotype Sample Result
1   001        G      1      0
2   001        A      2      1
3   001        G      3      0
4   001       NA      1     NA
5   002        T      1      0
6   002        G      2      1
7   002        T      2      0
8   002        T      4      0
9   003       NA      1     NA

In total I will have 2000 samples, with 168 assays for each sample.

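I would like to extract the rows for which I have more than one entry with the same Assay and Sample. The result should be a data frame containing all the duplicated entries, sorted so that rows sharing the same Assay and Sample sit next to each other. For the example above, the result would look like this: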

  Assay Genotype Sample Result
1   001        G      1      0
4   001       NA      1     NA
6   002        G      2      1
7   002        T      2      0

Answer

Demo data, for easy loading:

df <- structure(list(Assay = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L), Genotype = structure(c(2L, 1L, 2L, NA, 3L, 2L, 3L, 3L, NA), .Label = c("A", "G", "T"), class = "factor"), Sample = c(1L, 2L, 3L, 1L, 1L, 2L, 2L, 4L, 1L), Result = c(0L, 1L, 0L, NA, 0L, 1L, 0L, 0L, NA)), .Names = c("Assay", "Genotype", "Sample", "Result"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9")) 

You can easily use duplicated to get the duplicated Assay/Sample pairs:

vars <- c('Assay', 'Sample')
# keep the key columns of every row that repeats an earlier Assay/Sample pair
dup <- df[duplicated(df[, vars]), vars]

This yields:

> dup 
    Assay Sample 
4  1  1 
7  2  2 

A simple merge then gives the desired result:

> merge(dup, df) 
    Assay Sample Genotype Result 
1  1  1  <NA>  NA 
2  1  1  G  0 
3  2  2  G  1 
4  2  2  T  0
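
One caveat with the merge approach: if an Assay/Sample pair occurs three or more times, dup holds that pair more than once, so merge(dup, df) returns each matching row several times (wrapping it as merge(unique(dup), df) avoids this). As an alternative sketch, not part of the original answer, the rows can be selected directly by calling duplicated twice, once from each end, which also keeps the original row names and column order. The names is_dup and dups are just illustrative, and the data frame is rebuilt with data.frame() only so the block runs on its own.

```r
# Same demo data as above, rebuilt with data.frame() so this block is self-contained
df <- data.frame(
  Assay    = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L),
  Genotype = factor(c("G", "A", "G", NA, "T", "G", "T", "T", NA)),
  Sample   = c(1L, 2L, 3L, 1L, 1L, 2L, 2L, 4L, 1L),
  Result   = c(0L, 1L, 0L, NA, 0L, 1L, 0L, 0L, NA)
)
vars <- c("Assay", "Sample")

# duplicated() flags only the second and later occurrences of a pair;
# scanning from the end as well (fromLast = TRUE) flags the first one too
is_dup <- duplicated(df[, vars]) | duplicated(df[, vars], fromLast = TRUE)
dups <- df[is_dup, ]

# sort so that rows sharing an Assay/Sample pair sit next to each other
dups <- dups[order(dups$Assay, dups$Sample), ]
dups
```

On the demo data this should select rows 1, 4, 6 and 7, the same rows as in the question's expected output.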