2015-11-16
0

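R - Exiting a loop prematurely

I am re-asking an earlier question. I have tried to simplify my dataset and to give an example of the output I want. If it is still too complicated, please feel free to comment; that may help me clarify it.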

I have a table in which I have grouped features that have similar rt and mz.

  orig_feat mz_mid rt_mid similar_feature 
1   f_1 685.4350 466.5    f_1 
2   f_2 260.1655 245.0    f_2 
185   f_2 260.1665 256.5   f_185 
408   f_2 260.1670 239.0   f_408 
2334  f_2 260.1650 250.0   f_2334 
3   f_3 288.1980 276.0    f_3 
7   f_3 288.1990 289.0    f_7 
414   f_3 288.1970 275.0   f_414 
2181  f_3 288.1980 270.0   f_2181 
2969  f_3 288.1965 297.5   f_2969 
4   f_4 537.3915 454.5    f_4 
2271  f_4 537.3965 435.5   f_2271 
5   f_5 439.2990 153.5    f_5 
6   f_6 325.0690 210.5    f_6 
10   f_6 325.0685 227.0   f_10 
747   f_6 325.0685 184.5   f_747 
2068  f_6 325.0695 225.0   f_2068 
2929  f_6 325.0685 218.0   f_2929 
2970  f_6 325.0680 237.0   f_2970 
31   f_7 288.1980 276.0    f_3 
71   f_7 288.1990 289.0    f_7 
4141  f_7 288.1970 275.0   f_414 
21811  f_7 288.1980 270.0   f_2181 
29691  f_7 288.1965 297.5   f_2969 

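I want to build a list of the entries in each group. All rows with the same $orig_feat should be "grouped", and for each of these "groups" I need a vector containing all of its features. See the example output below.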

$grf_1 
[1] "f_1" 

$grf_2 
[1] "f_2" "f_185" "f_408" "f_2334" 

$grf_3 
[1] "f_3" "f_7" "f_414" "f_2181" "f_2969" 

$grf_4 
[1] "f_4" "f_2271" 

$grf_5 
[1] "f_5" 

$grf_6 
[1] "f_6" "f_10" "f_747" "f_2068" "f_2929" "f_2970" 

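Importantly, I want the result to be non-redundant. For example, grf_3 contains f_7, f_414, f_2181 and f_2969, so when I reach f_7 I do not want to create a group for f_7, because the f_3 group already contains all the features in the f_7 group.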

Below is my code as it stands. Currently, the output it produces stops after grf_3. I don't know why it seems to exit the loop prematurely.

mkFeatGroupsList<-function(simFeatsTab){ 
    features_seen<-vector() 
    GroupingList<-list() 
    counter=1 
    for (i in 1:length(unique(simFeatsTab$orig_feat))){ 
    orig_feat2Grp<-simFeatsTab$orig_feat[i] 
    if (orig_feat2Grp%in%features_seen == TRUE) next 
    matchingFeats<-subset(simFeatsTab,orig_feat==orig_feat2Grp)$feature 
    grFeatNm<-paste("grf_",counter,sep="") 
    GroupingList[[grFeatNm]]<-matchingFeats 
    features_seen<-c(features_seen,matchingFeats) 
    counter=counter+1 
    } 
    return(GroupingList) 
} 
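
Reading the loop above, one likely cause of the early stop is that `i` runs from 1 to the number of unique `orig_feat` values (7 in the test data) but is then used as a row index, so only the first seven rows (f_1, f_2 four times, f_3 twice) are ever visited. The following is a minimal sketch of a variant that iterates over the unique values themselves. The name `mkFeatGroupsListFixed`, the cut-down table `demoTab`, and the use of the `similar_feature` column (the function above reads `$feature`, which the posted data does not contain) are assumptions made for illustration.

```r
# Hypothetical cut-down stand-in for the question's table
demoTab <- data.frame(
  orig_feat       = c("f_1", "f_2", "f_2", "f_3", "f_3", "f_4", "f_7", "f_7"),
  similar_feature = c("f_1", "f_2", "f_185", "f_3", "f_7", "f_4", "f_3", "f_7"),
  stringsAsFactors = FALSE
)

# Sketch: loop over the unique orig_feat values, not over row positions
mkFeatGroupsListFixed <- function(simFeatsTab) {
  features_seen <- character(0)
  GroupingList <- list()
  counter <- 1
  for (orig in unique(as.character(simFeatsTab$orig_feat))) {
    # Skip a feature that an earlier group already contains
    if (orig %in% features_seen) next
    # Assumes the feature names live in the similar_feature column
    matchingFeats <- simFeatsTab$similar_feature[simFeatsTab$orig_feat == orig]
    GroupingList[[paste0("grf_", counter)]] <- matchingFeats
    features_seen <- c(features_seen, matchingFeats)
    counter <- counter + 1
  }
  GroupingList
}

groups <- mkFeatGroupsListFixed(demoTab)
```

On this small table, `groups` holds four entries (grf_1 to grf_4); no group is created for f_7 because grf_3 already contains it.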

Since test data was requested:

> dput(simFeatsTab.10.30.test) 
structure(list(orig_feat = structure(c(1L, 2L, 2L, 2L, 2L, 3L, 
3L, 3L, 3L, 3L, 4L, 4L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 
7L, 7L), .Label = c("f_1", "f_2", "f_3", "f_4", "f_5", "f_6", 
"f_7"), class = "factor"), mz_mid = c(685.435, 260.1655, 260.1665, 
260.167, 260.165, 288.198, 288.199, 288.197, 288.198, 288.1965, 
537.3915, 537.3965, 439.299, 325.069, 325.0685, 325.0685, 325.0695, 
325.0685, 325.068, 288.198, 288.199, 288.197, 288.198, 288.1965 
), rt_mid = c(466.5, 245, 256.5, 239, 250, 276, 289, 275, 270, 
297.5, 454.5, 435.5, 153.5, 210.5, 227, 184.5, 225, 218, 237, 
276, 289, 275, 270, 297.5), similar_feature = c("f_1", "f_2", 
"f_185", "f_408", "f_2334", "f_3", "f_7", "f_414", "f_2181", 
"f_2969", "f_4", "f_2271", "f_5", "f_6", "f_10", "f_747", "f_2068", 
"f_2929", "f_2970", "f_3", "f_7", "f_414", "f_2181", "f_2969" 
)), .Names = c("orig_feat", "mz_mid", "rt_mid", "similar_feature" 
), class = "data.frame", row.names = c("1", "2", "185", "408", 
"2334", "3", "7", "414", "2181", "2969", "4", "2271", "5", "6", 
"10", "747", "2068", "2929", "2970", "31", "71", "4141", "21811", 
"29691")) 

Answers

1

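I would proceed like this:

  • Split your data frame (I called it feat) by orig_feat

  • Use sapply to pull out the similar features of each piece

  • Loop through the groups and remove the features that an earlier group already contains

This translates to: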

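# Split the data frame (here called feat) by orig_feat 
feat.split <- split(feat, feat$orig_feat) 

# Similar features per group; simplify = FALSE always returns a list 
sim.feat <- sapply(feat.split, function(x){x$similar_feature}, simplify = FALSE) 

# seq_along(...)[-1] is empty when there is only one group 
for (i in seq_along(sim.feat)[-1]) 
    { 
    # Get all of the previous features 
    prev.feat <- do.call("c", sim.feat[1:(i-1)]) 

    # Remove features already used 
    sim.feat[[i]] <- sim.feat[[i]][!sim.feat[[i]] %in% prev.feat] 
    } 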
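
The steps above can be sketched end to end as follows, including the removal of groups that end up empty and the grf_ naming from the question. This is only an illustration under stated assumptions: the small `feat` table is a hypothetical cut-down stand-in for the real data, and `setdiff` is swapped in for the `%in%` filter (it also drops duplicates within a group).

```r
# Hypothetical cut-down stand-in for the question's table
feat <- data.frame(
  orig_feat       = c("f_1", "f_2", "f_2", "f_3", "f_3", "f_7", "f_7"),
  similar_feature = c("f_1", "f_2", "f_185", "f_3", "f_7", "f_3", "f_7"),
  stringsAsFactors = FALSE
)

# One character vector of similar features per orig_feat
sim.feat <- split(feat$similar_feature, feat$orig_feat)

# Drop anything that an earlier group already holds
for (i in seq_along(sim.feat)[-1]) {
  prev.feat <- unlist(sim.feat[seq_len(i - 1)], use.names = FALSE)
  sim.feat[[i]] <- setdiff(sim.feat[[i]], prev.feat)
}

# Remove the groups that are now empty, then rename the rest
sim.feat <- sim.feat[lengths(sim.feat) > 0]
names(sim.feat) <- paste0("grf_", seq_along(sim.feat))
```

Note that `split` orders the groups by factor level, so with a plain character column the order is alphabetical (f_10 would sort before f_2); the question's `orig_feat` is already a factor with levels f_1 to f_7, which keeps the intended order.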

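Thanks, this is great. After removing the previously seen features, some elements are now empty. I tried to remove them, but it doesn't work. Any other suggestions? sim.feat <- lapply(sim.feat, function(f) f[length(f) > 0]) – user2814482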


@user2814482: Try sim.feat[sapply(sim.feat, length) > 0] – nico

3

Another solution could use the igraph package:

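library(igraph) 
# df is the data frame from the question; columns 1 and 4 are orig_feat and similar_feature 
x<-graph_from_data_frame(df[,c(1,4)])  # graph.data.frame() in older igraph versions 
#You can also take a look with plot(x) 
# Each connected component is one non-redundant group 
res<-components(x)  # clusters() in older igraph versions 
split(names(res$membership),res$membership) 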
#$`1` 
#[1] "f_1" 
#$`2` 
#[1] "f_2" "f_185" "f_408" "f_2334" 
#$`3` 
#[1] "f_3" "f_7" "f_414" "f_2181" "f_2969" 
#$`4` 
#[1] "f_4" "f_2271" 
#$`5` 
#[1] "f_5" 
#$`6` 
#[1] "f_6" "f_10" "f_747" "f_2068" "f_2929" "f_2970"