2017-07-17 48 views
10

Split and manipulate a nested list

I want to split a nested list by a grouping variable. Consider the following structure:

> str(L1) 
List of 2 
$ names:List of 2 
    ..$ first: chr [1:5] "john" "lisa" "anna" "mike" ... 
    ..$ last : chr [1:5] "johnsson" "larsson" "johnsson" "catell" ... 
$ stats:List of 2 
    ..$ physical:List of 2 
    .. ..$ age : num [1:5] 14 22 53 23 31 
    .. ..$ height: num [1:5] 165 176 179 182 191 
    ..$ mental :List of 1 
    .. ..$ iq: num [1:5] 102 104 99 87 121 

Now I need to produce two lists, both obtained by splitting on L1$names$last, resulting in L2 and L3 as shown below:

L2: the result is grouped by L1$names$last

> str(L2) 
List of 3 
$ johnsson:List of 2 
    ..$ names:List of 1 
    .. ..$ first: chr [1:2] "john" "anna" 
    ..$ stats:List of 2 
    .. ..$ physical:List of 2 
    .. .. ..$ age : num [1:2] 14 53 
    .. .. ..$ height: num [1:2] 165 179 
    .. ..$ mental :List of 1 
    .. .. ..$ iq: num [1:2] 102 99 
$ larsson :List of 2 
    ..$ names:List of 1 
    .. ..$ first: chr [1:2] "lisa" "steven" 
    ..$ stats:List of 2 
    .. ..$ physical:List of 2 
    .. .. ..$ age : num [1:2] 22 31 
    .. .. ..$ height: num [1:2] 176 191 
    .. ..$ mental :List of 1 
    .. .. ..$ iq: num [1:2] 104 121 
$ catell :List of 2 
    ..$ names:List of 1 
    .. ..$ first: chr "mike" 
    ..$ stats:List of 2 
    .. ..$ physical:List of 2 
    .. .. ..$ age : num 23 
    .. .. ..$ height: num 182 
    .. ..$ mental :List of 1 
    .. .. ..$ iq: num 87 

L3: each group may contain any given value of L1$names$last only once

List of 2 
$ 1:List of 2 
    ..$ names:List of 2 
    .. ..$ first: chr [1:3] "john" "lisa" "mike" 
    .. ..$ last : chr [1:3] "johnsson" "larsson" "catell" 
    ..$ stats:List of 2 
    .. ..$ physical:List of 2 
    .. .. ..$ age : num [1:3] 14 22 23 
    .. .. ..$ height: num [1:3] 165 176 182 
    .. ..$ mental :List of 1 
    .. .. ..$ iq: num [1:3] 102 104 87 
$ 2:List of 2 
    ..$ names:List of 2 
    .. ..$ first: chr [1:2] "anna" "steven" 
    .. ..$ last : chr [1:2] "johnsson" "larsson" 
    ..$ stats:List of 2 
    .. ..$ physical:List of 2 
    .. .. ..$ age : num [1:2] 53 31 
    .. .. ..$ height: num [1:2] 179 191 
    .. ..$ mental :List of 1 
    .. .. ..$ iq: num [1:2] 99 121 

I've tried to apply this solution, but it seems that it does not work for nested lists.

Reproducible code:

L1 <- list("names" = list("first" = c("john","lisa","anna","mike","steven"),"last" = c("johnsson","larsson","johnsson","catell","larsson")),"stats" = list("physical" = list("age" = c(14,22,53,23,31), "height" = c(165,176,179,182,191)), "mental" = list("iq" = c(102,104,99,87,121)))) 

L2 <- list("johnsson" = list("names" = list("first" = c("john","anna")),"stats" = list("physical" = list("age" = c(14,53), "height" = c(165,179)), "mental" = list("iq" = c(102,99)))), "larsson" = list("names" = list("first" = c("lisa","steven")),"stats" = list("physical" = list("age" = c(22,31), "height" = c(176,191)), "mental" = list("iq" = c(104,121)))), "catell" = list("names" = list("first" = "mike"),"stats" = list("physical" = list("age" = 23, "height" = 182), "mental" = list("iq" = 87)))) 

L3 <- list("1" = list("names" = list("first" = c("john","lisa","mike"),"last" = c("johnsson","larsson","catell")),"stats" = list("physical" = list("age" = c(14,22,23), "height" = c(165,176,182)), "mental" = list("iq" = c(102,104,87)))), "2" = list("names" = list("first" = c("anna","steven"),"last" = c("johnsson","larsson")),"stats" = list("physical" = list("age" = c(53,31), "height" = c(179,191)), "mental" = list("iq" = c(99,121))))) 

Edit: Please note that the actual dataset is considerably larger and more deeply nested than the example provided.

+0

Your data seems to be very structured, i.e. rectangular, so why don't you use a data frame? – rawr

+0

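I did not take that into account when I created the sample data. The actual data I am working with changes dynamically and is not necessarily rectangular. –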

+0

Can you provide an example in which the non-list vectors do not all have the same length, along with the desired end result? –

Answers

6

In general, to modify nested lists you will want to use recursion. For example, consider a function like this:

foo <- function(x, idx) { 

    if (is.list(x)) { 
     return(lapply(x, foo, idx = idx)) 
    } 
    return(x[idx]) 
} 

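It takes a list x and an index idx. It checks whether x is a list, and if so, it calls itself on every sub-element of that list. Once x is no longer a list, we take the elements selected by idx. Throughout the process, the structure of the original list stays intact.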
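As a quick illustration of the recursion, here is a minimal sketch on a small made-up list. The list `toy` and its contents are hypothetical and not part of the question; the point is only that every leaf vector is subset with the same index while the nesting is preserved.

```r
# Recursive subsetting helper, as defined in this answer
foo <- function(x, idx) {
  if (is.list(x)) {
    return(lapply(x, foo, idx = idx))
  }
  x[idx]
}

# Hypothetical toy data, made up for this illustration
toy <- list(a = c(10, 20, 30),
            b = list(c = c("x", "y", "z")))

# Keep the first and third element of every leaf vector
res <- foo(toy, c(TRUE, FALSE, TRUE))
str(res)
```

Because `lapply` keeps the names of its input, `res` has the same shape as `toy`, only with shorter leaf vectors.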

Here is a complete example. Note that this code assumes that all vectors in the list have 5 elements.

L1 <- list("names" = list("first" = c("john","lisa","anna","mike","steven"),"last" = c("johnsson","larsson","johnsson","catell","larsson")),"stats" = list("physical" = list("age" = c(14,22,53,23,31), "height" = c(165,176,179,182,191)), "mental" = list("iq" = c(102,104,99,87,121)))) 

L2 <- list("johnsson" = list("names" = list("first" = c("john","anna")),"stats" = list("physical" = list("age" = c(14,53), "height" = c(165,179)), "mental" = list("iq" = c(102,99)))), "larsson" = list("names" = list("first" = c("lisa","steven")),"stats" = list("physical" = list("age" = c(22,31), "height" = c(176,191)), "mental" = list("iq" = c(104,121)))), "catell" = list("names" = list("first" = "mike"),"stats" = list("physical" = list("age" = 23, "height" = 182), "mental" = list("iq" = 87)))) 

L3 <- list("1" = list("names" = list("first" = c("john","lisa","mike"),"last" = c("johnsson","larsson","catell")),"stats" = list("physical" = list("age" = c(14,22,23), "height" = c(165,176,182)), "mental" = list("iq" = c(102,104,87)))), "2" = list("names" = list("first" = c("anna","steven"),"last" = c("johnsson","larsson")),"stats" = list("physical" = list("age" = c(53,31), "height" = c(179,191)), "mental" = list("iq" = c(99,121))))) 

# make L2 
foo <- function(x, idx) { 

    if (is.list(x)) { 
     return(lapply(x, foo, idx = idx)) 
    } 
    return(x[idx]) 
} 

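# One output element per distinct last name, in order of first appearance
levels <- unique(L1$names$last) 
L2_2 <- vector("list", length(levels)) 
names(L2_2) <- levels 
for (i in seq_along(L2_2)) { 
    # Logical index of the observations that belong to the current last name
    idx <- L1$names$last == names(L2_2)[i] 
    # Drop `last` (the second element of names); it is now the group name
    L2_2[[i]] <- list(names = foo(L1$names[-2], idx), 
                      stats = foo(L1$stats, idx)) 
} 
identical(L2, L2_2) 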

str(L2) 
str(L2_2) 

# make L3 

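# TRUE for every repeated occurrence of a last name; two groups are enough
# only as long as no name occurs more than twice
dups <- duplicated(L1$names$last) 
L3_2 <- vector("list", 2) 
names(L3_2) <- 1:2 
for (i in 1:2) { 
    # Group 1 holds the first occurrences, group 2 holds the repeats
    if (i == 1) 
        idx <- !dups 
    else 
        idx <- dups 

    L3_2[[i]] <- foo(L1, idx) 
} 
identical(L3, L3_2) 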
str(L3) 
str(L3_2) 
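
The two loops above can also be written without hard-coding the number of groups. The following is a sketch and not part of the original answer: it assumes every leaf vector has the same length as the grouping vector, and the names `subset_leaves`, `grp`, `pos_by_name`, `occ`, `L2_alt`, and `L3_alt` are made up here. For the L3-style result it numbers each occurrence of a last name with `ave()`, so it would also cover names that appear more than twice.

```r
# Example list with the same shape as L1 in the question
L1 <- list(
  names = list(first = c("john", "lisa", "anna", "mike", "steven"),
               last  = c("johnsson", "larsson", "johnsson", "catell", "larsson")),
  stats = list(physical = list(age = c(14, 22, 53, 23, 31),
                               height = c(165, 176, 179, 182, 191)),
               mental = list(iq = c(102, 104, 99, 87, 121)))
)

# Same recursion as foo(): subset every leaf vector, keep the nesting
subset_leaves <- function(x, idx) {
  if (is.list(x)) lapply(x, subset_leaves, idx = idx) else x[idx]
}

grp <- L1$names$last

# L2-style result: one element per distinct last name, in order of first appearance
pos_by_name <- split(seq_along(grp), factor(grp, levels = unique(grp)))
L2_alt <- lapply(pos_by_name, function(i) {
  out <- subset_leaves(L1, i)
  out$names$last <- NULL  # the group name already carries this information
  out
})

# L3-style result: the k-th occurrence of each last name goes into group k
occ <- ave(seq_along(grp), grp, FUN = seq_along)
L3_alt <- lapply(split(seq_along(grp), occ), function(i) subset_leaves(L1, i))

str(L2_alt)
str(L3_alt)
```

Splitting the positions rather than the data keeps the recursion in one place, so the same helper serves both groupings.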
+0

Thank you very much. Your solution works fine on small lists, but for my dataset (roughly 50 variables with about 920 observations) it is not feasible. –

+0

Why is it not feasible? Time? Memory? Errors? – CPak

1

This is not a complete answer, but I hope it helps.

See whether this works for L3:

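library(data.table)  # provides shift(), used below to look at the previous row

x = data.frame(L1, stringsAsFactors = FALSE) 
y = x[order(x$names.last),] 
y$seq = 1 
# seq becomes 2 when the last name equals the one in the previous row
y$seq = ifelse(y$names.last == shift(y$names.last),shift(y$seq)+1,1) 
y$seq[1] = 1  # the first row has no predecessor, so the comparison gave NA

# Assumed step, not shown in the original: one data frame per seq value
z = split(y, y$seq) 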

z = list(list(names=list(first=z[[1]]$names.first, last=z[[1]]$names.last), stats=list(physical = list(age =z[[1]]$stats.physical.age, height= z[[1]]$stats.physical.height), mental=list(iq= z[[1]]$stats.iq))), list(names=list(first=z[[2]]$names.first, last=z[[2]]$names.last), stats=list(physical = list(age =z[[2]]$stats.physical.age, height= z[[2]]$stats.physical.height), mental=list(iq= z[[2]]$stats.iq)))) 

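The last part (z), which converts the result back to a list, can be done with a loop. Assuming there are not too many identical names, the loop will not be too slow.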
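
As a sketch of that conversion step, the repeated expression can be moved into a small function and applied to each data frame. The helper name `to_nested` is made up here, and `z` below is a hand-written stand-in for the list of per-group data frames, using the flattened column names that `data.frame(L1)` produces for the example data.

```r
# Hypothetical stand-in for z: one data frame per group
z <- list(
  data.frame(names.first = c("mike", "john", "lisa"),
             names.last = c("catell", "johnsson", "larsson"),
             stats.physical.age = c(23, 14, 22),
             stats.physical.height = c(182, 165, 176),
             stats.iq = c(87, 102, 104),
             stringsAsFactors = FALSE),
  data.frame(names.first = c("anna", "steven"),
             names.last = c("johnsson", "larsson"),
             stats.physical.age = c(53, 31),
             stats.physical.height = c(179, 191),
             stats.iq = c(99, 121),
             stringsAsFactors = FALSE)
)

# Rebuild the nested structure of the example from one flat data frame
to_nested <- function(d) {
  list(names = list(first = d$names.first, last = d$names.last),
       stats = list(physical = list(age = d$stats.physical.age,
                                    height = d$stats.physical.height),
                    mental = list(iq = d$stats.iq)))
}

# Apply the conversion to every group
L3_from_df <- lapply(z, to_nested)
str(L3_from_df)
```

The mapping in `to_nested` is written by hand for this example; a more deeply nested list would need its own mapping from column names to list positions.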

You said your data is more deeply nested; in that case, you will need to add is.null and/or tryCatch calls to handle errors.
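
One way such checks could look, shown here on a recursive subsetting helper, is sketched below. The function name `safe_subset`, the list `ragged`, and the policy of leaving vectors of a different length untouched are all assumptions made for illustration; the right policy depends on the actual data.

```r
# Defensive recursive subsetting (illustrative sketch).
# n is the expected leaf length, taken from the logical index by default.
safe_subset <- function(x, idx, n = length(idx)) {
  if (is.null(x)) return(NULL)                  # nothing to subset
  if (is.list(x)) return(lapply(x, safe_subset, idx = idx, n = n))
  if (length(x) != n) return(x)                 # assumed policy: leave as is
  tryCatch(x[idx], error = function(e) x)       # fall back to the input on error
}

# Hypothetical non-rectangular data, made up for this illustration
ragged <- list(a = c(1, 2, 3),
               b = list(c = NULL, d = c("only", "two")),
               e = c("p", "q", "r"))

res <- safe_subset(ragged, c(TRUE, FALSE, TRUE))
str(res)
```

Vectors whose length matches the index are subset as before, while `NULL` entries and vectors of another length pass through unchanged.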