Faster way to convert a list to a data.frame when some column values are missing

I have a list of lists:

> head(train) 
[[1]] 
[[1]]$Physics 
[1] 8 

[[1]]$Chemistry 
[1] 7 

[[1]]$PhysicalEducation 
[1] 3 

[[1]]$English 
[1] 4 

[[1]]$Mathematics 
[1] 6 

[[1]]$serial 
[1] 195490 

. 
. 
[[6]] 
[[6]]$Physics 
[1] 2 

[[6]]$Chemistry 
[1] 1 

[[6]]$Biology 
[1] 2 

[[6]]$English 
[1] 4 

[[6]]$Mathematics 
[1] 8 

[[6]]$serial 
[1] 182318

Each sub-list contains any five of the subjects below, plus an additional element named serial. The 12 possible names are:

columns <- c("Physics", "Chemistry", "PhysicalEducation", "English", 
      "Mathematics", "serial", "ComputerScience", "Hindi", "Biology", 
      "Economics", "Accountancy", "BusinessStudies")

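I want to convert this list into a data frame.

Currently I do this with a loop that fills in one row per iteration. It works, but it takes a lot of time.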

colclass <- rep("numeric",12) 
comby <- read.table(text = '', colClasses = colclass, col.names = columns) 
for(i in 1:length(train)){ 
    comby[i,names(train[[i]])] <- train[[i]] 
}

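I tried do.call(rbind, train), but it does not work: it takes the columns from the first element and then puts the data from later elements into those old columns.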
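The misalignment can be reproduced on a tiny example. This is only a sketch on a hypothetical two-element list (`toy` is not part of the real data), using base R:

```r
# Hypothetical toy input: the second element has "c" where the first has "b".
toy <- list(list(a = 1, b = 2), list(a = 3, c = 4))

# rbind() on lists builds a list-matrix; the column names come from the
# first element only, and later elements are filled in by position.
m <- do.call(rbind, toy)

colnames(m)   # "a" "b"
m[[2, "b"]]   # 4, although this value was named "c" in the input
```

So the names of the later sub-lists are ignored, which is why the values end up under the wrong columns.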

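What is a better, faster way? I have about 1.5 million observations.
Expected output: the data frame should contain all of the columns, with NA wherever a value is missing. I would also be interested to know whether this can be done faster without any additional packages.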

Physics Chemistry PhysicalEducation English Mathematics serial ComputerScience Hindi Biology Economics Accountancy 
1  8   7     3  4   6 195490    NA NA  NA  NA   NA 
2  1   1     1  3   3 190869    NA NA  NA  NA   NA 
3  1   2     2  1   2 3111    NA NA  NA  NA   NA 
4  8   7     6  7   7 47738    NA NA  NA  NA   NA 
5  1   1     1  3   2 85520    NA NA  NA  NA   NA 
6  2   1    NA  4   8 182318    NA NA  2  NA   NA 
    BusinessStudies 
1    NA 
2    NA 
3    NA 
4    NA 
5    NA 
6    NA

Here is reproducible code:

train <- "[{\"Physics\":8,\"Chemistry\":7,\"PhysicalEducation\":3,\"English\":4,\"Mathematics\":6,\"serial\":195490},{\"Physics\":1,\"Chemistry\":1,\"PhysicalEducation\":1,\"English\":3,\"Mathematics\":3,\"serial\":190869},{\"Physics\":1,\"Chemistry\":2,\"PhysicalEducation\":2,\"English\":1,\"Mathematics\":2,\"serial\":3111},{\"Physics\":8,\"Chemistry\":7,\"PhysicalEducation\":6,\"English\":7,\"Mathematics\":7,\"serial\":47738},{\"Physics\":1,\"Chemistry\":1,\"PhysicalEducation\":1,\"English\":3,\"Mathematics\":2,\"serial\":85520},{\"Physics\":2,\"Chemistry\":1,\"Biology\":2,\"English\":4,\"Mathematics\":8,\"serial\":182318},{\"Physics\":3,\"Chemistry\":4,\"PhysicalEducation\":5,\"English\":5,\"Mathematics\":8,\"serial\":77482},{\"Accountancy\":2,\"BusinessStudies\":5,\"Economics\":3,\"English\":6,\"Mathematics\":7,\"serial\":152940},{\"Physics\":5,\"Chemistry\":6,\"Biology\":7,\"English\":3,\"Mathematics\":8,\"serial\":132620}]"
train <- rjson::fromJSON(train)

Source

2017-02-01 Ankit

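Could you please add code that reproduces the sample, so that we can start from it instead of writing a new example? – OmaymaS

Try `do.call(plyr::rbind.fill, lapply(train, data.frame))` or `dplyr::bind_rows(lapply(train, data.frame))`. – Abdou

Please check my edited answer (Sol.1) with purrr::map – OmaymaS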

As a starting point, you can use purrr::map as follows.

Sample dataset:

x <- list(list(physics=8, 
       Chemistry=7, 
       PhysicalEducation=3, 
       English=4, 
       serial=195490), 
      list(physics=2, 
       Chemistry=1, 
       Biology=2, 
       English=4, 
       Mathematics=8, 
       serial=182318))

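Sol.1 [shortest, avoids the loop]

library(purrr)  # provides map_dbl() and re-exports %>%
# .default fills in absent names; older purrr releases called this argument .null
zzz <- sapply(columns, function(n) map_dbl(x, n, .default = NA_real_)) %>% 
     data.frame()

which gives: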

> zzz 
    Physics Chemistry PhysicalEducation English Mathematics serial ComputerScience Hindi Biology Economics 
1  NA   7     3  4   NA 195490    NA NA  NA  NA 
2  NA   1    NA  4   8 182318    NA NA  2  NA 
    Accountancy BusinessStudies 
1   NA    NA 
2   NA    NA

If you want to understand how this works, see the longer solutions below.

Sol.2 [manual assignment]

Pick the values for each column:

z <- data.frame(
    serial = map_dbl(x, "serial", .default = NA_real_), 
    Biology = map_dbl(x, "Biology", .default = NA_real_), 
    Chemistry = map_dbl(x, "Chemistry", .default = NA_real_) 
     )

which gives:

> z 
    serial Biology Chemistry 
1 195490  NA   7 
2 182318  2   1 
>

Sol.3 [predefined data frame and for-loop]

Create a data frame of fixed size to hold the list:

zz <- data.frame(matrix(NA, nrow = length(x), ncol = 12))

Assign the names:

names(zz) <- columns

Assign the values:

for(i in seq_len(ncol(zz))){ zz[columns[i]] <- map_dbl(x, columns[i], .default = NA_real_) }

which gives:

> zz 
    Physics Chemistry PhysicalEducation English Mathematics serial ComputerScience Hindi Biology Economics 
1  NA   7     3  4   NA 195490    NA NA  NA  NA 
2  NA   1    NA  4   8 182318    NA NA  2  NA 
    Accountancy BusinessStudies 
1   NA    NA 
2   NA    NA

Source

2017-02-01 15:34:19 OmaymaS

This can be done in base R by combining Reduce and Map.

Data

Here is a dataset that matches your structure.

set.seed(1234) 
temp <- replicate(7, setNames(replicate(7, sample(1:10, 1), simplify=FALSE), letters[1:7]), 
        simplify=FALSE)

To produce a data.frame from this, you can use

Reduce(rbind, Map(data.frame, temp)) 
    a b c d e f g 
1 2 7 7 7 9 7 1 
2 3 7 6 7 6 3 10 
3 3 9 3 3 2 3 4 
4 4 2 1 3 9 6 10 
5 9 1 5 3 4 6 2 
6 8 3 3 10 9 6 7 
7 4 7 4 6 7 5 3

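Here data.frame builds a data.frame from the elements of each inner list. Map applies this to every element of the outer list, producing a list of data.frames. Finally, Reduce applies rbind across the data.frames in that list and produces a single data.frame.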
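Note that rbind on data.frames expects every piece to have the same column names, so the call above suits lists like temp, where each sub-list has identical names. For sub-lists with missing names, one base R option is to build the result column by column instead. The following is only a sketch: the sample `train` and `columns` are shortened, hypothetical stand-ins for the question's data, `pick` is a helper name introduced here, and it has not been benchmarked on 1.5 million rows.

```r
# Hypothetical sample in the shape of the question's data (not the real list).
train <- list(
  list(Physics = 8, Chemistry = 7, PhysicalEducation = 3, serial = 195490),
  list(Physics = 2, Chemistry = 1, Biology = 2, serial = 182318)
)
columns <- c("Physics", "Chemistry", "PhysicalEducation", "Biology", "serial")

# Look up one name in one sub-list; return NA when the name is absent.
pick <- function(rec, nm) {
  v <- rec[[nm]]
  if (is.null(v)) NA_real_ else as.numeric(v)
}

# Build one numeric vector per column, then assemble the data.frame once,
# instead of growing it row by row.
cols <- lapply(columns, function(nm) vapply(train, pick, numeric(1), nm = nm))
names(cols) <- columns
df <- as.data.frame(cols)
```

Here df has one row per sub-list and one column per entry of columns, with NA wherever a sub-list lacks that name.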

Source

2017-02-01 15:08:34 lmo

Please see the edit and the desired output format – Ankit
