2014-03-24

I'm using data.table and running into the error "j doesn't evaluate to the same number of columns for each group". The full error message is:

Error in `[.data.table`(x.out, , if (all(V3 > 25)) c(as.character(V1[1]), : j doesn't evaluate to the same number of columns for each group

I'd like to know how to fix this error.

My data looks like this (see the bottom of this post for a reproducible version of the data from dput):

c007d.1 1  2 
c007d.1 2  2 
c007d.1 3  2 
c007d.1 4  31 
c007d.1 5  55 
c007d.1 6  60 
c007d.1 7  13 

When I run the following code:

library(data.table) 
x.out$grp <- rep(1:ceiling(nrow(x.out)/3),each=3) 
output <- x.out[, if(all(V3 > 25)) c(as.character(V1[1]), 
        V2[1], V2[3], as.list(V3)), by = grp] 

The output looks like this:

 grp V1 V2 V3 V4 V5 V6 
1: 2 d3.1 4 6 31 55 60 

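The code checks whether all 3 rows of column 3 in a group have values greater than 25 and, if so, outputs the first and last values of column 2 for that group (along with V1 and the three V3 values).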

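This works for the small sample shown here, but when I run it on a file with 16,000 rows I get the error above. Is there a way around this? I'm not particularly tied to data.table, but I understand it is faster than the other options.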


Here is the dput(x.out) output:

> head(dput(x.out)) 

structure(list(V1 = c("c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1", "c007d.1", 
"c007d.1", "c007d.1", "c007d.1", "c007d.1"), V2 = 1:287, 
V3 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 3, 24, 33, 43, 47, 48, 48, 48, 50, 53, 63, 70, 78, 82, 
82, 82, 82, 82, 82, 84, 84, 84, 87, 88, 88, 93, 103, 138, 
158, 175, 186, 222, 319, 398, 487, 540, 554, 574, 581, 584, 
587, 588, 587, 559, 557, 557, 557, 556, 556, 556, 556, 556, 
556, 554, 554, 546, 542, 530, 478, 462, 454, 437, 412, 374, 
246, 244, 211, 54, 49, 1, 1, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 
4, 5, 6, 6, 6, 7, 7, 8, 10, 12, 21, 68, 147, 533, 588, 600, 
601, 620, 646, 666, 694, 709, 725, 729, 737, 743, 750, 784, 
805, 829, 849, 907, 929, 957, 982, 984), grp = c(1L, 1L, 
1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 
6L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 
11L, 11L, 12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L, 14L, 15L, 
15L, 15L, 16L, 16L, 16L, 17L, 17L, 17L, 18L, 18L, 18L, 19L, 
19L, 19L, 20L, 20L, 20L, 21L, 21L, 21L, 22L, 22L, 22L, 23L, 
23L, 23L, 24L, 24L, 24L, 25L, 25L, 25L, 26L, 26L, 26L, 27L, 
27L, 27L, 28L, 28L, 28L, 29L, 29L, 29L, 30L, 30L, 30L, 31L, 
31L, 31L, 32L, 32L, 32L, 33L, 33L, 33L, 34L, 34L, 34L, 35L, 
35L, 35L, 36L, 36L, 36L, 37L, 37L, 37L, 38L, 38L, 38L, 39L, 
39L, 39L, 40L, 40L, 40L, 41L, 41L, 41L, 42L, 42L, 42L, 43L, 
43L, 43L, 44L, 44L, 44L, 45L, 45L, 45L, 46L, 46L, 46L, 47L, 
47L, 47L, 48L, 48L, 48L, 49L, 49L, 49L, 50L, 50L, 50L, 51L, 
51L, 51L, 52L, 52L, 52L, 53L, 53L, 53L, 54L, 54L, 54L, 55L, 
55L, 55L, 56L, 56L, 56L, 57L, 57L, 57L, 58L, 58L, 58L, 59L, 
59L, 59L, 60L, 60L, 60L, 61L, 61L, 61L, 62L, 62L, 62L, 63L, 
63L, 63L, 64L, 64L, 64L, 65L, 65L, 65L, 66L, 66L, 66L, 67L, 
67L, 67L, 68L, 68L, 68L, 69L, 69L, 69L, 70L, 70L, 70L, 71L, 
71L, 71L, 72L, 72L, 72L, 73L, 73L, 73L, 74L, 74L, 74L, 75L, 
75L, 75L, 76L, 76L, 76L, 77L, 77L, 77L, 78L, 78L, 78L, 79L, 
79L, 79L, 80L, 80L, 80L, 81L, 81L, 81L, 82L, 82L, 82L, 83L, 
83L, 83L, 84L, 84L, 84L, 85L, 85L, 85L, 86L, 86L, 86L, 87L, 
87L, 87L, 88L, 88L, 88L, 89L, 89L, 89L, 90L, 90L, 90L, 91L, 
91L, 91L, 92L, 92L, 92L, 93L, 93L, 93L, 94L, 94L, 94L, 95L, 
95L, 95L, 96L, 96L)), .Names = c("V1", "V2", "V3", "grp"), row.names = c(NA, 
-287L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x7fdb7b006d78>) 

This isn't reproducible. It's also unclear what you intend 'output' to look like. – mnel


I've edited the post to make my example clearer. – user3141121


Please provide the output of 'dput(x.out)'; it helps people help you. – hrbrmstr

Answer


The error message is really quite self-explanatory.

A reproducible example is what you should have provided. Here is one (your example, with the V3 value in row 7 replaced by 26):

x.out <- data.table(structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), 
.Label = "c007d.1", class = "factor"), 
    V2 = 1:7, V3 = c(2L, 2L, 2L, 31L, 55L, 60L, 26L)), 
.Names = c("V1", "V2", "V3"), 
class = c("data.frame"), row.names = c(NA, -7L))) 


# add your grouping column (data.table style) 
x.out[, grp := rep(seq_len(ceiling(.N/3)), each = 3,length.out=.N)] 
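As a quick check of the grouping step, here is the question's rep() call next to the answer's version for a 7-row table, evaluated in plain R with n standing in for .N (n, old_grp and new_grp are illustrative names, not part of either post). Without length.out, the label vector is longer than the table whenever the row count is not a multiple of 3.

```r
# n stands in for .N: a hypothetical 7-row table
n <- 7

# Question's version: always a multiple of 3 long, so 9 labels for 7 rows
old_grp <- rep(1:ceiling(n / 3), each = 3)

# Answer's version: length.out trims the labels to exactly n
new_grp <- rep(seq_len(ceiling(n / 3)), each = 3, length.out = n)

length(old_grp)
new_grp
```

With length.out the last group simply gets fewer labels (here grp 3 has one row) instead of the vector overrunning the table.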

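Your problem is that for grp = 2 you have 3 rows of V3, but for grp = 3 you have only 1, so when you use as.list (together with c) you create lists of different lengths for different groups.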
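A sketch that makes the mismatch visible: it rebuilds the 7-row example in a simplified form (V1 is a plain character column here rather than a factor) and counts how many elements the question's j expression returns for each group. The if (all(V3 > 25)) filter is dropped so that every group is shown; widths and n_cols are illustrative names.

```r
library(data.table)

# Simplified rebuild of the 7-row example (V1 recycled to every row)
x.out <- data.table(V1 = "c007d.1", V2 = 1:7,
                    V3 = c(2L, 2L, 2L, 31L, 55L, 60L, 26L))
x.out[, grp := rep(seq_len(ceiling(.N / 3)), each = 3, length.out = .N)]

# Length of the list the question's j builds, per group (filter removed)
widths <- x.out[, .(n_cols = length(c(as.character(V1[1]), V2[1], V2[3],
                                      as.list(V3)))),
                by = grp]
widths
```

Groups 1 and 2 give 6 elements, while grp 3 gives only 4 (its V2[3] is NA and as.list(V3) has a single element). Since grp 3 also passes the V3 > 25 filter, data.table receives results of different widths, which is what the error reports.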

How do you want to fill in the missing columns for grp = 3?

Edit:

In your reproducible example, every grp has 3 rows except grp 96.
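To find the short groups in a larger file (for example the 16,000-row one), a grouped count is enough. This sketch reuses the simplified 7-row example, where grp 3 is the short one; on the question's dput the same line should report grp 96 with 2 rows. The name short is illustrative.

```r
library(data.table)

x.out <- data.table(V1 = "c007d.1", V2 = 1:7,
                    V3 = c(2L, 2L, 2L, 31L, 55L, 60L, 26L))
x.out[, grp := rep(seq_len(ceiling(.N / 3)), each = 3, length.out = .N)]

# Count rows per group, then keep only groups that do not have exactly 3
short <- x.out[, .N, by = grp][N != 3]
short
```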


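For the dput(x.out) output above, the missing columns can just be 0, since I only care about values over 25 in V3. However, I have many files (thousands), so can this be generalized to fill in the missing columns across all of them? – user3141121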
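If missing V3 values can simply be 0, as this comment says, one option is to pad V3 to three values inside j so that every group that passes the filter returns the same number of columns. This is a sketch under a few assumptions: groups have at most 3 rows, the filter is still applied to the real (unpadded) V3 values, and the last V2 value is taken as V2[.N] rather than V2[3] so that short groups report their actual last row. The column names first, last and V3_1 to V3_3 are made up for illustration.

```r
library(data.table)

x.out <- data.table(V1 = "c007d.1", V2 = 1:7,
                    V3 = c(2L, 2L, 2L, 31L, 55L, 60L, 26L))
x.out[, grp := rep(seq_len(ceiling(.N / 3)), each = 3, length.out = .N)]

output <- x.out[, if (all(V3 > 25)) {
  v <- V3[1:3]          # indexing past the end of a short group yields NA
  v[is.na(v)] <- 0L     # fill the missing V3 slots with 0
  # Always 6 named elements, so every returned group has the same width
  c(list(V1 = as.character(V1[1]), first = V2[1], last = V2[.N]),
    setNames(as.list(v), paste0("V3_", 1:3)))
}, by = grp]             # groups where the if() is FALSE return NULL and are dropped
output
```

Because the padding happens per group, the same call should work unchanged on any file with this layout, regardless of how many rows it has.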


When I add the grouping command you suggested, the rows are no longer grouped 3 at a time. With the example you provided, I only get 1 grp. – user3141121


@user3141121 Indeed. See my edit. – mnel
