2013-01-04 51 views

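Meaning of ddply error: 'names' attribute [9] must be the same length as the vector [1]

I'm working through Machine Learning for Hackers and I'm stuck at this line: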

from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject)) 

It produces the following error:

Error in attributes(out) <- attributes(col) : 
    'names' attribute [9] must be the same length as the vector [1] 

Here is the traceback():

> traceback() 
11: FUN(1:5[[1L]], ...) 
10: lapply(seq_len(n), extract_col_rows, df = x, i = i) 
9: extract_rows(x$data, x$index[[i]]) 
8: `[[.indexed_df`(pieces, i) 
7: pieces[[i]] 
6: function (i) 
    { 
     piece <- pieces[[i]] 
     if (.inform) { 
      res <- try(.fun(piece, ...)) 
      if (inherits(res, "try-error")) { 
       piece <- paste(capture.output(print(piece)), collapse = "\n") 
       stop("with piece ", i, ": \n", piece, call. = FALSE) 
      } 
     } 
     else { 
      res <- .fun(piece, ...) 
     } 
     progress$step() 
     res 
    }(1L) 
5: .Call("loop_apply", as.integer(n), f, env) 
4: loop_apply(n, do.ply) 
3: llply(.data = .data, .fun = .fun, ..., .progress = .progress, 
     .inform = .inform, .parallel = .parallel, .paropts = .paropts) 
2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress, 
     .inform = .inform, .parallel = .parallel, .paropts = .paropts) 
1: ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject)) 

The priority.train object is a data frame. Here is some more information about it:

> mode(priority.train) 
[1] "list" 
> names(priority.train) 
[1] "Date"  "From.EMail" "Subject" "Message" "Path"  
> sapply(priority.train, mode) 
     Date From.EMail  Subject  Message  Path 
    "list" "character" "character" "character" "character" 
> sapply(priority.train, class) 
$Date 
[1] "POSIXlt" "POSIXt" 

$From.EMail 
[1] "character" 

$Subject 
[1] "character" 

$Message 
[1] "character" 

$Path 
[1] "character" 

> length(priority.train) 
[1] 5 
> nrow(priority.train) 
[1] 1250 
> ncol(priority.train) 
[1] 5 
> str(priority.train) 
'data.frame': 1250 obs. of 5 variables: 
$ Date  : POSIXlt, format: "2002-01-31 22:44:14" "2002-02-01 00:53:41" "2002-02-01 02:01:44" "2002-02-01 10:29:23" ... 
$ From.EMail: chr "[email protected]" "[email protected]" "[email protected]" "[email protected]" ... 
$ Subject : chr "please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" ... 
$ Message : chr " \n Hello,\n \n   I just installed redhat 7.2 and I think I have everything \nworking properly. Anyway I want to in"| __truncated__ "Make sure you rebuild as root and you're in the directory that you\ndownloaded the file. Also it might complain of a few depen"| __truncated__ "Lance wrote:\n\n>Make sure you rebuild as root and you're in the directory that you\n>downloaded the file. Also it might compl"| __truncated__ "Once upon a time, rob wrote :\n\n> I dl'd gcc3 and libgcc3, but I still get the same error message when I \n> try rpm --rebuil"| __truncated__ ... 
$ Path  : chr "../03-Classification/data/easy_ham/01061.6610124afa2a5844d41951439d1c1068" "../03-Classification/data/easy_ham/01062.ef7955b391f9b161f3f2106c8cda5edb" "../03-Classification/data/easy_ham/01063.ad3449bd2890a29828ac3978ca8c02ab" "../03-Classification/data/easy_ham/01064.9f4fc60b4e27bba3561e322c82d5f7ff" ... 
Warning messages: 
1: In encodeString(object, quote = "\"", na.encode = FALSE) : 
    it is not known that wchar_t is Unicode on this platform 
2: In encodeString(object, quote = "\"", na.encode = FALSE) : 
    it is not known that wchar_t is Unicode on this platform 

I would post a sample, but the content is rather long and I don't think it is relevant here.

The same error also occurs here:

> ddply(priority.train, .(Subject)) 
Error in attributes(out) <- attributes(col) : 
    'names' attribute [9] must be the same length as the vector [1] 

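Does anyone have a clue what is going on here? The error seems to be generated by an object other than priority.train, since its names attribute apparently has 9 elements.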

I'd appreciate any help. Thanks!

Problem solved

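I found the problem thanks to @user1317221_G's tip to use the dput function. The problem lies in the Date field, which here is a list containing 9 fields (sec, min, hour, mday, mon, year, wday, yday, isdst). To get around it, I simply converted Date to a character vector, ran ddply, and then restored the original Date column: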

> tmp <- priority.train$Date 
> priority.train$Date <- as.character(priority.train$Date) 
> from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject)) 
> priority.train$Date <- tmp 
> rm(tmp) 
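To see where the [9] in the error message likely comes from, here is a minimal base-R sketch. The vector lt is illustrative, built from two timestamps in the str() output above. A POSIXlt vector is stored as a list of broken-down time components, which is presumably why plyr's row extraction (extract_rows in the traceback) trips over a 9-element names attribute. A POSIXct vector, by contrast, is a plain numeric vector. Recent R versions add zone and gmtoff components, so you may see 11 names rather than 9.

```r
# Two timestamps from the str() output above, parsed as POSIXlt
lt <- as.POSIXlt(c("2002-01-31 22:44:14", "2002-02-01 00:53:41"), tz = "UTC")

# Underneath the class attribute, POSIXlt is a list of time components
parts <- unclass(lt)
print(is.list(parts))   # TRUE
print(names(parts))     # sec, min, hour, mday, mon, year, wday, yday, isdst (newer R adds zone, gmtoff)

# POSIXct stores one number (seconds since 1970-01-01 UTC) per timestamp
ct <- as.POSIXct(lt)
print(is.list(unclass(ct)))  # FALSE
print(length(unclass(ct)))   # 2, one element per timestamp
```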

Instead of your additional information, may I suggest 'str(priority.train)'? – sebastian-c


@sebastian-c Sure! I'm editing the question now. – Motasim


"What does this error mean in R?" is probably the least useful question title you could have chosen. Please give it more thought next time. – flodel

Answers


I fixed this problem by converting from POSIXlt to POSIXct, as Hadley suggested above. It takes one line of code:

mydata$datetime <- strptime(mydata$datetime, "%Y-%m-%d %H:%M:%S")  # original conversion from datetime string; class(mydata$datetime) is "POSIXlt" "POSIXt"
mydata$datetime <- as.POSIXct(mydata$datetime)  # convert to POSIXct to use in data frames/ddply
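Here is a self-contained sketch of the same fix on a small made-up data frame. The frame toy and its values are hypothetical stand-ins for priority.train. After the POSIXlt column is converted to POSIXct, the per-sender count is computed with base R's aggregate(), so the check does not depend on plyr. The original ddply() call is run only if plyr happens to be installed.

```r
# Hypothetical toy data standing in for priority.train (values are made up)
toy <- data.frame(
  From.EMail = c("a@example.com", "b@example.com", "a@example.com"),
  Subject    = c("help", "re: help", "re: help"),
  stringsAsFactors = FALSE
)
# Assigning strptime() output with $<- can leave a POSIXlt column in the frame
toy$Date <- strptime(c("2002-01-31 22:44:14", "2002-02-01 00:53:41",
                       "2002-02-01 02:01:44"), "%Y-%m-%d %H:%M:%S", tz = "UTC")

# The one-line fix from the answer: store the column as POSIXct instead
toy$Date <- as.POSIXct(toy$Date)

# Count messages per sender; this base-R call does not need plyr
freq <- aggregate(Subject ~ From.EMail, data = toy, FUN = length)
print(freq)

# With plyr installed, the original style of call can run on the converted frame
if (requireNamespace("plyr", quietly = TRUE)) {
  print(plyr::ddply(toy, "From.EMail", plyr::summarise, Freq = length(Subject)))
}
```

Converting the column once at load time, rather than round-tripping through character as in the question, means every later ddply() call sees an ordinary atomic column.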

The single line 'mydata$datetime <- as.POSIXct(mydata$datetime) # convert to POSIXct to use in data frames/ddply' saved me from the circles of hell. Cheers – lamecicle


You may already have seen this, and it didn't help. I suspect there is no answer yet because people can't reproduce your error.

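A dput() of your data, or of just a few rows with dput(head(priority.train)), might help. In the meantime, here is an alternative using base R: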

x <- data.frame(A=c("a","b","c","a"),B=c("e","d","d","d")) 

ddply(x,.(A),summarise, Freq = length(B)) 
    A Freq 
1 a 2 
2 b 1 
3 c 1 

tapply(x$B,x$A,length) 
a b c 
2 1 1 

Does tapply do the job for you? Here it is on two rows like yours:

x2 <- data.frame(A=c("[email protected]", "[email protected]"), 
       B=c("please help a newbie compile mplayer :-)", 
        "re: please help a newbie compile mplayer :-)")) 

tapply(x2$B,x2$A,length) 
[email protected] [email protected] 
       1     1 

ddply(x2,.(A),summarise, Freq = length(B)) 
        A Freq 
1 [email protected] 1 
2 [email protected] 1 

Or, more simply, you could try:

table(x2$A) 

[email protected] [email protected] 
       1     1 

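The example you wrote works fine. I removed all rows from the data frame except the first two and set all values to NA. Then I ran the dput function you mentioned and, surprise! The date field is a list containing 9 fields (sec, min, hour, mday, mon, year, wday, yday, isdst). Converting the date field to a character vector solved the problem. Thanks!! – Motasim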
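The dput() check described in this comment works because dput() prints a column's underlying structure. A quicker way to spot such columns, sketched here in base R, is to ask which columns are still lists once their class attribute is stripped. The helper name list_backed_columns and the toy frame are hypothetical.

```r
# Hypothetical helper: names of columns whose underlying storage is a list
# (e.g. POSIXlt), the kind of column that tripped up ddply here
list_backed_columns <- function(df) {
  is_list <- vapply(df, function(col) is.list(unclass(col)), logical(1))
  names(df)[is_list]
}

# Toy frame with one POSIXlt column added via $<-
toy <- data.frame(From.EMail = c("a@example.com", "b@example.com"),
                  stringsAsFactors = FALSE)
toy$Date <- strptime(c("2002-01-31 22:44:14", "2002-02-01 00:53:41"),
                     "%Y-%m-%d %H:%M:%S", tz = "UTC")

print(list_backed_columns(toy))
```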


I had a very similar problem, but I'm not sure whether it is the same one. I got the error below:

Error in attributes(out) <- attributes(col) : 
    'names' attribute [20388] must be the same length as the vector [128] 

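I don't have any variables of mode list, so Motasim's solution doesn't apply in my case. I sorted the problem out by removing plyr 1.8 and manually installing plyr 1.7, after which the error went away. I also tried reinstalling plyr 1.8 and was able to reproduce the problem.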

HTH。


I saw the same error as well, and Yishin's method fixed it. – lokheart


I faced a similar problem with ddply as well. The code and error were as follows:

test <- ddply(test, "catColumn", function(df) df[1:min(nrow(df), 3),]) 
    Error: 'names' attribute [11] must be the same length as the vector [2] 

The data frame 'test' had quite a few factor (categorical) variables.

Converting the factor variables to character variables as follows made the ddply command work:

test <- data.frame(lapply(test, as.character), stringsAsFactors=FALSE) 
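Note that the lapply() call above turns every column into character, including numeric ones. Here is a narrower sketch that converts only the factor columns, assuming those are the only ones that need changing. The small test frame is a hypothetical stand-in for the answer's data.

```r
# Hypothetical stand-in for the answer's 'test' frame: one factor, one numeric
test <- data.frame(catColumn = factor(c("x", "y", "x")),
                   value     = c(1.5, 2, 3))

# Convert only the factor columns to character; numeric columns keep their type
is_fac <- vapply(test, is.factor, logical(1))
test[is_fac] <- lapply(test[is_fac], as.character)

str(test)
```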

I had the same problem when using ddply and fixed it with doBy:

library(doBy)
bylength <- function(x) { length(x) }
# X, From.EMail, To.EMail and dt come from the answerer's own data
newdt <- summaryBy(X ~ From.EMail + To.EMail, data = dt, FUN = bylength)

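Once you realize that a date column is causing the trouble, you can also simply leave that column out when you run the command, instead of converting it.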

所以

from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject)) 

can become

from.weight <- ddply(priority.train[,c(1:7,9:10)], .(From.EMail), summarise, Freq = length(Subject)) 

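if, for example, the POSIXlt date happens to be column 8 of the data frame. What is odd about the reported error is that it can come from a column that has nothing to do with the variables you are grouping by or the output you are looking for.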
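Hard-coding column positions such as c(1:7, 9:10) breaks as soon as the columns move. Here is a sketch that instead drops any POSIXlt column by class. The toy frame and the name slim are hypothetical. With plyr installed, you would pass slim to ddply(); base table() is used here so the example runs without plyr.

```r
# Hypothetical toy frame with a POSIXlt column added via $<-
toy <- data.frame(From.EMail = c("a@example.com", "b@example.com", "a@example.com"),
                  Subject    = c("help", "re: help", "re: help"),
                  stringsAsFactors = FALSE)
toy$Date <- strptime(c("2002-01-31 22:44:14", "2002-02-01 00:53:41",
                       "2002-02-01 02:01:44"), "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Keep every column that is not POSIXlt, wherever it sits in the frame
keep <- !vapply(toy, inherits, logical(1), what = "POSIXlt")
slim <- toy[keep]

# Per-sender counts on the reduced frame (base R stand-in for the ddply call)
print(table(slim$From.EMail))
```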


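I ran into the same problem. I solved it by passing ddply only the data it needs and converting the filter variables, plus all the text variables I needed, to character with as.character.

It worked.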
