2014-02-06
4

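I just discovered this bug, only to find that some people call it a "feature". It means that rbindlist, unlike do.call("rbind", l), does not respect column names. In addition, the documentation makes no mention of this completely unexpected behavior. Is this really intentional? Why doesn't rbindlist respect column names?

Example code: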

> library(data.table) 
> DT1 <- data.table(a=1, b=2) 
> DT2 <- data.table(b=3, a=4) 
> DT1 
a b 
1: 1 2 
> DT2 
b a 
1: 3 4 

I would expect rbind'ing these to yield columns a = 1,4 and b = 2,3. That is what I get with both rbind.data.table and rbind.data.frame, although rbind.data.table produces a warning.

> rbind(DT1, DT2) 
a b 
1: 1 2 
2: 4 3 
Warning message: 
In data.table::.rbind.data.table(...) : 
Argument 2 has names in a different order. Columns will be bound by name for consistency with base. You can drop names (by using an unnamed list) and the columns will then be joined by position, or set use.names=FALSE. Alternatively, explicitly setting use.names to TRUE will remove this warning. 
> rbind(as.data.frame(DT1), as.data.frame(DT2)) 
a b 
1 1 2 
2 4 3 
> do.call('rbind', list(DT1, DT2)) 
a b 
1: 1 2 
2: 4 3 
Warning message: 
In data.table::.rbind.data.table(...) : 
Argument 2 has names in a different order. Columns will be bound by name for consistency with base. You can drop names (by using an unnamed list) and the columns will then be joined by position, or set use.names=FALSE. Alternatively, explicitly setting use.names to TRUE will remove this warning. 

rbindlist, however, happily and silently corrupts the data:

> rbindlist(list(DT1, DT2)) 
a b 
1: 1 2 
2: 3 4 
+1

Have a look at this excellent answer: http://stackoverflow.com/a/15673654/1627235

+2

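'rbindlist' is optimized for speed. Matching column names would be counterproductive, and I hope the default behavior doesn't change. However, feel free to file a feature request. – Roland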

+0

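Sven, I linked to that in my post. It doesn't strike me as particularly authoritative. Roland, speed is useless if you are corrupting the data. Silently, at that. Besides, what is the point of using a data structure with named columns if the names are not respected? – James

Answers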

5

This feature is now implemented in commit 1266 of v1.9.3. From NEWS:

o 'rbindlist' gains 'use.names' and 'fill' arguments and is now implemented 
    entirely in C. Closes #5249  
    -> use.names by default is FALSE for backwards compatibility (doesn't bind by 
    names by default) 
    -> rbind(...) now just calls rbindlist() internally, except that 'use.names' 
    is TRUE by default, for compatibility with base (and backwards compatibility). 
    -> fill by default is FALSE. If fill is TRUE, use.names has to be TRUE. 
    -> At least one item of the input list has to have non-null column names. 
    -> Duplicate columns are bound in the order of occurrence, like base. 
    -> Attributes that might exist in individual items would be lost in the bound result. 
    -> Columns are coerced to the highest SEXPTYPE, if they are different, if/when possible. 
    -> And incredibly fast ;). 
    -> Documentation updated in much detail. Closes DR #5158. 

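With this, you can set use.names=TRUE to bind by name. For backwards compatibility, the default is FALSE. Alternatively, you can use rbind(...), where use.names=TRUE is the default, also for backwards compatibility.

For more examples, see this post; for benchmarks, see this post.

Examples:

1) Just set use.names=TRUE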
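
As a minimal sketch of the difference, here are the two tables from the question bound both ways. The names `by_position` and `by_name` are only illustrative, and `use.names` is set explicitly in both calls so the result does not depend on whatever default the installed data.table version uses:

```r
library(data.table)

DT1 <- data.table(a = 1, b = 2)
DT2 <- data.table(b = 3, a = 4)

# Bind by position: DT2's first column (b = 3) lands under a,
# and its second column (a = 4) lands under b.
by_position <- rbindlist(list(DT1, DT2), use.names = FALSE)

# Bind by name: each value stays under its own column name.
by_name <- rbindlist(list(DT1, DT2), use.names = TRUE)

by_position$a  # 1 3
by_name$a      # 1 4
```

Passing `use.names` explicitly also makes the intent visible to anyone reading the call later.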

DT1 <- data.table(x=1, y=2) 
DT2 <- data.table(y=1, x=2) 

rbindlist(list(DT1,DT2), use.names=TRUE, fill=FALSE) 
# x y 
# 1: 1 2 
# 2: 2 1 

DT1 <- data.table(x=1, y=2) 
DT2 <- data.table(z=2, y=1) 

# errors with fill=FALSE: these tables can't be bound by name without fill=TRUE 
rbindlist(list(DT1, DT2), use.names=TRUE, fill=FALSE) 
# Error in rbindlist(list(DT1, DT2), use.names = TRUE, fill = FALSE) : 
    # Answer requires 3 columns whereas one or more item(s) in the input 
    # list has only 2 columns. ... 

2) Duplicate column names are also bound in order of occurrence:

DT1 <- data.table(x=1, x=2, y=10, y=20, y=30) 
DT2 <- data.table(y=-10, x=-2, y=-20, x=-1, y=-30) 

rbindlist(list(DT1,DT2), use.names=TRUE) 

#  x x y y y 
# 1: 1 2 10 20 30 
# 2: -2 -1 -10 -20 -30 

3) Use fill=TRUE if you want to bind by name and fill in missing columns

DT1 <- data.table(x=1, y=2) 
DT2 <- data.table(y=2, z=-1) 

rbindlist(list(DT1, DT2), fill=TRUE) 
#  x y z 
# 1: 1 2 NA 
# 2: NA 2 -1 
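
The NEWS entry quoted above also says that columns of different types are coerced to the highest type where possible. A small sketch of that, assuming an integer column bound with a double column (the table contents and the name `res` are made up for illustration):

```r
library(data.table)

DT1 <- data.table(x = 1L,  y = "a")  # x is integer
DT2 <- data.table(x = 2.5, y = "b")  # x is double

# The integer value is promoted so both rows fit in one double column.
res <- rbindlist(list(DT1, DT2))

class(res$x)  # "numeric"
```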

HTH
