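New behavior in data.table? .N / something with 'by' (calculating a proportion)
I updated data.table from a recent intermediate version (1.8.x, I think) to the latest version, 1.9.4, and now I'm getting some unexpected behavior.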
library(data.table)
set.seed(12312014)
# a vector of letters a:e, each repeated between 1 and 10 times
type <- unlist(mapply(rep, letters[1:5], round(runif(5, 1, 10), 0)))
# a random vector of 3 categories
category <- sample(c('small', 'med', 'large'), length(type), replace=TRUE)
my_dt <- data.table(type, category)
Say I want the proportion of each category within each type. I used to do it like this:
my_dt[, type_n:=.N, by=type]
my_dt[, .N/type_n, by=.(type, category)][order(type, category)]
What I get with data.table 1.9.4:
#     type category        V1
#  1:    a    large 0.2500000
#  2:    a    large 0.2500000
#  3:    a      med 0.2500000
#  4:    a      med 0.2500000
#  5:    a    small 0.5000000
#  6:    a    small 0.5000000
#  7:    a    small 0.5000000
#  8:    a    small 0.5000000
#  9:    b    large 0.4285714
# 10:    b    large 0.4285714
# 11:    b    large 0.4285714
# 12:    b      med 0.4285714
# (...and so on, 42 rows long)
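One plausible reading of the repeated rows: inside each by-group, type_n is a column of length .N, so .N/type_n evaluates to .N values rather than one. The sketch below illustrates this on a small hypothetical table; the data and the names dt, dup, one and prop are made up for illustration and are not from the example above.

```r
library(data.table)

# Tiny deterministic table (hypothetical data, not the random sample above)
dt <- data.table(
  type     = c("a", "a", "a", "b", "b"),
  category = c("small", "small", "large", "med", "med")
)
dt[, type_n := .N, by = type]

# Within each (type, category) group, type_n is a vector of length .N,
# so j returns .N values and the result has one row per original row.
dup <- dt[, .N / type_n, by = .(type, category)]

# Using a single element of type_n makes j length 1 per group.
one <- dt[, .(prop = .N / type_n[1L]), by = .(type, category)]

nrow(dup)
nrow(one)
```

Here dup keeps as many rows as dt, while one has a single row per (type, category) pair.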
But what I used to get, I'm almost certain, is this (the simple proportion of each category by type):
#     type category        V1
#  1:    a    large 0.2500000
#  2:    a      med 0.2500000
#  3:    a    small 0.5000000
#  4:    b    large 0.4285714
#  5:    b      med 0.4285714
#  6:    b    small 0.1428571
#  7:    c    large 0.3000000
#  8:    c      med 0.1000000
#  9:    c    small 0.6000000
# 10:    d    large 0.2222222
# 11:    d      med 0.6666667
# 12:    d    small 0.1111111
# 13:    e    large 0.3750000
# 14:    e      med 0.3750000
# 15:    e    small 0.2500000
I can get this desired result with:
unique(my_dt[, .N/type_n, by=.(type, category)][order(type, category)])
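...but I'm wondering whether there is a preferred way to do this in the new data.table syntax. I know I could also use prop.table, but I want the result in long format.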
prop.table(table(my_dt), margin=1)
#     category
# type     large       med     small
#    a 0.2500000 0.2500000 0.5000000
#    b 0.4285714 0.4285714 0.1428571
#    c 0.3000000 0.1000000 0.6000000
#    d 0.2222222 0.6666667 0.1111111
#    e 0.3750000 0.3750000 0.2500000
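For comparison, a long-format version of the same proportions can be sketched by counting per (type, category) first and then normalizing within each type. This is only one possible approach, shown on a small hypothetical table; the names dt, counts, res and prop are assumptions for illustration.

```r
library(data.table)

# Hypothetical data for illustration only
dt <- data.table(
  type     = c("a", "a", "a", "b", "b"),
  category = c("small", "small", "large", "med", "med")
)

# Step 1: one row per (type, category) with its count in column N
counts <- dt[, .N, by = .(type, category)]

# Step 2: divide each count by the total for its type
counts[, prop := N / sum(N), by = type]

# Sort for display, as in the examples above
res <- counts[order(type, category)]
res
```

Because the count and the normalization happen in separate steps, no helper column such as type_n is needed and no unique() call is required afterwards.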
FYI, my call to sessionInfo() gives:
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_1.0.0 data.table_1.9.4
loaded via a namespace (and not attached):
[1] chron_2.3-45 colorspace_1.2-4 digest_0.6.4 grid_3.1.1 gtable_0.1.2 labeling_0.2
[7] MASS_7.3-33 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 Rcpp_0.11.2 reshape2_1.4
[13] scales_0.2.4 stringr_0.6.2 tools_3.1.1
So which one of these results do you actually want?
Not an answer to your question, but if you're happy with 'prop.table' and just want a long format, you could also do 'data.table(prop.table(table(my_dt), margin = 1))'. – A5C1D2H2I1M1N2O1R2T1
Or 'my_dt[, prop.table(table(category)), by = type]'
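A minimal sketch of the prop.table route suggested in the comments, on a small hypothetical table (the names dt and long are made up for illustration). It assumes a data.table version that provides an as.data.table method for table objects, which names the value column N.

```r
library(data.table)

# Hypothetical data for illustration only
dt <- data.table(
  type     = c("a", "a", "a", "b", "b"),
  category = c("small", "small", "large", "med", "med")
)

# Row-wise proportions as a 2-D table, then converted to long format.
# The value column is named N even though it now holds proportions.
long <- as.data.table(prop.table(table(dt), margin = 1))
long[order(type, category)]
```

Unlike the grouped approaches, this keeps (type, category) combinations that never occur, with a proportion of 0.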