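Hi, I have the following data.frame (appended below). I would like to add an extra column of normalised counts, N.norm = N/sum(N).
I previously had a data.frame without the Date column and was able to normalise the data using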
oo[, N.norm := N/sum(N), by=Operator]
I tried to extend this to group by Date as well, using
oo[, N.norm := N/sum(N), by=Operator,Date]
but received an error message:
Error in `[.data.frame`(oo, , `:=`(N.norm, N/sum(N)), by = Operator, Date) :
unused argument(s) (by = Operator)
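For example, for Operator "A" in the month "Jan 2013", I have a count N for each
ROI_Score = c("Good", "OK", "Poor", "Crap"). I want to sum N over that combination (A and Jan 2013) and then
divide each count N by that sum(N).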
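To make the target concrete, here is a small sketch of the arithmetic for that single group. The four counts are the Operator "A" / "Jan 2013" rows read off the appended data; the variable names `n` and `n_norm` are only illustrative.

```r
# Counts for Operator "A" in "Jan 2013", copied from the appended data
n <- c(Crap = 0, Good = 15, OK = 3, Poor = 5)

# Divide each count by the group total, so the group sums to 1
n_norm <- n / sum(n)

print(sum(n))
print(n_norm)
```

The normalised values within each Operator/Date combination should therefore add up to 1.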
On another note, can anyone point me to a decent introduction to manipulating data.frames in R?
structure(list(Operator = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("A",
"D", "J", "L", "M"), class = "factor"), ROI_Score = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L), .Label = c("Crap", "Good", "OK", "Poor"), class = "factor"),
Date = c("Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013"), N = c(0, 0, 0, 0, 0, 1, 2, 15, 1, 5, 3, 2, 3,
1, 0, 3, 0, 5, 5, 1, 0, 0, 0, 1, 0, 14, 17, 16, 8, 7, 5,
10, 6, 1, 5, 24, 27, 31, 16, 15, 0, 0, 0, 0, 0, 26, 24, 20,
11, 18, 3, 4, 17, 3, 2, 20, 36, 12, 21, 9, 0, 0, 0, 0, 0,
3, 12, 5, 12, 4, 0, 0, 3, 4, 0, 29, 37, 41, 25, 10, 0, 0,
0, 0, 0, 9, 9, 15, 17, 3, 6, 4, 5, 4, 1, 14, 13, 9, 15, 9
)), .Names = c("Operator", "ROI_Score", "Date", "N"), row.names = c(NA,
100L), class = "data.frame")
I am not sure whether the data is in data.frame or data.table format. Here is my code, adapted from the solution given by Arun (reshape/remould data frame to create normalized bar chart and pie chart):
library(data.table)
df <- read.csv("/misc/jaguar_data/report/system/db_fs/roi_scores.csv")
# Convert Date from dd/mm/yyyy to "Mon YYYY" (e.g. "Jan 2013") for faceting
df$Date = strftime(strptime(df$Date, format="%d/%m/%Y"), "%b %Y")
dt <- data.table(df)
ops <- as.character(unique(dt$Operator))
scr <- as.character(unique(dt$ROI_Score))
dts <- unique(dt$Date)
oo <- setkey(dt[, .N, by="Operator,ROI_Score,Date"], Operator,
ROI_Score,Date)[CJ(ops, scr,dts)][is.na(N), N:= 0L]
oo[, N.norm := N/sum(N), by=Operator]
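About this additional column: should N.norm for row i be N[i]/sum(N[1...i]), but aggregated by Operator and Date? Do you really mean 'data.table' rather than 'data.frame'? The ':=' operator only works on a 'data.table'. Please clarify which structure you are using: you gave us a data frame. –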
@BryanHanson - I am not sure. I have updated my question to explain how I build the data structure oo. It was originally a data.frame, but I think it is now a data.table – moadeep
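You are definitely using a 'data.table'; your own code makes that clear (you start with a 'data.frame' but then convert it to a 'data.table'). These are typically used when the data set is very large and speed is critical. Otherwise, a 'data.frame' is usually fine. What are you trying to compute? –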
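A minimal sketch of grouping by both columns follows, assuming the data.table package is installed and that oo really is a data.table when the line runs. In `by=Operator,Date` the comma ends the `by` argument, so `Date` is passed as a separate argument. The `[.data.frame` in the error message also indicates that the object was being treated as a plain data.frame at that point. The toy rows below are invented for illustration and are not the poster's full table.

```r
library(data.table)

# Toy data, invented for illustration only
oo <- data.table(
  Operator  = c("A", "A", "A", "A", "D", "D"),
  Date      = c("Jan 2013", "Jan 2013", "Feb 2013", "Feb 2013",
                "Jan 2013", "Jan 2013"),
  ROI_Score = c("Good", "Poor", "Good", "Poor", "Good", "Poor"),
  N         = c(15, 5, 2, 0, 16, 31)
)

# := needs a data.table, so check the class first
print(is.data.table(oo))

# All grouping columns go inside a single `by` value
oo[, N.norm := N / sum(N), by = .(Operator, Date)]

# Each Operator/Date group should now sum to 1
print(oo[, .(total = sum(N.norm)), by = .(Operator, Date)])
```

The comma-separated string form already used in the question's own code, `by="Operator,Date"`, should behave the same way. A group whose counts are all zero would give NaN (0/0), so such groups may need separate handling.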