2016-05-12

R: Aggregate strings

I have a data frame, ModelDF, that has a numeric column as well as character columns, like this:

Quantity  Type  Mode  Company 
    1   Shoe  hello  Nike 
    1   Shoe  hello  Nike 
    2   Jeans  hello  Levis 
    3   Shoe  hello  Nike 
    1   Jeans  hello  Levis 
    1   Shoe  hello  Adidas 
    2   Jeans  hello  Spykar 
    1   Shoe  ahola  Nike 
    1   Jeans  ahola  Levis 

I need to aggregate it into this form:

Quantity  Type  Mode  Company 
    5   Shoe  hello  Nike 
    3   Jeans  hello  Levis 
    1   Shoe  hello  Adidas 
    2   Jeans  hello  Spykar 
    1   Shoe  ahola  Nike 
    1   Jeans  ahola  Levis 

That is, whenever all the other columns are identical, I need to collapse the rows into one and sum Quantity.

I tried using aggregate, but it gave me wrong results, apparently because it does not work on character values.

What are my options? Thanks.

Answers

aggregate(Quantity ~ Type + Mode + Company, df, sum) 
# Type Mode Company Quantity 
#1 Shoe hello Adidas  1 
#2 Jeans ahola Levis  1 
#3 Jeans hello Levis  3 
#4 Shoe ahola Nike  1 
#5 Shoe hello Nike  5 
#6 Jeans hello Spykar  2 
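
The question assumes that aggregate cannot handle character columns. As a minimal sketch, assuming the grouping columns are stored as plain character vectors rather than factors (the name df_chr is made up for this illustration), the same formula call can be checked like this:

```r
# Rebuild the question's data with character (not factor) grouping columns.
# df_chr is a hypothetical name used only for this illustration.
df_chr <- data.frame(
  Quantity = c(1L, 1L, 2L, 3L, 1L, 1L, 2L, 1L, 1L),
  Type     = c("Shoe", "Shoe", "Jeans", "Shoe", "Jeans",
               "Shoe", "Jeans", "Shoe", "Jeans"),
  Mode     = c("hello", "hello", "hello", "hello", "hello",
               "hello", "hello", "ahola", "ahola"),
  Company  = c("Nike", "Nike", "Levis", "Nike", "Levis",
               "Adidas", "Spykar", "Nike", "Levis"),
  stringsAsFactors = FALSE
)

# Sum Quantity within every distinct Type/Mode/Company combination.
out <- aggregate(Quantity ~ Type + Mode + Company, data = df_chr, FUN = sum)
print(out)
```

The grouping columns only define the groups; sum is applied to Quantity alone, so character values on the right-hand side of the formula are not a problem in themselves.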

You can also try the data.table option:

library(data.table) 

setDT(df)[, .(Sum.Quantity = sum(Quantity)), by = list(Type, Mode, Company)] 

# Type Mode Company Sum.Quantity 
#1: Shoe hello Nike   5 
#2: Jeans hello Levis   3 
#3: Shoe hello Adidas   1 
#4: Jeans hello Spykar   2 
#5: Shoe ahola Nike   1 
#6: Jeans ahola Levis   1 

Or with dplyr:

library(dplyr) 

df %>% 
    group_by(Type, Mode, Company) %>% 
    summarise(Quantity = sum(Quantity)) 

DATA

dput(df) 
structure(list(Quantity = c(1L, 1L, 2L, 3L, 1L, 1L, 2L, 1L, 1L 
), Type = structure(c(2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("Jeans", 
"Shoe"), class = "factor"), Mode = structure(c(2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L), .Label = c("ahola", "hello"), class = "factor"), 
    Company = structure(c(3L, 3L, 2L, 3L, 2L, 1L, 4L, 3L, 2L), .Label = c("Adidas", 
    "Levis", "Nike", "Spykar"), class = "factor")), .Names = c("Quantity", 
"Type", "Mode", "Company"), class = "data.frame", row.names = c(NA, 
-9L)) 

I tried both, but they give me the error "'sum' not meaningful for factors" – Looper


Hm... it works fine on my machine... – Sotos


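You probably have Quantity encoded as a 'factor' variable. Check 'class(data$Quantity)'; if it is a factor, try 'as.integer', and if that gives a warning, you need to be careful with those values. – asb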
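
If Quantity really is a factor (an assumption; the asker never confirmed it), note that calling as.integer directly on a factor returns the internal level codes rather than the printed numbers, and does so without a warning. A minimal sketch with made-up data (dat is a hypothetical name) that converts through character first:

```r
# Hypothetical data in which Quantity was read in as a factor.
dat <- data.frame(
  Quantity = factor(c("1", "1", "2", "5")),
  Type     = c("Shoe", "Shoe", "Jeans", "Shoe"),
  stringsAsFactors = FALSE
)

class(dat$Quantity)                  # "factor"

# Level codes, not the original numbers: the label "5" has code 3.
codes <- as.integer(dat$Quantity)

# Convert the labels to character first, then to integer.
# Labels that are not numbers become NA with a warning, which is worth inspecting.
dat$Quantity <- as.integer(as.character(dat$Quantity))

res <- aggregate(Quantity ~ Type, data = dat, FUN = sum)
print(res)
```

After the conversion, sum is defined for the column and the aggregate call no longer fails with the factor error.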


You don't want to "aggregate strings"; you want to aggregate a number by string variables. Here:

R> xx = data.frame(a=sample(letters[1:3], 10, TRUE), 
        b=sample(LETTERS[1:3], 10, TRUE), 
        c=runif(10)) 
R> xx 
a b   c 
1 b C 0.7094221 
2 c B 0.2718095 
3 c B 0.8844701 
4 b C 0.9270141 
5 b C 0.8243021 
6 a A 0.3649902 
7 a B 0.9763228 
8 a A 0.8904676 
9 b C 0.8640352 
10 a A 0.7931683 
R> aggregate(c ~ a + b, data=xx, FUN=sum) 
a b   c 
1 a A 2.0486261 
2 a B 0.9763228 
3 c B 1.1562796 
4 b C 3.3247736
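
The same grouping can also be written without a formula, which may be convenient when the column names are held in variables. This is a sketch on freshly simulated data (the seed and the names by_formula and by_list are assumptions for illustration, so the numbers differ from the session above):

```r
set.seed(1)  # arbitrary seed, only so the example is repeatable

xx <- data.frame(a = sample(letters[1:3], 10, TRUE),
                 b = sample(LETTERS[1:3], 10, TRUE),
                 c = runif(10))

# Formula interface, as used above.
by_formula <- aggregate(c ~ a + b, data = xx, FUN = sum)

# Data frame interface: the columns to summarise, then the grouping columns.
by_list <- aggregate(xx["c"], by = xx[c("a", "b")], FUN = sum)

# Both calls should give the same group sums.
print(all.equal(by_formula$c, by_list$c))
```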