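Crosstab counting unique values in R

2015-04-23 54 views
2

My data consists of 3 columns: - Segment - Category - Product Number

How can I create a crosstab in R ("Segment" as columns, "Category" as rows) that counts the unique values of "Product Number" (example below)?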

 SEG1 SEG2 SEG3 
CAT1 X 
CAT2 
CAT3 

X - number of unique values from CAT1 and SEG1

Example data:

CAT<-c("CAT1","CAT3","CAT3","CAT1","CAT2","CAT3","CAT3","CAT3","CAT3","CAT2") 
SEG<-c("SEG1","SEG3","SEG3","SEG2","SEG2","SEG2","SEG3","SEG3","SEG2","SEG2") 
PRODUCT<-c("a","a","a","a","d","e","b","c","a","a") 
data<-cbind(CAT,SEG,PRODUCT) 

Many thanks in advance! Best regards, Bartek

+0

'table(CAT1,SEG1)'? – User7598

+0

I need unique product numbers... :) – haver24

+0

Can you provide your data (or a sample of it)? That would make it easier to work with. – Cath

Answers

0
> set.seed(1) 
> mydf <- data.frame(
+  Values = rep(c("111", "222", "333"), times = c(5, 3, 2)), 
+  Year = c(rep(c("1999", "2000"), times = c(3, 2)), 
+   "1999", "1999", "2000", "2000", "2000"), 
+  Month = sample(c("Jan", "Feb", "Mar"), 10, replace = TRUE) 
+) 
> mydf 
    Values Year Month 
1  111 1999 Jan 
2  111 1999 Feb 
3  111 1999 Feb 
4  111 2000 Mar 
5  111 2000 Jan 
6  222 1999 Mar 
7  222 1999 Mar 
8  222 2000 Feb 
9  333 2000 Feb 
10 333 2000 Jan 
> with(mydf, tapply(Month, list(Values, Year), FUN = function(x) length(unique(x)))) 
    1999 2000 
111 2 2 
222 1 1 
333 NA 2 
> 

For your example:

> data 
    CAT SEG PRODUCT 
1 CAT1 SEG1  a 
2 CAT3 SEG3  a 
3 CAT3 SEG3  a 
4 CAT1 SEG2  a 
5 CAT2 SEG2  d 
6 CAT3 SEG2  e 
7 CAT3 SEG3  b 
8 CAT3 SEG3  c 
9 CAT3 SEG2  a 
10 CAT2 SEG2  a 
> with(data, tapply(PRODUCT, list(CAT, SEG), FUN = function(x) length(unique(x)))) 
    SEG1 SEG2 SEG3 
CAT1 1 1 NA 
CAT2 NA 2 NA 
CAT3 NA 2 3 
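
Note that tapply() returns NA for CAT/SEG combinations that have no rows. If zeros are preferred, one possible follow-up is sketched below. It is not part of the original answer, and it assumes `data` is built as a data frame (the question builds a matrix with cbind()); the name `counts` is only illustrative.

```r
CAT <- c("CAT1","CAT3","CAT3","CAT1","CAT2","CAT3","CAT3","CAT3","CAT3","CAT2")
SEG <- c("SEG1","SEG3","SEG3","SEG2","SEG2","SEG2","SEG3","SEG3","SEG2","SEG2")
PRODUCT <- c("a","a","a","a","d","e","b","c","a","a")
# Assumption: a data frame rather than the cbind() matrix from the question
data <- data.frame(CAT, SEG, PRODUCT)

# Count distinct products per CAT/SEG cell; empty cells come back as NA
counts <- with(data, tapply(PRODUCT, list(CAT, SEG),
                            FUN = function(x) length(unique(x))))

# Replace the NA cells (combinations with no rows) by 0
counts[is.na(counts)] <- 0
counts
```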
+0

Thank you. I was really close! Have a nice day! – haver24

+0

@haver24 please accept this solution if it solves your problem – RUser

2

You can simply compute the crosstab of your data after dropping duplicated rows, so that each product number is counted only once:

nodup <- which(!duplicated(data)) 
table(data[nodup, "CAT"],data[nodup, "SEG"]) 

     SEG1 SEG2 SEG3 
    CAT1 1 1 0 
    CAT2 0 2 0 
    CAT3 0 2 3 
+1

This is a nice one, but maybe create the index once instead of computing `duplicated(data)` twice? –
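
A variant that needs no explicit index at all is sketched here. It is not from the original answer, and it assumes `data` is a data frame rather than the cbind() matrix from the question; `tab` is an illustrative name.

```r
CAT <- c("CAT1","CAT3","CAT3","CAT1","CAT2","CAT3","CAT3","CAT3","CAT3","CAT2")
SEG <- c("SEG1","SEG3","SEG3","SEG2","SEG2","SEG2","SEG3","SEG3","SEG2","SEG2")
PRODUCT <- c("a","a","a","a","d","e","b","c","a","a")
# Assumption: a data frame rather than the cbind() matrix from the question
data <- data.frame(CAT, SEG, PRODUCT)

# unique() drops repeated (CAT, SEG, PRODUCT) rows, so each product
# is counted once per cell; table() then cross-tabulates CAT by SEG
tab <- table(unique(data)[, c("CAT", "SEG")])
tab
```

Unlike the tapply() approach, table() fills empty combinations with 0 rather than NA.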

0
library(plyr) 
library(reshape) 
data <- data.frame(data) 
a <- ddply(data,.(CAT,SEG),summarize,unq=length(unique(PRODUCT))) 
b <- cast(a,CAT~SEG,mean) 

This will produce NaN where the count of unique values is 0.

0

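If you are using data.table, you can really speed up the operation for larger data frames. You can use

library(data.table)
library(reshape)
# DF is assumed to be a data frame with columns Segment, Category and Product_Number
DF <- data.table(DF)
# Count distinct product numbers within each Segment/Category group
DF_agg <- DF[, list(count_prod = length(unique(Product_Number))), by = c("Segment", "Category")]
# Reshape to wide format: one row per Segment, one column per Category
DF_agg <- cast(as.data.frame(DF_agg), Segment ~ Category, sum, value = "count_prod")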
0

A fast solution with the dplyr and tidyr packages.

library(dplyr) 
library(tidyr) 


CAT <- c("CAT1","CAT3","CAT3","CAT1","CAT2","CAT3","CAT3","CAT3","CAT3","CAT2") 
SEG <- c("SEG1","SEG3","SEG3","SEG2","SEG2","SEG2","SEG3","SEG3","SEG2","SEG2") 
PRODUCT <- c("a","a","a","a","d","e","b","c","a","a") 
data <- data.frame(CAT, SEG, PRODUCT) 

# Elegant solution with pipes (%>%) 
data %>% 
    group_by(CAT, SEG) %>% 
    summarize(uni.prod = n_distinct(PRODUCT)) %>% 
    spread(CAT, uni.prod) 

# Solution without using pipes
groups <- group_by(data, CAT, SEG) 
s <- summarize(groups, uni.prod = n_distinct(PRODUCT)) 
spread(s, CAT, uni.prod)