2016-04-21

Gather multiple columns with tidyr

I have shopping cart data that looks like the sample data frame below:

sample_df<-data.frame(
    clientid=1:10, 
    ProductA=c("chair","table","plate","plate","table","chair","table","plate","chair","chair"), 
    QuantityA=c(1,2,1,1,1,1,2,3,1,2), 
    ProductB=c("table","doll","shoes","","door","","computer","computer","","plate"), 
    QuantityB=c(3,1,2,"",2,"",1,1,"",1) 
) 
#sample data frame
   clientid ProductA QuantityA ProductB QuantityB
1         1    chair         1    table         3
2         2    table         2     doll         1
3         3    plate         1    shoes         2
4         4    plate         1
...
10       10    chair         2    plate         1

I want to convert it into a different format that looks like this:

#ideal data frame
   clientid ProductNumber Product Quantity
1         1             A   chair        1
2         1             B   table        3
3         2             A   table        2
4         2             B    doll        1
...
11        6             A   chair        1
...
17       10             A   chair        2
18       10             B   plate        1

I tried:

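library(dplyr)   # provides %>%, select() and filter()
library(tidyr)   # provides gather()
# %>% must end the line; a line that starts with %>% is a syntax error in R
sample_df_gather <- sample_df %>% select(clientid, ProductA, ProductB) %>%
    gather(ProductNumber, value, -clientid) %>% filter(!is.na(value))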

#this gives me
   clientid ProductNumber value
1         1      ProductA chair
2         2      ProductB table
3         3      ProductA plate
4         4      ProductB plate
...

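However, I don't know how to add the quantities to this data frame. Also, the real data frame has more columns, such as title and price, that I also want to bring into the ideal format. Is there a way to convert the data into that format?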
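A minimal tidyr sketch, assuming tidyr 1.0.0 or later (where pivot_longer() supersedes gather()) and NA rather than "" for the missing entries. The ".value" sentinel in names_to tells pivot_longer() to turn the first captured part of each column name (Product, Quantity) into an output column, so both column pairs are reshaped in a single call:

```r
# Sketch: assumes tidyr >= 1.0.0, which provides pivot_longer().
library(tidyr)

# Sample data with NA (not "") for clients that bought no second product.
sample_df <- data.frame(
  clientid  = 1:10,
  ProductA  = c("chair", "table", "plate", "plate", "table",
                "chair", "table", "plate", "chair", "chair"),
  QuantityA = c(1, 2, 1, 1, 1, 1, 2, 3, 1, 2),
  ProductB  = c("table", "doll", "shoes", NA, "door",
                NA, "computer", "computer", NA, "plate"),
  QuantityB = c(3, 1, 2, NA, 2, NA, 1, 1, NA, 1),
  stringsAsFactors = FALSE
)

long_df <- pivot_longer(
  sample_df,
  cols = -clientid,
  # ".value": the first capture group (Product/Quantity) becomes a column name;
  # the second capture group (A/B) goes into the ProductNumber column.
  names_to = c(".value", "ProductNumber"),
  names_pattern = "^(Product|Quantity)([AB])$",
  # Drop rows whose Product and Quantity are both NA (no product B).
  values_drop_na = TRUE
)

# Order by client, then by product slot.
long_df <- long_df[order(long_df$clientid, long_df$ProductNumber), ]
print(long_df)
```

For the extra columns mentioned in the question, adding their prefixes to the alternation, e.g. "^(Product|Quantity|Title|Price)([AB])$", should be enough; the names TitleA, PriceA and so on are assumptions about the real data.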


For QuantityB, you really don't want to use ""... try NA. – Frank
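Applying that suggestion to the data as posted could look like this base R sketch: replace every empty string with NA, then convert QuantityB, which the "" entries turned into a character column, back to numeric.

```r
# Base R sketch: the sample data exactly as posted in the question.
sample_df <- data.frame(
  clientid  = 1:10,
  ProductA  = c("chair", "table", "plate", "plate", "table",
                "chair", "table", "plate", "chair", "chair"),
  QuantityA = c(1, 2, 1, 1, 1, 1, 2, 3, 1, 2),
  ProductB  = c("table", "doll", "shoes", "", "door",
                "", "computer", "computer", "", "plate"),
  QuantityB = c(3, 1, 2, "", 2, "", 1, 1, "", 1),
  # Keep strings as characters (the default only from R 4.0.0 onwards);
  # with factors, as.numeric() below would return level codes instead.
  stringsAsFactors = FALSE
)
str(sample_df$QuantityB)  # character, because of the "" entries

# Turn every empty string into NA ...
sample_df[sample_df == ""] <- NA
# ... and restore QuantityB to a numeric column.
sample_df$QuantityB <- as.numeric(sample_df$QuantityB)
str(sample_df)
```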


reshape(sample_df, dir = 'long', vary = list(c(2,4), c(3,5))) gives me 20 rows, or is that wrong? – rawr
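Spelled out with full argument names, and with v.names and times supplied so the output columns match the ideal frame, the base R reshape() call could look like the sketch below (assuming NA instead of "" for missing entries). The 20 rows are 10 clients times 2 product slots; the three empty B slots come back as NA rows, so they are dropped afterwards.

```r
# Base R sketch; sample data with NA instead of "" for missing entries.
sample_df <- data.frame(
  clientid  = 1:10,
  ProductA  = c("chair", "table", "plate", "plate", "table",
                "chair", "table", "plate", "chair", "chair"),
  QuantityA = c(1, 2, 1, 1, 1, 1, 2, 3, 1, 2),
  ProductB  = c("table", "doll", "shoes", NA, "door",
                NA, "computer", "computer", NA, "plate"),
  QuantityB = c(3, 1, 2, NA, 2, NA, 1, 1, NA, 1),
  stringsAsFactors = FALSE
)

long_df <- reshape(
  sample_df,
  direction = "long",
  # Each list element is one group of wide columns that becomes one long column.
  varying   = list(c("ProductA", "ProductB"), c("QuantityA", "QuantityB")),
  v.names   = c("Product", "Quantity"),  # names of the resulting long columns
  timevar   = "ProductNumber",
  times     = c("A", "B"),               # values stored in ProductNumber
  idvar     = "clientid"
)
nrow(long_df)  # 20: one row per client and product slot, empty slots included

# Drop the empty B slots and sort by client, then slot.
long_df <- long_df[!is.na(long_df$Product), ]
long_df <- long_df[order(long_df$clientid, long_df$ProductNumber), ]
rownames(long_df) <- NULL
long_df
```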


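Thanks @Frank! The reshape function suggested here solved my problem. @aosmith, yes, I had looked at that before asking this question, but still couldn't find a way to convert my data into the ideal data frame. –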

Answer


With data.table:

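library(data.table) 
# Melt the Product* and Quantity* column pairs into two value columns at once
res = melt(setDT(sample_df), 
    measure.vars = patterns("^Product", "^Quantity"), 
    variable.name = "ProductNumber") 
# melt() numbers the column pairs 1, 2; relabel them A, B
res[, ProductNumber := factor(ProductNumber, labels = c("A","B"))] 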

This gives:

clientid ProductNumber value1 value2 
1:  1    A chair  1 
2:  2    A table  2 
3:  3    A plate  1 
4:  4    A plate  1 
5:  5    A table  1 
6:  6    A chair  1 
7:  7    A table  2 
8:  8    A plate  3 
9:  9    A chair  1 
10:  10    A chair  2 
11:  1    B table  3 
12:  2    B  doll  1 
13:  3    B shoes  2 
14:  4    B  NA  NA 
15:  5    B  door  2 
16:  6    B  NA  NA 
17:  7    B computer  1 
18:  8    B computer  1 
19:  9    B  NA  NA 
20:  10    B plate  1 

Data (since the OP's original data was borked):

structure(list(clientid = 1:10, ProductA = structure(c(1L, 3L, 
2L, 2L, 3L, 1L, 3L, 2L, 1L, 1L), .Label = c("chair", "plate", 
"table"), class = "factor"), QuantityA = c(1L, 2L, 1L, 1L, 1L, 
1L, 2L, 3L, 1L, 2L), ProductB = structure(c(6L, 2L, 5L, NA, 3L, 
NA, 1L, 1L, NA, 4L), .Label = c("computer", "doll", "door", "plate", 
"shoes", "table"), class = "factor"), QuantityB = c(3L, 1L, 2L, 
NA, 2L, NA, 1L, 1L, NA, 1L)), .Names = c("clientid", "ProductA", 
"QuantityA", "ProductB", "QuantityB"), row.names = c(NA, -10L 
), class = "data.frame") 
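A variant of the melt() call above, as a sketch assuming a data.table version whose melt() accepts patterns() together with a character vector for value.name: value.name names the two value columns Product and Quantity directly, and the NA rows for clients without a second product are filtered out afterwards, leaving 17 rows in the long layout the question asks for.

```r
# Sketch: assumes melt() supports patterns() and a vector value.name.
library(data.table)

sample_dt <- data.table(
  clientid  = 1:10,
  ProductA  = c("chair", "table", "plate", "plate", "table",
                "chair", "table", "plate", "chair", "chair"),
  QuantityA = c(1, 2, 1, 1, 1, 1, 2, 3, 1, 2),
  ProductB  = c("table", "doll", "shoes", NA, "door",
                NA, "computer", "computer", NA, "plate"),
  QuantityB = c(3, 1, 2, NA, 2, NA, 1, 1, NA, 1)
)

res <- melt(
  sample_dt,
  # One pattern per output column: all Product* columns, all Quantity* columns.
  measure.vars  = patterns("^Product", "^Quantity"),
  variable.name = "ProductNumber",
  value.name    = c("Product", "Quantity")
)
# melt() numbers the column pairs 1, 2; relabel them A, B.
res[, ProductNumber := factor(ProductNumber, labels = c("A", "B"))]
# Drop clients without a second product, then sort by client and slot.
res <- res[!is.na(Product)]
res <- res[order(clientid, ProductNumber)]
print(res)
```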

It sounds like the OP is only interested in tidyr, but this may be of interest to others. – Frank