2016-07-23
1

Data: replace user names with numbers and count orders

DB <- structure(list(orderItemID = 1:10, CustomerName = structure(c(1L, 
1L, 2L, 3L, 3L, 4L, 4L, 4L, 5L, 6L), .Label = c("Alex", "Bert", 
"Corel", "Dennis", "Edgar", "Fred"), class = "factor"), OrderID = structure(c(5L, 
6L, 1L, 2L, 2L, 8L, 7L, 7L, 4L, 3L), .Label = c("14", "17", "33", 
"56", "58", "62", "89", "9"), class = "factor"), ArticleDescription = structure(c(10L, 
5L, 1L, 7L, 8L, 3L, 4L, 2L, 9L, 6L), .Label = c("Adidas Jacket", 
"Adidas Shoes", "Aesics Shoes", "Boss Jeans", "Lee T-Shirt", 
"Nike Airs", "Nike Shoes", "Puma Backpack", "Puma Socks", "Wrangler Jeans" 
), class = "factor")), .Names = c("orderItemID", "CustomerName", 
"OrderID", "ArticleDescription"), row.names = c(NA, -10L), class = "data.frame") 

Expected result:

output <- structure(list(orderItemID = 1:10, Name = structure(c(1L, 1L, 
2L, 3L, 3L, 4L, 4L, 4L, 5L, 6L), .Label = c("1", "2", "3", "4", 
"5", "6"), class = "factor"), NumberOfOrders = structure(c(1L, 
2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("1", "2"), class = "factor"), 
    ArticleDescription = structure(c(10L, 5L, 1L, 7L, 8L, 3L, 
    4L, 2L, 9L, 6L), .Label = c("Adidas Jacket", "Adidas Shoes", 
    "Aesics Shoes", "Boss Jeans", "Lee T-Shirt", "Nike Airs", 
    "Nike Shoes", "Puma Backpack", "Puma Socks", "Wrangler Jeans" 
    ), class = "factor")), .Names = c("orderItemID", "Name", 
"NumberOfOrders", "ArticleDescription"), row.names = c(NA, -10L 
), class = "data.frame") 

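Good morning!

This time I need to replace CustomerName with numbers starting at 1: the same name should always get the same number, and the next name should get the next higher number. In addition, OrderID should be replaced by the running number of orders placed by that particular customer. When different items share an OrderID, they belong to one order (e.g. Alex placed 2 orders: "Wrangler Jeans" in the first and "Lee T-Shirt" in the second; Dennis also placed 2 orders: "Aesics Shoes" in the first, and "Boss Jeans" and "Adidas Shoes" in the second). Finally, I would like to keep using dplyr, and ArticleDescription should stay unchanged.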

+0

Please fix your sample data. It throws errors. – Sotos

+0

It should work now, or at least I hope so :/ – Jarvis

Answers

0
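library(dplyr)

DB %>%
    # same name -> same number, starting at 1
    mutate(Name = dense_rank(CustomerName)) %>%
    # group first, so OrderIDs are only compared within one customer
    group_by(CustomerName) %>%
    # count a new order whenever OrderID changes; lag() is NA on a customer's first row
    mutate(No.of.Orders = cumsum(coalesce(OrderID != lag(OrderID), TRUE)))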
+0

It works, but the last line (taken from the other solution) does not. I just want to drop CustomerName and OrderID: do you have a solution? – Jarvis

+0

Just select the columns you want. Append %>% select(orderItemID, Name, No.of.Orders) at the end. –

+0

That works almost perfectly, but it still shows me the customer's name... why? 2. How can I save it? – Jarvis
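The name most likely survives because the result is still grouped by CustomerName, and dplyr's select() always keeps grouping columns. A minimal sketch of one way around it, assuming a pipeline like the one in this answer: call ungroup() before select(), and assign the result to keep it. The small DB below is a stand-in for the question's data, and `result` is a hypothetical name.

```r
library(dplyr)

# Stand-in for DB from the question (same column names, fewer rows).
DB <- data.frame(
  orderItemID  = 1:5,
  CustomerName = c("Alex", "Alex", "Bert", "Dennis", "Dennis"),
  OrderID      = c("58", "62", "14", "89", "89"),
  ArticleDescription = c("Wrangler Jeans", "Lee T-Shirt", "Adidas Jacket",
                         "Boss Jeans", "Adidas Shoes"),
  stringsAsFactors = FALSE
)

result <- DB %>%
  mutate(Name = dense_rank(CustomerName)) %>%
  group_by(CustomerName) %>%
  mutate(No.of.Orders = cumsum(coalesce(OrderID != lag(OrderID), TRUE))) %>%
  ungroup() %>%   # drop the grouping so select() is free to remove CustomerName
  select(orderItemID, Name, No.of.Orders, ArticleDescription)

result
```

To write the saved object to disk, something like write.csv(result, "result.csv", row.names = FALSE) should do; the file name is only an example.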

1

One way,

library(dplyr) 
DB %>% 
    mutate(Name = as.integer(as.factor(CustomerName))) %>% 
    group_by(Name) %>% 
    mutate(No.of.Orders = data.table::rleid(OrderID)) %>% 
    select(-c(CustomerName, OrderID)) 

#Source: local data frame [10 x 4] 
#Groups: Name [6] 

#   orderItemID ArticleDescription  Name No.of.Orders
#         (int)             (fctr) (int)        (int)
#1            1     Wrangler Jeans     1            1
#2            2        Lee T-Shirt     1            2
#3            3      Adidas Jacket     2            1
#4            4         Nike Shoes     3            1
#5            5      Puma Backpack     3            1
#6            6       Aesics Shoes     4            1
#7            7         Boss Jeans     4            2
#8            8       Adidas Shoes     4            2
#9            9         Puma Socks     5            1
#10          10          Nike Airs     6            1
+1

'Name = as.integer(as.factor(CustomerName))' might be simpler than requiring a call to data.table. –
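If avoiding the data.table dependency matters, the order counter can probably be written with base R's match() and unique() inside the same dplyr pipeline. This is a sketch under one assumption that differs from rleid(): it numbers distinct OrderIDs by first appearance within a customer, whereas rleid() numbers consecutive runs. The two agree as long as the rows of one order sit next to each other, as they do in the question's data. The DB below is a small stand-in.

```r
library(dplyr)

# Stand-in for DB from the question (same column names, fewer rows).
DB <- data.frame(
  orderItemID  = 1:6,
  CustomerName = c("Alex", "Alex", "Bert", "Dennis", "Dennis", "Dennis"),
  OrderID      = c("58", "62", "14", "9", "89", "89"),
  ArticleDescription = c("Wrangler Jeans", "Lee T-Shirt", "Adidas Jacket",
                         "Aesics Shoes", "Boss Jeans", "Adidas Shoes"),
  stringsAsFactors = FALSE
)

out <- DB %>%
  mutate(Name = as.integer(as.factor(CustomerName))) %>%
  group_by(Name) %>%
  # position of each OrderID among the customer's distinct OrderIDs
  mutate(No.of.Orders = match(OrderID, unique(OrderID))) %>%
  ungroup() %>%
  select(-CustomerName, -OrderID)

out
```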

0

You can easily get the names as

number_of_orders <- table(DB$CustomerName) 
name <- rep(1:length(unique(DB$CustomerName)), 
     number_of_orders) 

But I think Alex's suggestion is better.
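One caveat with the rep() approach: it lays the numbers out in contiguous blocks, so it seems to rely on the rows already being sorted by customer. A base R sketch that does not make that assumption, using match() against unique() for both columns; the stand-in DB is deliberately unsorted, and NumberOfOrders follows the column name from the question's expected output.

```r
# Stand-in data, deliberately NOT sorted by customer.
DB <- data.frame(
  orderItemID  = 1:5,
  CustomerName = c("Bert", "Alex", "Bert", "Alex", "Alex"),
  OrderID      = c("14", "58", "14", "62", "58"),
  stringsAsFactors = FALSE
)

# Number customers by first appearance (Bert = 1, Alex = 2 here).
DB$Name <- match(DB$CustomerName, unique(DB$CustomerName))

# Within each customer, number the distinct OrderIDs by first appearance.
# ave() passes row indices per customer, so the result stays integer.
DB$NumberOfOrders <- ave(
  seq_len(nrow(DB)), DB$CustomerName,
  FUN = function(i) match(DB$OrderID[i], unique(DB$OrderID[i]))
)

DB[, c("orderItemID", "Name", "NumberOfOrders")]
```

Note that this numbers customers in order of first appearance, while as.integer(as.factor(...)) numbers them alphabetically; the two coincide for the question's data because it is already sorted by name.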