r - Move values to a new column based on the next row

2017-07-11

I am trying to work with badly formatted SAP data.

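In this source data, when one of the variables ("Profile" in the example) has more than one entry, the entries are stacked. Each extra entry creates an otherwise empty observation on the next row that carries only the same "ID".

Like this: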

ID  Status Product Profile Description
154 NOCO   3000    A1      failure
215 ATCO   4000            dfect
164 NOCO   2000    A1      dfect
164                A2
875 ATCO   3000            failure
548 NOCO   2000    A1      dfect
548                A2
548                A3
797 NOCO   3000            failure
444 ATCO   4000            failure

What I want to do is take these stacked values and move them into new columns on the record's first row.

ID  Status Product Profile Profile2 Profile3 Description
154 NOCO   3000    A1                        failure
215 ATCO   4000                              dfect
164 NOCO   2000    A1      A2                dfect
875 ATCO   3000                              failure
548 NOCO   2000    A1      A2       A3       dfect
797 NOCO   3000                              failure
444 ATCO   4000                              failure

How can I do this?

Thanks!

Edit:

Added the dput of the first table above:

structure(list(ID = c(154L, 215L, 164L, 164L, 875L, 548L, 548L, 
548L, 797L, 444L), Status = structure(c(3L, 2L, 3L, 1L, 2L, 3L, 
1L, 1L, 3L, 2L), .Label = c("", "ATCO", "NOCO"), class = "factor"), 
    Product = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA, 
    3000L, 4000L), Profile = structure(c(2L, 1L, 2L, 3L, 1L, 
    2L, 3L, 4L, 1L, 1L), .Label = c("", "A1", "A2", "A3"), class = "factor"), 
Description = structure(c(3L, 2L, 2L, 1L, 3L, 2L, 1L, 1L, 
3L, 3L), .Label = c("", "dfect", "failure"), class = "factor")), .Names = c("ID", 
"Status", "Product", "Profile", "Description"), class = "data.frame", row.names = c(NA, 
-10L)) 
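
Please provide the data in an easy-to-paste form, and show what you have tried so far.

Possible duplicate of [Reshaping data from long to wide format](http://stackoverflow.com/questions/5890584)? – zx8754

@zx8754 I think this question differs from the one you linked, because it involves missing values that also have to be handled.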

Answers

You can do this with tidyr...

library(tidyr)
df[df == ""] <- NA  # change the blanks to NAs
df2 <- df %>%
  fill(-ID) %>%  # fill down missing values
  spread(key = Profile, value = Profile, sep = "", fill = "")  # convert to wide format

df2
   ID Status Product Description ProfileA1 ProfileA2 ProfileA3
1 154   NOCO    3000     failure        A1
2 164   NOCO    2000       dfect        A1        A2
3 215   ATCO    4000       dfect        A1
4 444   ATCO    4000     failure                            A3
5 548   NOCO    2000       dfect        A1        A2        A3
6 797   NOCO    3000     failure                            A3
7 875   ATCO    3000     failure                  A2
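
The output above differs from the layout requested in the question in two ways: fill(-ID) also carries Profile down, so the IDs that have no profile (215, 875, 797, 444) inherit the value of the row before them, and the new columns are named after the profile values instead of being numbered. The following is only a sketch of one way around both points. It assumes dplyr is installed alongside tidyr (1.0 or later, for pivot_wider), that the rows of one ID are contiguous, and that the first row of each ID carries Status, Product and Description. The data is rebuilt with character columns instead of the factors in the dput, and the names `slot` and `wide` are illustrative rather than part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same values as the dput in the question, but with character columns
# (assumption: factors are not needed for this step).
df <- data.frame(
  ID = c(154L, 215L, 164L, 164L, 875L, 548L, 548L, 548L, 797L, 444L),
  Status = c("NOCO", "ATCO", "NOCO", "", "ATCO", "NOCO", "", "", "NOCO", "ATCO"),
  Product = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA, 3000L, 4000L),
  Profile = c("A1", "", "A1", "A2", "", "A1", "A2", "A3", "", ""),
  Description = c("failure", "dfect", "dfect", "", "failure",
                  "dfect", "", "", "failure", "failure"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # Blank Status/Description become NA so that fill() can see them.
  mutate(Status = na_if(Status, ""),
         Description = na_if(Description, "")) %>%
  group_by(ID) %>%
  # Fill only the record-level columns, never Profile.
  fill(Status, Product, Description) %>%
  # Number the rows of each ID: Profile1, Profile2, ...
  mutate(slot = paste0("Profile", row_number())) %>%
  ungroup() %>%
  pivot_wider(names_from = slot, values_from = Profile, values_fill = "")

print(wide)
```

Grouping by ID before fill() keeps values from leaking from one record into the next, and numbering by position means the column a profile lands in does not depend on its label.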

We can use 'tidyr::fill()' instead of 'zoo::na.locf()' to stay within one package. – zx8754

@zx8754 Thanks - I have updated the answer. Very tidy!

@AndrewGustar This is perfect! – Markus


Here is a version that works without any packages. The zoo/tidyr answer is more elegant, though.

data = structure(list(ID = c(154L, 215L, 164L, 164L, 875L, 548L, 548L,
    548L, 797L, 444L), Status = structure(c(3L, 2L, 3L, 1L, 2L, 3L,
    1L, 1L, 3L, 2L), .Label = c("", "ATCO", "NOCO"), class = "factor"),
    Product = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA,
    3000L, 4000L), Profile = structure(c(2L, 1L, 2L, 3L, 1L,
    2L, 3L, 4L, 1L, 1L), .Label = c("", "A1", "A2", "A3"), class = "factor"),
    Description = structure(c(3L, 2L, 2L, 1L, 3L, 2L, 1L, 1L,
    3L, 3L), .Label = c("", "dfect", "failure"), class = "factor")), .Names = c("ID",
    "Status", "Product", "Profile", "Description"), class = "data.frame", row.names = c(NA,
    -10L))

new.data = data[, c("ID", "Status", "Product", "Description")]
# keep only the first row of each record (stacked rows have an empty Status)
new.data = new.data[new.data$Status != "", ]
# add empty columns Profile1..Profile3
for (i in 1:3) {
    new.data[[paste0("Profile", i)]] = NA
}
# write "A<i>" where the ID has that profile in the original data, "" otherwise
for (i in 1:3) {
    for (id in new.data$ID) {
        new.data[which(new.data$ID == id), paste0("Profile", i)] =
            ifelse(sum(data[which(data$ID == id), "Profile"] ==
                paste0("A", i)) > 0, paste0("A", i), "")
    }
}

This produces the data.frame new.data:

    ID Status Product Description Profile1 Profile2 Profile3
1  154   NOCO    3000     failure       A1
2  215   ATCO    4000       dfect
3  164   NOCO    2000       dfect       A1       A2
5  875   ATCO    3000     failure
6  548   NOCO    2000       dfect       A1       A2       A3
9  797   NOCO    3000     failure
10 444   ATCO    4000     failure
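
The nested loops above hard-code the labels A1 to A3 and the number of profile columns. As a sketch only, the same package-free idea can be written so that profiles are placed by their position within each ID and the number of columns is derived from the data. This is a variation, not the answer's method: it assumes the first row of each record has a non-empty Status, it uses character columns instead of the factors in the dput, and the names `head_rows` and `profiles` are illustrative.

```r
# Same values as the dput in the question, with character columns
# (assumption: factors are not needed here).
data <- data.frame(
  ID = c(154L, 215L, 164L, 164L, 875L, 548L, 548L, 548L, 797L, 444L),
  Status = c("NOCO", "ATCO", "NOCO", "", "ATCO", "NOCO", "", "", "NOCO", "ATCO"),
  Product = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA, 3000L, 4000L),
  Profile = c("A1", "", "A1", "A2", "", "A1", "A2", "A3", "", ""),
  Description = c("failure", "dfect", "dfect", "", "failure",
                  "dfect", "", "", "failure", "failure"),
  stringsAsFactors = FALSE
)

# First row of each record: the only rows with a non-empty Status.
head_rows <- data[data$Status != "", c("ID", "Status", "Product", "Description")]

# Non-empty profiles of each ID, in order of appearance.
has_profile <- data$Profile != ""
profiles <- split(data$Profile[has_profile], data$ID[has_profile])

# One column per position, up to the longest profile list found.
n_max <- max(lengths(profiles))
for (i in seq_len(n_max)) {
  head_rows[[paste0("Profile", i)]] <- vapply(
    as.character(head_rows$ID),
    function(id) {
      p <- profiles[[id]]              # NULL when the ID has no profile
      if (length(p) >= i) p[i] else ""
    },
    character(1),
    USE.NAMES = FALSE
  )
}

print(head_rows)
```

Because the columns are filled by position, a record whose only profile is A2 would have it in Profile1, whereas the loop version above would put it in Profile2.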
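
I like having code that uses no packages. However, this returns NA for all the Profile options. – Markus

I tried it once more, this time using the data structure you provided, and it works without any problems. Did you copy and paste the whole code? Your output is what you would get without the second loop block.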