2017-07-11

r - Move values to new columns based on the next row

I am trying to work with badly formatted SAP data that looks like this:



ID   Status  Product  Profile  Description
154  NOCO    3000     A1       failure
215  ATCO    4000              dfect
164  NOCO    2000     A1       dfect
164                   A2
875  ATCO    3000              failure
548  NOCO    2000     A1       dfect
548                   A2
548                   A3
797  NOCO    3000              failure
444  ATCO    4000              failure

I would like it to look like this:

ID   Status  Product  Profile  Profile2  Profile3  Description
154  NOCO    3000     A1                           failure
215  ATCO    4000                                  dfect
164  NOCO    2000     A1       A2                  dfect
875  ATCO    3000                                  failure
548  NOCO    2000     A1       A2        A3        dfect
797  NOCO    3000                                  failure
444  ATCO    4000                                  failure

Here is the data in `dput()` format:

structure(list(ID = c(154L, 215L, 164L, 164L, 875L, 548L, 548L, 
548L, 797L, 444L), Status = structure(c(3L, 2L, 3L, 1L, 2L, 3L, 
1L, 1L, 3L, 2L), .Label = c("", "ATCO", "NOCO"), class = "factor"), 
    Product = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA, 
    3000L, 4000L), Profile = structure(c(2L, 1L, 2L, 3L, 1L, 
    2L, 3L, 4L, 1L, 1L), .Label = c("", "A1", "A2", "A3"), class = "factor"), 
    Description = structure(c(3L, 2L, 2L, 1L, 3L, 2L, 1L, 1L,
    3L, 3L), .Label = c("", "dfect", "failure"), class = "factor")), .Names = c("ID",
"Status", "Product", "Profile", "Description"), class = "data.frame", row.names = c(NA,
-10L))

Please provide your data in an easy-to-paste form, and show what you have tried. –


Possible duplicate of [Reshape data from long to wide format](http://stackoverflow.com/questions/5890584)? – zx8754


@zx8754 I think this question is different from the one you linked, because it involves missing values that also need to be filled in. –
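The missing values here come from continuation rows: a blank Status, Product or Description means "same as the row above". That is a fill-down (last observation carried forward) step. Below is a minimal base-R sketch of it. `fill_down()` is a hypothetical helper name and is not part of tidyr or zoo. It assumes the blanks have already been turned into `NA`.

```r
# Hypothetical helper (not from tidyr or zoo): carry the last non-missing value forward
fill_down <- function(x) {
  seen <- cumsum(!is.na(x))      # number of non-missing values up to each position
  last_values <- x[!is.na(x)]    # the non-missing values, in order
  out <- x
  has_prev <- seen > 0           # positions at or after the first non-missing value
  out[has_prev] <- last_values[seen[has_prev]]
  out
}

status <- c("NOCO", "ATCO", "NOCO", NA, "ATCO", "NOCO", NA, NA, "NOCO", "ATCO")
fill_down(status)

# Applied to the header columns of a data frame whose blanks are already NA:
# cols <- c("Status", "Product", "Description")
# df[cols] <- lapply(df[cols], fill_down)
```

Leading `NA`s stay `NA`, because there is nothing above them to copy.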




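library(dplyr) #for %>%
library(tidyr) #for fill() and spread()

df[df==""] <- NA #change your blanks to NAs (df is the data frame from the question)
df2 <- df %>% fill(-ID) %>% #fill down missing values in all columns except ID, Profile included
       spread(key=Profile, value=Profile, sep="", fill="") #convert to wide format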

   ID Status Product Description ProfileA1 ProfileA2 ProfileA3
1 154 NOCO   3000    failure     A1
2 164 NOCO   2000    dfect       A1        A2
3 215 ATCO   4000    dfect       A1
4 444 ATCO   4000    failure                         A3
5 548 NOCO   2000    dfect       A1        A2        A3
6 797 NOCO   3000    failure                         A3
7 875 ATCO   3000    failure               A2

We could use `tidyr::fill()` instead of `zoo::na.locf()` to keep it all in one package. – zx8754


@zx8754 Thanks, I've updated the answer. Very tidy! –


@AndrewGustar This is perfect! – Markus
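`fill(-ID)` fills down `Profile` as well as the other columns. That is why IDs 215, 444, 797 and 875 get a profile in the output above even though they have none in the input. Below is a minimal sketch of a variant that fills only `Status`, `Product` and `Description`, numbers the profiles within each ID, and widens with `pivot_wider()`. It assumes dplyr and tidyr 1.1 or later (needed for a scalar `values_fill`) and blanks already set to `NA`. It also names the columns `Profile1`..`Profile3` rather than `Profile`, `Profile2`, `Profile3`.

```r
library(dplyr)
library(tidyr)

# Data from the question, with blank cells already turned into NA
df <- data.frame(
  ID          = c(154, 215, 164, 164, 875, 548, 548, 548, 797, 444),
  Status      = c("NOCO", "ATCO", "NOCO", NA, "ATCO", "NOCO", NA, NA, "NOCO", "ATCO"),
  Product     = c(3000, 4000, 2000, NA, 3000, 2000, NA, NA, 3000, 4000),
  Profile     = c("A1", NA, "A1", "A2", NA, "A1", "A2", "A3", NA, NA),
  Description = c("failure", "dfect", "dfect", NA, "failure", "dfect", NA, NA, "failure", "failure"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # carry the header fields down onto the continuation rows; Profile is left alone
  fill(Status, Product, Description) %>%
  group_by(ID) %>%
  # label each ID's rows Profile1, Profile2, ... in order of appearance
  mutate(slot = paste0("Profile", row_number()),
         Profile = coalesce(Profile, "")) %>%
  ungroup() %>%
  # one row per ID; profile slots an ID never reached become ""
  pivot_wider(names_from = slot, values_from = Profile, values_fill = "")

print(wide)
```

Numbering rows within each ID with `row_number()` means the code never has to know the profile labels (`A1`, `A2`, ...) in advance. It also handles IDs with any number of continuation rows.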



data = structure(list(ID = c(154L, 215L, 164L, 164L, 875L, 548L, 548L,
    548L, 797L, 444L), Status = structure(c(3L, 2L, 3L, 1L, 2L, 3L,
    1L, 1L, 3L, 2L), .Label = c("", "ATCO", "NOCO"), class = "factor"),
    Product = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA,
    3000L, 4000L), Profile = structure(c(2L, 1L, 2L, 3L, 1L,
    2L, 3L, 4L, 1L, 1L), .Label = c("", "A1", "A2", "A3"), class = "factor"),
    Description = structure(c(3L, 2L, 2L, 1L, 3L, 2L, 1L, 1L,
    3L, 3L), .Label = c("", "dfect", "failure"), class = "factor")), .Names = c("ID",
    "Status", "Product", "Profile", "Description"), class = "data.frame", row.names = c(NA,
    -10L))

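# keep one row per ID: the rows that carry a Status
new.data = data[,c("ID","Status","Product","Description")]
new.data = new.data[-which(new.data$Status==""),]
# add empty Profile1..Profile3 columns
for(i in 1:3){
    new.data[[paste0("Profile",i)]] = NA
}
# copy the i-th Profile of each ID into Profile<i> (assumed completion: the original right-hand side was cut off)
for(i in 1:3){
    for(id in new.data$ID){
        new.data[which(new.data$ID==id),paste0("Profile",i)] =
            as.character(data$Profile[data$ID==id][i])
    }
}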

This produces the data.frame `new.data`:

    ID Status Product Description Profile1 Profile2 Profile3
1  154 NOCO   3000    failure     A1
2  215 ATCO   4000    dfect
3  164 NOCO   2000    dfect       A1       A2
5  875 ATCO   3000    failure
6  548 NOCO   2000    dfect       A1       A2       A3
9  797 NOCO   3000    failure
10 444 ATCO   4000    failure

I like that this code needs no packages. However, it returns NA for all of the Profile columns. – Markus


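I tried it again, this time with the data structure you provided, and it worked without any problems. Did you copy and paste the entire code? The output you describe is what you get if the second loop block is not run. –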
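Here is a package-free variant that avoids the nested loop over IDs. It numbers every row within its ID with `ave()` and then uses `match()` to pick the i-th profile for all IDs at once. This is a sketch under two assumptions. First, each ID's rows form a single block. Second, the columns are character rather than the factors in the question's `dput()`. It has not been tested against the factor version.

```r
# Data from the question, typed out with character columns instead of factors
data <- data.frame(
  ID          = c(154L, 215L, 164L, 164L, 875L, 548L, 548L, 548L, 797L, 444L),
  Status      = c("NOCO", "ATCO", "NOCO", "", "ATCO", "NOCO", "", "", "NOCO", "ATCO"),
  Product     = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA, 3000L, 4000L),
  Profile     = c("A1", "", "A1", "A2", "", "A1", "A2", "A3", "", ""),
  Description = c("failure", "dfect", "dfect", "", "failure", "dfect", "", "", "failure", "failure"),
  stringsAsFactors = FALSE
)

# one row per ID: the rows that carry a Status
new.data <- data[data$Status != "", c("ID", "Status", "Product", "Description")]

# position of every row within its ID: 1 for the first row, 2 for the next, ...
pos <- ave(seq_along(data$ID), data$ID, FUN = seq_along)

for (i in seq_len(max(pos))) {
  rows <- pos == i
  # for each ID in new.data, where its i-th row sits among the selected rows (NA if it has none)
  hit <- match(new.data$ID, data$ID[rows])
  vals <- data$Profile[rows][hit]
  vals[is.na(vals)] <- ""   # IDs with fewer than i rows get an empty cell
  new.data[[paste0("Profile", i)]] <- vals
}

new.data
```

The number of Profile columns is taken from `max(pos)` rather than hard-coded as 3.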