2015-12-30 155 views
0

Converting from long format to wide format

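I want to convert data from long format to wide format. ColA should end up with just one row per value overall. Values in ColB can repeat within a ColA group; in that case I am trying to aggregate them by count. ColF is aggregated with sum().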

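library(readr)  # provides read_csv()

s <- read_csv("sample.csv")
# Keep only the key and the numeric column, then sum ColF per ColA
s_1 <- subset(s, select=c("ColA", "ColF"))
grp_by <- aggregate(. ~ ColA , data = s_1, FUN = sum)
head(grp_by)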

I'm not sure how to convert the rest of the columns.

Update: based on the suggestion, I made use of the reshape2 package.

library(readr)     # provides read_csv()
library(reshape2)  # provides dcast()

s <- read_csv("sample.csv") 
s_1 <- subset(s, select=c("ColA", "ColF")) 
grp_by <- aggregate(. ~ ColA , data = s_1, FUN = sum) 

s2 <- dcast(s, ColA ~ ColB) 
s3 <- dcast(s, ColA ~ ColC) 
s4 <- dcast(s, ColA ~ ColD) 
s5 <- dcast(s, ColA ~ ColE) 

print(s2) 
print(s3) 
print(s4) 
print(s5) 
print(grp_by) 

Here is the output of those print statements.

[image: output of the print statements above]

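How can I merge all of these into one data frame? My actual dataset has 1 million records. Is this code optimized enough to run on it, or is there a better way to write it? Thanks for your help.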

+3

Take a look here: http://stackoverflow.com/questions/5890584/reshape-data-from-long-to-wide-format-r

+0

@DavidArenburg Thanks for your suggestion. I updated the question after using reshape2. Could you check the question again and guide me accordingly? Thanks. – prasanth

+1

See here for how to provide a reproducible example and the desired output: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Answer

0

Here is the sample code I used to transform and merge the data. There may be better ways, but this is the best I could come up with.

# Include needed libraries
library(readr)     # provides read_csv()
library(reshape2)  # provides dcast()

# Load the sample data 
s <- read_csv("sample.csv") 

# Aggregate ColF by SUM for each ColA 
s_1 <- subset(s, select=c("ColA", "ColF")) 
grp_by <- aggregate(. ~ ColA , data = s_1, FUN = sum) 

# Long to Wide format 
s2 <- dcast(s, ColA ~ ColB) 
s3 <- dcast(s, ColA ~ ColC) 
s4 <- dcast(s, ColA ~ ColD) 
s5 <- dcast(s, ColA ~ ColE) 

# But this is the crude way of removing NA columns which I used! 
# Rename the NA column into something so that it can be removed by assigning NULL!! 
colnames(s2)[7] <- "RemoveMe" 
colnames(s3)[5] <- "RemoveMe" 
colnames(s4)[5] <- "RemoveMe" 
colnames(s5)[4] <- "RemoveMe" 

s2$RemoveMe <- NULL 
s3$RemoveMe <- NULL 
s4$RemoveMe <- NULL 
s5$RemoveMe <- NULL 

# Merge all pieces to form the final transformed data 
s2 <- merge(x = s2, y = s3, by="ColA", all = TRUE) 
s2 <- merge(x = s2, y = s4, by="ColA", all = TRUE) 
s2 <- merge(x = s2, y = s5, by="ColA", all = TRUE) 
s2 <- merge(x = s2, y = grp_by, by="ColA", all = TRUE) 

# Removing the row with user_id = NA!! 
s2 <- s2[-c(4), ] 

# Final transformed data 
print(s2) 
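The hard-coded column positions and row number above only work for this particular sample file. A minimal sketch of the same steps without positional indices follows, assuming the missing-value column produced by dcast is named "NA" (as the renaming step above suggests). The toy data frame and the helper `count_wide` are hypothetical and stand in for sample.csv; they are not part of the original code.

```r
library(reshape2)

# Hypothetical toy data standing in for sample.csv (not the asker's data)
s <- data.frame(
  ColA = c("u1", "u1", "u2", NA),
  ColB = c("x", "x", "y", "x"),
  ColC = c("p", NA, "q", "p"),
  ColF = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Drop rows with a missing key before reshaping,
# instead of deleting a row by position afterwards
s <- s[!is.na(s$ColA), ]

# Hypothetical helper: count occurrences of each value of `col` per ColA,
# then drop the "NA" column by name rather than by position
count_wide <- function(data, col) {
  wide <- dcast(data, as.formula(paste("ColA ~", col)),
                value.var = col, fun.aggregate = length)
  wide[, colnames(wide) != "NA", drop = FALSE]
}

pieces <- lapply(c("ColB", "ColC"), function(col) count_wide(s, col))

# Sum ColF per ColA
totals <- aggregate(ColF ~ ColA, data = s, FUN = sum)

# Merge every piece on ColA in one pass
result <- Reduce(function(x, y) merge(x, y, by = "ColA", all = TRUE),
                 c(pieces, list(totals)))
print(result)
```

If two source columns share a value (for example "x" in both ColB and ColC), merge appends .x/.y suffixes to the clashing names, so renaming the count columns beforehand may be needed.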

I used these as references:

  1. dcast - How to reshape data from long to wide format?
  2. merge - How to join (merge) data frames (inner, outer, left, right)?
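On the question of the 1-million-row dataset, one commonly suggested alternative is to stack the categorical columns once and reshape them in a single dcast call, here using the data.table package instead of reshape2. This is only a sketch on hypothetical toy data; the column names follow the question, and whether it is fast enough on the real data has not been tested here.

```r
library(data.table)

# Hypothetical toy data standing in for sample.csv (not the asker's data)
s <- data.table(
  ColA = c("u1", "u1", "u2"),
  ColB = c("x", "x", "y"),
  ColC = c("p", "q", "q"),
  ColF = c(1, 2, 3)
)

# Stack the categorical columns into (ColA, variable, value) rows,
# dropping missing values so no NA column is created later
long <- melt(s, id.vars = "ColA", measure.vars = c("ColB", "ColC"),
             variable.name = "variable", value.name = "value",
             na.rm = TRUE)

# A single dcast builds every count column at once,
# with names such as ColB_x and ColC_p
counts <- dcast(long, ColA ~ variable + value,
                value.var = "value", fun.aggregate = length)

# Sum ColF per ColA and join it onto the counts
totals <- s[, .(ColF = sum(ColF)), by = ColA]
result <- merge(counts, totals, by = "ColA", all = TRUE)
print(result)
```

Prefixing each count column with its source column name avoids name clashes when the same value appears in more than one column, and it removes the need for a chain of merges.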