2015-11-10 43 views
0

Reshaping data and forcing missing values to zero

Edit: the original dataset can be found here: link

I have a matrix like this:

data <- matrix(c("a","1","10", 
      "b","1","20", 
      "c","1","30", 
      "a","2","10", 
      "b","2","20", 
      "a","3","10", 
      "c","3","20"), 
      ncol=3, byrow=TRUE) 

I would like to reshape it into a data frame in which the missing combinations are forced to zero:

data <- matrix(c("a","1","10", 
      "b","1","20", 
      "c","1","30", 
      "a","2","10", 
      "b","2","20", 
      "c","2","0", 
      "a","3","10", 
      "b","3","0", 
      "c","3","20"), 
      ncol=3, byrow=TRUE) 

How can I do this with the reshape package? Thanks


Answers

1

After converting your data a little, we can use complete from tidyr:

library(tidyr) 
data <- as.data.frame(data) 
data$V3 <- as.numeric(as.character(data$V3)) 
complete(data, V1, V2, fill = list(V3 = 0)) 
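For readers who want to see what the completion step amounts to, here is a minimal base-R sketch of the same idea: build every combination of V1 and V2, left-join the observed rows onto it, and turn the resulting NAs into zeros. The names `df`, `grid` and `full` are introduced only for this illustration, and this is not how tidyr implements `complete` internally.

```r
# Rebuild the example matrix from the question
data <- matrix(c("a","1","10",
                 "b","1","20",
                 "c","1","30",
                 "a","2","10",
                 "b","2","20",
                 "a","3","10",
                 "c","3","20"),
               ncol = 3, byrow = TRUE)

# Columns are named V1, V2, V3 by default
df <- as.data.frame(data, stringsAsFactors = FALSE)
df$V3 <- as.numeric(df$V3)

# Every combination of the observed V1 and V2 values
grid <- expand.grid(V1 = unique(df$V1), V2 = unique(df$V2),
                    stringsAsFactors = FALSE)

# Left join: combinations that are absent from df get NA in V3
full <- merge(grid, df, by = c("V1", "V2"), all.x = TRUE)

# Replace the NAs with zero and sort for readability
full$V3[is.na(full$V3)] <- 0
full <- full[order(full$V2, full$V1), ]
full
```

The result has one row per (V1, V2) pair, with zeros for the pairs (c, 2) and (b, 3) that were missing from the input.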
+0

On my data it says: Error in left_join_impl(x, y, by$x, by$y) : attempt to set index 0/0 in SET_STRING_ELT. In my real dataset, V3 is (int). Does that affect it somehow? – xxxvinxxx

+0

I am afraid there is something wrong with the data I shared, but I cannot figure out what. Here is the original data: [link](https://drive.google.com/file/d/0B4FnlzCZUFqWcHVVd3RXQnQwQzQ/view?usp=sharing) – xxxvinxxx

+0

From your data, I think you want `complete(data, label, count, fill = list(unique_elements = 0))`? – jeremycg

1

tidyr is better, but if you want to use reshape you can do

library(reshape2) 

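# Cast to wide format; (V1, V2) pairs that are absent become NA
data2=dcast(data = as.data.frame(data, stringsAsFactors=FALSE), V1~V2, value.var="V3")
# Melt back to long format, which keeps the NA cells as rows
data3=melt(data2,id.vars="V1",measure.vars=colnames(data2)[-1])
# Replace the NAs with "0" (the values are still character strings)
data3[is.na(data3)]="0"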
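If reshape2 is not available, the same wide-then-long round trip can be approximated in base R with `xtabs`, which already puts 0 in empty cells. This is only a sketch, not part of the answer above: it assumes V3 has been converted to numeric and that each (V1, V2) pair occurs at most once, since `xtabs` sums duplicates. The names `df`, `wide` and `long` are illustrative.

```r
# Rebuild the example matrix from the question
data <- matrix(c("a","1","10",
                 "b","1","20",
                 "c","1","30",
                 "a","2","10",
                 "b","2","20",
                 "a","3","10",
                 "c","3","20"),
               ncol = 3, byrow = TRUE)

df <- as.data.frame(data, stringsAsFactors = FALSE)
df$V3 <- as.numeric(df$V3)

# Wide step: cross-tabulate V3 by V1 and V2; empty cells come out as 0
wide <- xtabs(V3 ~ V1 + V2, data = df)

# Long step: one row per (V1, V2) cell; the value column is named Freq
long <- as.data.frame(wide, stringsAsFactors = FALSE)
names(long)[names(long) == "Freq"] <- "V3"
long
```

Unlike the reshape2 version, the values here end up numeric rather than character, so the zero is a real 0 and not the string "0".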
1

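It looks to me like you are dealing with something like a multivariate time series, so I would suggest using a proper time series object.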

library(zoo) 
res=read.zoo(data.frame(data,stringsAsFactors=FALSE), 
     split=1, 
     index.column=2, 
     FUN=as.numeric) 
coredata(res)=as.numeric(coredata(res)) 
coredata(res)[is.na(res)]=0 

This gives

res 
# a b c 
#1 10 20 30 
#2 10 20 0 
#3 10 0 20 
1

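I think you are doing it wrong by using a matrix to hold multiple classes.

First, I would convert it to a data.frame/data.table and then convert all the columns to the correct types. Something like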

library(data.table) # V 1.9.6+ 
# Convert to data.table 
DT <- as.data.table(data) 

# Convert to correct column types 
# as.is = TRUE keeps character columns as character instead of making factors
for(j in names(DT)) set(DT, j = j, value = type.convert(DT[[j]], as.is = TRUE))
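To see what the type conversion does without data.table, here is a small base-R sketch on a plain data frame. The name `df` is introduced only for this illustration; the column names V1, V2 and V3 are the defaults that `as.data.frame` assigns to an unnamed matrix.

```r
# Rebuild the example matrix from the question; every cell is a character string
data <- matrix(c("a","1","10",
                 "b","1","20",
                 "c","1","30",
                 "a","2","10",
                 "b","2","20",
                 "a","3","10",
                 "c","3","20"),
               ncol = 3, byrow = TRUE)

df <- as.data.frame(data, stringsAsFactors = FALSE)

# Let R guess the type of each column; as.is = TRUE avoids creating factors
df[] <- lapply(df, type.convert, as.is = TRUE)

# V1 stays character, V2 and V3 become integer
sapply(df, class)
```

Once V3 is a number rather than a string, filling the gaps with a numeric 0 no longer mixes types within the column.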

Then we can use data.table::CJ to expand the rows and assign zeroes to the NAs

## Cross join all column except the third 
DT <- DT[do.call(CJ, c(unique = TRUE, DT[, -3, with = FALSE])), on = names(DT)[-3]] 

## Or if you want only to operate on these two columns you can alternatively do 
# DT <- DT[CJ(V1, V2, unique = TRUE), on = c("V1", "V2")] 

## Fill with zeroes 
DT[is.na(V3), V3 := 0] 
DT 
# V1 V2 V3 
# 1: a 1 10 
# 2: a 2 10 
# 3: a 3 10 
# 4: b 1 20 
# 5: b 2 20 
# 6: b 3 0 
# 7: c 1 30 
# 8: c 2 0 
# 9: c 3 20 
+0

I am afraid there is something wrong with the data I shared, but I cannot figure out what. Here is the original data: [link](https://drive.google.com/file/d/0B4FnlzCZUFqWcHVVd3RXQnQwQzQ/view?usp=sharing) – xxxvinxxx