2013-04-29 36 views

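Melt and recast into a new data frame in R

I have just downloaded a large amount of temperature data from our data loggers. The data frame holds hourly mean temperatures from 87 temperature sensors over 1691 hours (so there is a lot of data here). It looks like this: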

D1_A  D1_B  D1_C 
13.43 14.39 12.33 
12.62 13.53 11.56 
11.67 12.56 10.36 
10.83 11.62 9.47 

I would like to reshape this dataset into a matrix that looks like this:

#create a blank matrix 5 columns 131898 rows 
matrix1<-matrix(nrow=131898, ncol=5) 
colnames(matrix1)<- c("year", "ID", "Soil_Layer", "Hour", "Temperature") 

where:

year is always "2012" 
ID corresponds to the header ID (e.g. D1) 
Soil_Layer corresponds to the second bit of the header (e.g. A, B, or C) 
Hour= 1:1691 for each sensor 
and Temperature= the observed values in the original dataframe. 

Can this be done with the reshape package in R? Does it need a loop? Any input on how to handle this dataset would be useful. Cheers!


Where does 131898 come from? 1691 * 87 = 147117. – Chase 2013-04-30 00:10:32
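
The arithmetic in this comment is easy to check in R. As a side observation (not something stated in the question), 131898 divides evenly by 1691, which would correspond to 78 sensors rather than 87. The variable name `hours` is only for illustration.

```r
hours <- 1691     # hourly observations per sensor, from the question
hours * 87        # rows expected for 87 sensors: 147117
131898 / hours    # sensors implied by the 131898 rows in the question: 78
```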

Answer


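I think this does what you want. You can use the colsplit() and melt() functions from the reshape2 package. It isn't clear what determines the Hour for each observation, so I have assumed it follows the row order of the original dataset. If that is not the case, please update your question: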

library(reshape2) 
#read in your data 
x <- read.table(text = " 

    D1_A D1_B D1_C 
    13.43 14.39 12.33 
    12.62 13.53 11.56 
    11.67 12.56 10.36 
    10.83 11.62 9.47 
    9.98 10.77 9.04 
    9.24 10.06 8.65 
    8.89 9.55 8.78 
    9.01 9.39 9.88 
", header = TRUE) 

#add hour index, if data isn't ordered, replace this with whatever 
#tells you which hour goes where 
x$hour <- 1:nrow(x) 
#Melt into long format 
x.m <- melt(x, id.vars = "hour") 
#Split into two columns 
x.m[, c("ID", "Soil_Layer")] <- colsplit(x.m$variable, "_", c("ID", "Soil_Layer")) 
#Add the year 
x.m$year <- 2012 

#Return the first 6 rows 
head(x.m[, c("year", "ID", "Soil_Layer", "hour", "value")]) 
#---- 
  year ID Soil_Layer hour value
1 2012 D1          A    1 13.43
2 2012 D1          A    2 12.62
3 2012 D1          A    3 11.67
4 2012 D1          A    4 10.83
5 2012 D1          A    5  9.98
6 2012 D1          A    6  9.24
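
On the question of whether a loop is needed: it is not. The same reshaping can also be done without any package. The block below is a minimal base R sketch that swaps melt() and colsplit() for stack() and strsplit(). It assumes every header has the form ID_Layer with exactly one underscore and that rows are already in hour order. The sensor names and the simulated values are made up for illustration.

```r
# Simulated stand-in for the logger export: 3 hours x 4 sensors.
# Column names and values are invented for this example.
set.seed(1)
sensors <- c("D1_A", "D1_B", "D2_A", "D2_B")
n_hours <- 3
wide <- as.data.frame(matrix(round(runif(n_hours * length(sensors), 5, 15), 2),
                             nrow = n_hours,
                             dimnames = list(NULL, sensors)))

# stack() concatenates the columns into 'values', column by column,
# and records each value's source column name in 'ind'.
long <- stack(wide)

# Split "D1_A" into "D1" and "A" (assumes one underscore per header).
parts <- do.call(rbind, strsplit(as.character(long$ind), "_", fixed = TRUE))

result <- data.frame(
  year        = 2012,
  ID          = parts[, 1],
  Soil_Layer  = parts[, 2],
  # Hours restart at 1 for every sensor, matching stack()'s column order.
  Hour        = rep(seq_len(n_hours), times = ncol(wide)),
  Temperature = long$values,
  stringsAsFactors = FALSE
)

# One row per sensor-hour combination.
stopifnot(nrow(result) == n_hours * length(sensors))
```

Nothing in the calls depends on the table size, so the same code should apply to the full 1691-hour table. A data frame is used instead of a matrix because a matrix can hold only one type, so mixing the character ID with numeric temperatures would coerce every column to character.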