2013-04-10

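Beginner: rearranging data in a csv file

This is really basic, but I have got lost in overly complicated code. I have a CSV file with a column of tests, a column of marks and a column of students. I want to reformat the data so that I have one row per student, with the marks spread across one column per test.

I created a separate csv containing the students (as numeric codes), called "students.csv", because that was the easy option for now.

I have 52 students and 50 tests.

I can get the following to work for a single student: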

matricNumbers <- read.csv("students.csv") 
students <- as.vector(as.matrix(matricNumbers)) 
students 
data <- read.csv("marks.csv") 
studentSubset <- data[data[2] == 1150761,] 
marksSubset <- as.vector(as.matrix(studentSubset[5])) 
ll <- list() 
ll<-c(list(marksSubset), ll) 
dd<-data.frame(matrix(nrow=50,ncol=50)) 
for(i in 1:length(ll)){ 
    dd[i,] <- ll[[i]] 

} 
dd 

but I can't seem to get this working with a for loop that goes through every student.

getMarks <-function(studentNumFile,markFile){ 

matricNumbers <- read.csv(studentNumFile) 
students <- as.vector(as.matrix(matricNumbers)) 


data <- read.csv(markFile) 

for (i in seq_along(students)){ 
    studentSubset <- data[data[2] == i,] 
    marksSubset <- as.vector(as.matrix(studentSubset[5])) 
    ll <- list() 
    ll<-c(list(marksSubset), ll) 
    dd<-data.frame(matrix(nrow=52,ncol=50)) 
    for(i in 1:length(ll)){ 
     dd[i,] <- ll[[i]] 
    } 
} 
return(dd) 
} 

getMarks("students.csv","marks.csv") 

I get the error:

Error in `[<-.data.frame`(`*tmp*`, i, , value = logical(0)) : replacement has 0 items, need 50 

I believe this is due to the nested for loop, but I can't figure out how to do it any other way.


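What is the value of 'i' when it stops? That should be the one causing the error. Can you show that subset? Also, have you tried replacing 'i' with 'j' in the nested loop for clarity? – 2013-04-10 13:04:50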
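The diagnosis this comment points towards can be reproduced on a small scale. The sketch below uses made-up student IDs and marks (the column names `student` and `mark` are assumptions, not taken from the original marks.csv). It shows that comparing the ID column against the loop counter, rather than against `students[i]`, selects zero rows, which is the situation that produces "replacement has 0 items".

```r
# Hypothetical data: the IDs and marks are invented for illustration
students <- c(1150761, 1150762)
marks <- data.frame(
  student = rep(students, each = 3),
  mark    = c(55, 60, 65, 70, 75, 80)
)

i <- 1
# Comparing against the loop counter: no student has ID 1, so nothing matches
by_index <- marks[marks$student == i, ]
# Comparing against the i-th student ID: three rows match
by_id <- marks[marks$student == students[i], ]

nrow(by_index)  # 0
nrow(by_id)     # 3
```

The function in the question also recreates `ll` and `dd` on every pass of the outer loop, so even with the comparison fixed only the last student's marks would be left in the result.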

Answer


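If I understand the question correctly, you can achieve what you want with the reshape package. Since you don't provide sample data, it is hard to test. I suggest you paste the output of dput(head(matricNumbers)) into a code block in your question.

However, you should be able to follow this simple example, which uses some dummy data. I think you may only need one line, and you can forget all the complicated loop stuff!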

# These lines make some dummy data, similar to your matricNumbers (hopefully)
test <- sort(sample(c("Biology", "Maths", "Chemistry"), 10, replace = TRUE))
students <- unlist(sapply(table(test), function(x) { sample(letters[1:x], x) }))
names(students) <- NULL
scores <- data.frame(test, mark = sample(40:100, 10, replace = TRUE), students)
scores
        test mark students
1    Biology   50        c
2    Biology   93        a
3    Biology   83        b
4    Biology   83        d
5  Chemistry   71        b
6  Chemistry   54        c
7  Chemistry   54        a
8      Maths   97        c
9      Maths   93        b
10     Maths   72        a



# Then use reshape to cast your data into the format you require
# I use 'mean' as the aggregation function. If you have one score for each student/test, then mean will just return the score
# If you do not have a score for a particular student in that test then it will return NaN
library(reshape)
bystudent <- cast(scores, students ~ test, value = "mark", fun.aggregate = mean)
bystudent
  students Biology Chemistry Maths
1        a      93        54    72
2        b      83        71    93
3        c      50        54    97
4        d      83       NaN   NaN
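
For readers who would rather not install a package, the same long-to-wide step can be sketched in base R with `tapply()`. This is not part of the original answer; it is an alternative shown on the dummy data printed above, typed in by hand so that the result is reproducible.

```r
# Same dummy data as printed above, entered by hand (no random sampling)
scores <- data.frame(
  test     = c(rep("Biology", 4), rep("Chemistry", 3), rep("Maths", 3)),
  mark     = c(50, 93, 83, 83, 71, 54, 54, 97, 93, 72),
  students = c("c", "a", "b", "d", "b", "c", "a", "c", "b", "a")
)

# tapply() applies mean() to the marks in each student/test cell;
# cells with no marks come back as NA
wide <- with(scores, tapply(mark, list(students, test), mean))
wide
```

The result is a matrix with one row per student and one column per test. Unlike the `cast()` call, a missing student/test combination shows up as NA rather than NaN.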

Perfect, that really was too easy! Thanks! – EnduroDave 2013-04-10 15:25:01