2013-10-22

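How do I find the mean and standard deviation in Ruby?

I wrote a program that finds the mean and standard deviation of a large data set stored in a separate txt file. I want the program to work with any data set. I tested it with two simple data points (a year and month with an associated temperature):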

 
2009-11,20 
2009-12,10 

When I run it, it reports an average of 20 and a standard deviation of 0 (clearly wrong).
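For reference, here is a quick hand check of what the two sample points should produce. It is only a sketch: the `values` array is typed in by hand rather than read from the file, and it uses the population standard deviation (dividing by N), which is what the program below does.

```ruby
# The two temperatures from the sample data, entered by hand (assumption:
# only the second column matters for the statistics).
values = [20.0, 10.0]

# Mean: (20 + 10) / 2
mean = values.sum / values.size

# Population variance: average of squared deviations from the mean.
variance = values.sum { |v| (v - mean)**2 } / values.size
std_dev = Math.sqrt(variance)

puts "mean=#{mean} std_dev=#{std_dev}" # prints "mean=15.0 std_dev=5.0"
```

So the expected output is a mean of 15 and a standard deviation of 5. Dividing by N - 1 instead (the sample standard deviation) would give about 7.071.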

Here is my program:

data = File.open("test.txt", "r+") 
contents = data.read 

contents = contents.split("\r\n") 

#split up array 
contents.collect! do |x| 
    x.split(',') 
end 

sum = 0 

contents.each do |x| 
    #make loop to find average 
    sum = sum + x[1].to_f 
end 
avg = sum/contents.length 
puts "The average of your large data set is: #{ avg.round(3)} (Answer is rounded to nearest thousandth place)" 
#puts average 

#similar to finding average, this finds the standard deviation 
variance = 0 
contents.each do |x| 
    variance = variance + (x[1].to_f - avg)**2 
end 

variance = variance/contents.length 
variance = Math.sqrt(variance) 
puts "The standard deviation of your large data set is:#{ variance.round(3)} (Answer is rounded to nearest thousandth place)" 

Possible duplicate of [How can I do standard deviation in Ruby?](http://stackoverflow.com/questions/7749568/how-can-i-do-standard-deviation-in-ruby)

Are you sure the line separator is `"\r\n"`? Try replacing that line with: `contents = contents.split(/[\r\n]+/)`

Thanks, this works – user2759592

Answer

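I think the problem comes from splitting the data on \r\n, which depends on the operating system: on Linux it should be contents.split("\n") (double quotes are required, since '\n' in single quotes is a literal backslash followed by n). If the file contains only \n line endings, splitting on "\r\n" leaves the whole file as a single element, so the program sees one data point with value 20, which gives an average of 20 and a standard deviation of 0. In any case, it is better to use IO#each to iterate over each line of the file and let Ruby handle the line-ending characters.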
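To see the effect of the separator concretely, here is a small sketch that does not touch the file at all. The `contents` string is an assumption: it stands in for a test.txt saved with Unix line endings, and the variable names are made up for illustration.

```ruby
# Simulated file contents with Unix ("\n") line endings (assumed for this demo).
contents = "2009-11,20 \n2009-12,10 \n"

# Splitting on "\r\n" finds no separator, so the whole text stays one element.
windows_split = contents.split("\r\n")
puts windows_split.size # one "row" instead of two

# The second field of that single row starts with "20", so to_f stops there.
first_fields = windows_split.first.split(',')
puts first_fields[1].to_f

# Splitting on any run of CR/LF characters handles both conventions.
regex_split = contents.split(/[\r\n]+/)
puts regex_split.size

# each_line with chomp lets Ruby strip "\n" or "\r\n" for us.
lines = contents.each_line.map(&:chomp)
puts lines.inspect
```

With only one row in the array, the average equals that row's value and every deviation from it is zero, which matches the output reported in the question.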

# Open read-only; "r+" (read-write) is not needed just to read the data
data = File.open("test.txt", "r") 

count = 0 
sum = 0 
variance = 0 

data.each do |line| 
    value = line.split(',')[1] 
    sum = sum + value.to_f 
    count += 1 
end 

avg = sum/count 
puts "The average of your large data set is: #{ avg.round(3)} (Answer is rounded to nearest thousandth place)" 

# We need to get back to the top of the file 
data.rewind 

data.each do |line| 
    value = line.split(',')[1] 
    variance = variance + (value.to_f - avg)**2 
end 

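# Dividing by count gives the population variance
variance = variance/count 
# Keep the square root in its own variable so the names stay accurate
std_dev = Math.sqrt(variance) 
puts "The standard deviation of your large data set is: #{ std_dev.round(3)} (Answer is rounded to nearest thousandth place)"

# Release the file handle once both passes are done
data.close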
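If the data fits comfortably in memory, the same calculation can be wrapped in a small method. This is only a sketch under a few assumptions: `mean_and_std_dev` is a hypothetical helper name, the value is always in the second comma-separated column, blank lines should be ignored, and `Array#sum` is available (Ruby 2.4 or later).

```ruby
# Hypothetical helper (not part of the original answer): returns [mean, std_dev]
# for the second comma-separated column of the file at `path`.
def mean_and_std_dev(path)
  # File.foreach reads line by line and closes the file when iteration ends.
  # Blank lines are dropped so they are not counted as 0.0.
  values = File.foreach(path).map(&:strip).reject(&:empty?).map do |line|
    line.split(',')[1].to_f
  end
  return [nil, nil] if values.empty?

  mean = values.sum / values.size
  # Population variance: divide by N, as the two-pass code above does.
  variance = values.sum { |v| (v - mean)**2 } / values.size
  [mean, Math.sqrt(variance)]
end
```

It could be called as `mean, std_dev = mean_and_std_dev("test.txt")`. The trade-off is memory: this version holds every value in an array, whereas the two-pass loop with `rewind` only keeps running totals, which may matter for a very large file.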