2013-07-03 40 views
16

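I have two arrays of double values and I want to compute the correlation coefficient (a single double value, just like the CORREL function in MS Excel). Is there a simple one-line solution in C#?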

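I have already found a math library called Meta Numerics. According to this SO question, it should do the job. Here is the documentation for the Meta Numerics correlation method, which I don't understand.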

Could someone please provide a simple code snippet or example showing how to use the library?

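Note: In the end, I had to use one of the custom implementations. But if anyone reading this question knows of a good, well-documented C# math library/framework that does this, please don't hesitate to post a link in an answer.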

+1

This may also help you: http://www.codeproject.com/Articles/8750/A-computational-statistics and here is code for the correlation coefficient: http://www.functionx.com/vcsharp/applications/lcc.htm – terrybozzio

+0

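There is a library from http://ta-lib.org/ that has a "CORREL" function. It is very easy to use and gives you the same results as Excel. It returns an array of results, like Excel, rather than a single value. –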

Answers

25

You can keep the values at the same index in separate lists and use a simple Zip:

var fitResult = new FitResult(); 
var values1 = new List<int>(); 
var values2 = new List<int>(); 

var correls = values1.Zip(values2, (v1, v2) => 
             fitResult.CorrelationCoefficient(v1, v2)); 

The second way is to write your own custom implementation (mine is not optimized for speed):

public double ComputeCoeff(double[] values1, double[] values2) 
{ 
    if(values1.Length != values2.Length) 
     throw new ArgumentException("values must be the same length"); 

    var avg1 = values1.Average(); 
    var avg2 = values2.Average(); 

    var sum1 = values1.Zip(values2, (x1, y1) => (x1 - avg1) * (y1 - avg2)).Sum(); 

    var sumSqr1 = values1.Sum(x => Math.Pow((x - avg1), 2.0)); 
    var sumSqr2 = values2.Sum(y => Math.Pow((y - avg2), 2.0)); 

    var result = sum1/Math.Sqrt(sumSqr1 * sumSqr2); 

    return result; 
} 

Usage:

var values1 = new List<double> { 3, 2, 4, 5 ,6 }; 
var values2 = new List<double> { 9, 7, 12 ,15, 17 }; 

var result = ComputeCoeff(values1.ToArray(), values2.ToArray()); 
// 0.997054485501581 

Debug.Assert(result.ToString("F6") == "0.997054"); 
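
One thing the custom implementation above leaves implicit is what happens when the coefficient is undefined: an input with zero variance gives 0/0 (NaN), and an empty array makes `Average()` throw. A minimal sketch of one way to surface those cases follows. The names `PearsonSketch` and `TryCompute`, and the choice to return `false` rather than throw, are assumptions made for illustration and are not part of the answer's code.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class PearsonSketch
{
    // Hypothetical helper: reports false instead of NaN or an exception
    // when Pearson's r is undefined for the given inputs.
    public static bool TryCompute(double[] xs, double[] ys, out double r)
    {
        r = double.NaN;
        if (xs == null || ys == null) return false;
        if (xs.Length != ys.Length || xs.Length < 2) return false;

        double meanX = xs.Average();
        double meanY = ys.Average();

        double sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < xs.Length; i++)
        {
            double dx = xs[i] - meanX;
            double dy = ys[i] - meanY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }

        // A zero sum of squares means the denominator would be zero.
        // This exact comparison is an assumption; real data may call
        // for a small tolerance instead.
        if (sumXX == 0 || sumYY == 0) return false;

        r = sumXY / Math.Sqrt(sumXX * sumYY);
        return true;
    }

    public static void Main()
    {
        double[] xs = { 3, 2, 4, 5, 6 };
        double[] ys = { 9, 7, 12, 15, 17 };

        double r;
        if (TryCompute(xs, ys, out r))
            Console.WriteLine(r.ToString("F6", CultureInfo.InvariantCulture)); // 0.997054

        double[] constant = { 5, 5, 5, 5, 5 };
        if (!TryCompute(constant, ys, out r))
            Console.WriteLine("undefined (zero variance)");
    }
}
```

Whether to return `false`, return NaN, or throw is a design choice; Excel's CORREL reports an error for zero-variance input, so callers that need Excel-like behaviour may prefer to treat the `false` case as an error.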

Another way is to use the Excel function directly:

var values1 = new List<double> { 3, 2, 4, 5 ,6 }; 
var values2 = new List<double> { 9, 7, 12 ,15, 17 }; 

// Make sure to add a reference to Microsoft.Office.Interop.Excel.dll 
// and use the namespace 

var application = new Application(); 

var worksheetFunction = application.WorksheetFunction; 

var result = worksheetFunction.Correl(values1.ToArray(), values2.ToArray()); 

Console.Write(result); // 0.997054485501581 
+0

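+1 Thanks for providing the code example and clarifying how the library works! The problem is that it only works on ints, not on arrays of doubles. Not your fault, of course, but I can't mark it as the answer. – teejay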

+0

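Yes, I didn't notice that the parameters are of type 'int'. If you need to use doubles, you will probably have to write your own extension method for it. – Romoku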

+0

If you look at the [source](http://metanumerics.codeplex.com/SourceControl/latest#Numerics/Core/Statistics/FitResult.cs) of the class, you'll see that it uses a matrix to compute the correlation coefficient, so you could imitate it. – Romoku

5

If you don't want to use a third-party library, you can use the method from this post (code posted here as a backup).

double[] array1 = { 3, 2, 4, 5, 6 }; 
double[] array2 = { 9, 7, 12, 15, 17 }; 

double correl = Correlation(array1, array2); 

public double Correlation(double[] array1, double[] array2) 
{ 
    // Element-wise products and squares, summed further below.
    double[] array_xy = new double[array1.Length]; 
    double[] array_xp2 = new double[array1.Length]; 
    double[] array_yp2 = new double[array1.Length]; 
    for (int i = 0; i < array1.Length; i++) 
     array_xy[i] = array1[i] * array2[i]; 
    for (int i = 0; i < array1.Length; i++) 
     array_xp2[i] = Math.Pow(array1[i], 2.0); 
    for (int i = 0; i < array1.Length; i++) 
     array_yp2[i] = Math.Pow(array2[i], 2.0); 
    double sum_x = 0; 
    double sum_y = 0; 
    foreach (double n in array1) 
     sum_x += n; 
    foreach (double n in array2) 
     sum_y += n; 
    double sum_xy = 0; 
    foreach (double n in array_xy) 
     sum_xy += n; 
    double sum_xpow2 = 0; 
    foreach (double n in array_xp2) 
     sum_xpow2 += n; 
    double sum_ypow2 = 0; 
    foreach (double n in array_yp2) 
     sum_ypow2 += n; 
    double Ex2 = Math.Pow(sum_x, 2.00); 
    double Ey2 = Math.Pow(sum_y, 2.00); 

    return (array1.Length * sum_xy - sum_x * sum_y)/
    Math.Sqrt((array1.Length * sum_xpow2 - Ex2) * (array1.Length * sum_ypow2 - Ey2)); 
} 
7

To compute the Pearson product-moment correlation coefficient

http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

you can use this simple code:

public static Double Correlation(Double[] Xs, Double[] Ys) { 
    Double sumX = 0; 
    Double sumX2 = 0; 
    Double sumY = 0; 
    Double sumY2 = 0; 
    Double sumXY = 0; 

    int n = Xs.Length < Ys.Length ? Xs.Length : Ys.Length; 

    for (int i = 0; i < n; ++i) { 
     Double x = Xs[i]; 
     Double y = Ys[i]; 

     sumX += x; 
     sumX2 += x * x; 
     sumY += y; 
     sumY2 += y * y; 
     sumXY += x * y; 
    } 

    Double stdX = Math.Sqrt(sumX2/n - sumX * sumX/n/n); 
    Double stdY = Math.Sqrt(sumY2/n - sumY * sumY/n/n); 
    Double covariance = (sumXY/n - sumX * sumY/n/n); 

    return covariance/stdX/stdY; 
    } 
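
Written out, the running sums in the code above correspond to the following expression, where n is the number of paired values (the shorter of the two lengths in this code). This is only a restatement of what the code computes, using the population (divide-by-n) form of the variance and covariance:

```latex
r = \frac{\frac{1}{n}\sum_{i} x_i y_i - \bar{x}\,\bar{y}}
         {\sqrt{\frac{1}{n}\sum_{i} x_i^2 - \bar{x}^2}\;
          \sqrt{\frac{1}{n}\sum_{i} y_i^2 - \bar{y}^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i} x_i,\quad
\bar{y} = \frac{1}{n}\sum_{i} y_i
```

Multiplying numerator and denominator by n^2 gives the equivalent form (n Σxy − Σx Σy) / sqrt((n Σx² − (Σx)²)(n Σy² − (Σy)²)), which is the expression returned by the `Correlation(double[] array1, double[] array2)` answer earlier in the thread.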
15

Math.NET Numerics is a well-documented math library that contains a Correlation class. It computes Pearson and Spearman rank correlations: http://numerics.mathdotnet.com/api/MathNet.Numerics.Statistics/Correlation.htm

The library is available under the very permissive MIT/X11 license. Using it to compute the correlation coefficient is as simple as this:

using MathNet.Numerics.Statistics; 

... 

correlation = Correlation.Pearson(arrayOfValues1, arrayOfValues2); 
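
For completeness, the fragment above can be expanded into a small program. This is a sketch that assumes the MathNet.Numerics NuGet package is referenced; the class name `MathNetCorrelationExample` and the sample data (borrowed from another answer in this thread) are chosen for illustration. `Correlation.Spearman` is included because the answer mentions Spearman rank correlation.

```csharp
using System;
using MathNet.Numerics.Statistics;

public static class MathNetCorrelationExample
{
    public static void Main()
    {
        // Sample data reused from another answer in this thread.
        double[] arrayOfValues1 = { 3, 2, 4, 5, 6 };
        double[] arrayOfValues2 = { 9, 7, 12, 15, 17 };

        // Pearson product-moment correlation.
        double pearson = Correlation.Pearson(arrayOfValues1, arrayOfValues2);

        // Spearman rank correlation (correlation of the ranks).
        double spearman = Correlation.Spearman(arrayOfValues1, arrayOfValues2);

        Console.WriteLine("Pearson:  " + pearson);
        Console.WriteLine("Spearman: " + spearman);
    }
}
```

For this data the Pearson value should agree with the 0.997054 reported by the custom implementation and by Excel elsewhere in the thread.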

Good luck!

+0

Thanks for the link! This may actually be the best library so far; using the method really couldn't be easier :-) – teejay

+0

As an update, Math.NET Numerics version 3.5 added a method to the Correlation class for computing the weighted Pearson correlation. –

0

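In my testing, the code posted above by @Dmitry Bychenko and @keyboardP produced the same correlation as Microsoft Excel in several manual tests, and it does not require any external libraries.

For example, one run produced the following (the data for this run is listed at the bottom):

@Dmitry Bychenko: -0.00418479432051121

@keyboardP: -0.00418479432051131

MS Excel: -0.004184794

Here is the test harness: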

using System; 
using System.Collections.Generic; 
using System.Linq; 
using System.Text; 

namespace TestCorrel { 
    class Program { 

     static void Main(string[] args) { 

      Random rand = new Random(DateTime.Now.Millisecond); 

      List<double> x = new List<double>(); 
      List<double> y = new List<double>(); 

      for (int i = 0; i < 100; i++) { 

       x.Add(rand.Next(1000) * rand.NextDouble()); 
       y.Add(rand.Next(1000) * rand.NextDouble()); 

       Console.WriteLine(x[i] + "," + y[i]); 
      } 

      Console.WriteLine("Correl1: " + Correl1(x, y)); 
      Console.WriteLine("Correl2: " + Correl2(x, y)); 
     } 

     public static double Correl1(List<double> x, List<double> y) { 

      //https://stackoverflow.com/questions/17447817/correlation-of-two-arrays-in-c-sharp 
      if (x.Count != y.Count) 
       return (double.NaN); //throw new ArgumentException("values must be the same length"); 

      double sumX = 0; 
      double sumX2 = 0; 
      double sumY = 0; 
      double sumY2 = 0; 
      double sumXY = 0; 

      int n = x.Count < y.Count ? x.Count : y.Count; 

      for (int i = 0; i < n; ++i) { 

       Double xval = x[i]; 
       Double yval = y[i]; 

       sumX += xval; 
       sumX2 += xval * xval; 
       sumY += yval; 
       sumY2 += yval * yval; 
       sumXY += xval * yval; 
      } 

      Double stdX = Math.Sqrt(sumX2/n - sumX * sumX/n/n); 
      Double stdY = Math.Sqrt(sumY2/n - sumY * sumY/n/n); 
      Double covariance = (sumXY/n - sumX * sumY/n/n); 

      return covariance/stdX/stdY; 
     } 

     public static double Correl2(List<double> x, List<double> y) { 

      double[] array_xy = new double[x.Count]; 
      double[] array_xp2 = new double[x.Count]; 
      double[] array_yp2 = new double[x.Count]; 

      for (int i = 0; i < x.Count; i++) 
       array_xy[i] = x[i] * y[i]; 
      for (int i = 0; i < x.Count; i++) 
       array_xp2[i] = Math.Pow(x[i], 2.0); 
      for (int i = 0; i < x.Count; i++) 
       array_yp2[i] = Math.Pow(y[i], 2.0); 
      double sum_x = 0; 
      double sum_y = 0; 
      foreach (double n in x) 
       sum_x += n; 
      foreach (double n in y) 
       sum_y += n; 
      double sum_xy = 0; 
      foreach (double n in array_xy) 
       sum_xy += n; 
      double sum_xpow2 = 0; 
      foreach (double n in array_xp2) 
       sum_xpow2 += n; 
      double sum_ypow2 = 0; 
      foreach (double n in array_yp2) 
       sum_ypow2 += n; 
      double Ex2 = Math.Pow(sum_x, 2.00); 
      double Ey2 = Math.Pow(sum_y, 2.00); 

      double Correl = 
      (x.Count * sum_xy - sum_x * sum_y)/
      Math.Sqrt((x.Count * sum_xpow2 - Ex2) * (x.Count * sum_ypow2 - Ey2)); 

      return (Correl); 
     } 
    } 
} 
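
The two implementations above differ only in the last few printed digits, which is why the comparison was done by eye. A minimal sketch of doing the same comparison programmatically with a relative tolerance follows. The class name `CorrelComparisonSketch`, the helper `AlmostEqual`, and the tolerance of 1e-12 are assumptions for illustration; the mean-centered variant is the approach used by the first answer's custom implementation.

```csharp
using System;
using System.Linq;

public static class CorrelComparisonSketch
{
    // Running-sums form, as in the Correl1/Correl2 methods above.
    public static double FromSums(double[] x, double[] y)
    {
        int n = x.Length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++)
        {
            sx += x[i];
            sy += y[i];
            sxx += x[i] * x[i];
            syy += y[i] * y[i];
            sxy += x[i] * y[i];
        }
        return (n * sxy - sx * sy) /
               Math.Sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
    }

    // Mean-centered form: subtract the means before accumulating.
    public static double FromDeviations(double[] x, double[] y)
    {
        double mx = x.Average(), my = y.Average();
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.Length; i++)
        {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.Sqrt(sxx * syy);
    }

    // Hypothetical helper: relative comparison of two doubles.
    public static bool AlmostEqual(double a, double b, double relTol)
    {
        return Math.Abs(a - b) <= relTol * Math.Max(Math.Abs(a), Math.Abs(b));
    }

    public static void Main()
    {
        double[] x = { 3, 2, 4, 5, 6 };
        double[] y = { 9, 7, 12, 15, 17 };

        double a = FromSums(x, y);
        double b = FromDeviations(x, y);

        Console.WriteLine("Sums:       " + a);
        Console.WriteLine("Deviations: " + b);
        Console.WriteLine("Agree:      " + AlmostEqual(a, b, 1e-12));
    }
}
```

Both forms are algebraically identical; small differences in the trailing digits come from floating-point rounding, so comparing with a tolerance is more robust than comparing printed strings.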

Data for the example run above:

287.688269702572,225.610842817282 
618.9313498167,177.955550192835 
25.7778882802361,27.6549569366756 
140.847984766051,714.618547504125 
438.618761728806,533.48764902702 
481.347431274758,214.381256273194 
21.6406916848573,393.559209519792 
135.30397563209,158.419851317732 
334.314685154853,814.275162949821 
764.614904770914,50.1435267264692 
42.8179292282173,47.8631582287434 
237.216836650491,370.488416981179 
388.849658539449,134.961087643151 
305.903013161804,441.926902444068 
10.6625048679591,369.567569480076 
36.9316453891488,24.8947204607049 
2.10067253471383,491.941975629861 
7.94887068492774,573.037801189831 
341.738006353722,653.497146697015 
98.8424873439793,475.215988045193 
272.248712629196,36.1088809138671 
122.336823399801,169.158256422336 
9.32281673202422,631.076001565473 
201.118425176068,803.724831627554 
415.514343714115,64.248651454341 
227.791637123,230.512133914284 
25.3438658925443,396.854282886188 
596.238994411304,72.543763144195 
230.239735877253,933.983901697669 
796.060099040186,689.952468971234 
9.30882684202344,269.22063744125 
16.5005430148451,8.96549091859045 
536.324005148524,358.829873788557 
519.694526420764,17.3212184707267 
552.628357889423,12.5541588051962 
210.516099897454,388.57537739937 
141.341571405689,268.082028986924 
503.880356335491,753.447006912645 
515.494990213539,444.451280259737 
973.8670776076,168.922799013985 
85.7111146094795,36.3784999169309 
37.2147129193017,108.040356312432 
504.590177939548,50.3934166889607 
482.821039277511,888.984586256083 
5.52549206350255,156.717087003271 
405.833169031345,394.099059180868 
459.249365587835,11.68776424494 
429.421127440604,314.216759666901 
126.908422469584,331.907062556551 
62.1416232716952,3.19765723645578 
4.16058817699579,604.04046284223 
484.262182311277,220.177370167886 
58.6774453314382,339.09660232677 
463.482149892246,199.181594849183 
344.128297473829,268.531428258182 
0.883430369609702,209.346384477963 
77.9462970131758,255.221325168955 
583.629439312792,235.557751925922 
358.409186083083,376.046612200349 
81.2148325150902,10.7696774717279 
53.7315618049966,274.171515094196 
111.284646992239,130.174321939319 
317.280491961763,338.077288461885 
177.454564264722,7.53587801919127 
69.2239431670047,233.693477620228 
823.419546454875,0.111916855029723 
23.7174749401014,200.989081544331 
44.9598299125022,102.633862571155 
74.1602278468945,292.485449988155 
130.11182449251,23.4682153367755 
243.088760058903,335.807090202722 
13.3974915991526,436.983231269281 
73.3900805168739,252.352352472186 
592.144630201228,92.3395205570103 
57.7306153447044,47.1416798900541 
522.649018382024,584.427794722108 
15.3662010204821,60.1693953262499 
16.8335716728277,851.401980430541 
33.9869734449251,0.930781653584345 
116.66608504982,146.126050951949 
92.8896130355492,711.765618208687 
317.91980889529,322.186540377413 
44.8574470732629,209.275617858058 
751.201537871362,37.935519233316 
161.817758424588,2.83156183493862 
531.64078452142,79.1750782491523 
114.803219681048,283.106988439852 
123.472725123853,154.125248027558 
89.9276725453919,63.4626924192825 
105.623296753328,111.234188702067 
435.72981759707,23.7058234576629 
259.324810619152,69.3535200857341 
719.885234421531,381.086239833891 
24.2674900099018,198.408173349876 
57.7761600361095,146.52277489124 
77.4594609157459,710.746080866431 
636.671781979814,538.894185951396 
56.6035279932448,58.2563265684323 
485.16099039333,427.849954283261 
91.9552873247095,576.92944263617