2010-07-28 141 views
11

对于1,000,000个观测值,我观察到一个离散事件X,对照组为3次,测试组为10次。独立性的Matlab测试

我需要在Matlab中执行卡方检验的独立性测试。这是你会怎么做,在R:

m <- rbind(c(3, 1000000-3), c(10, 1000000-10)) 
#  [,1] [,2] 
# [1,] 3 999997 
# [2,] 10 999990 
chisq.test(m) 

R函数返回卡方= 2.7692,DF = 1,p值= 0.0961。

我应该使用或创建什么Matlab函数来做到这一点?

回答

14

这是我自己的实现,我使用:

function [hNull pValue X2] = ChiSquareTest(o, alpha) 
    %# CHISQUARETEST Pearson's Chi-Square test of independence 
    %# 
    %# @param o   The Contignecy Table of the joint frequencies 
    %#      of the two events (attributes) 
    %# @param alpha  Significance level for the test 
    %# @return hNull  hNull = 1: null hypothesis accepted (independent) 
    %#      hNull = 0: null hypothesis rejected (dependent) 
    %# @return pValue The p-value of the test (the prob of obtaining 
    %#      the observed frequencies under hNull) 
    %# @return X2  The value for the chi square statistic 
    %# 

    %# o:  observed frequency 
    %# e:  expected frequency 
    %# dof: degree of freedom 

    [r c] = size(o); 
    dof = (r-1)*(c-1); 

    %# e = (count(A=ai)*count(B=bi))/N 
    e = sum(o,2)*sum(o,1)/sum(o(:)); 

    %# [ sum_r [ sum_c ((o_ij-e_ij)^2/e_ij) ] ] 
    X2 = sum(sum((o-e).^2 ./ e)); 

    %# p-value needed to reject hNull at the significance level with dof 
    pValue = 1 - chi2cdf(X2, dof); 
    hNull = (pValue > alpha); 

    %# X2 value needed to reject hNull at the significance level with dof 
    %#X2table = chi2inv(1-alpha, dof); 
    %#hNull = (X2table > X2); 

end 

并举例说明:

t = [3 999997 ; 10 999990] 
[hNull pVal X2] = ChiSquareTest(t, 0.05) 

hNull = 
    1 
pVal = 
    0.052203 
X2 = 
     3.7693 

注意,结果是从你的不同,因为chisq.test执行默认的校正,根据到?chisq.test

正确:逻辑指示是否 在计算2x2表的测试统计量时应用连续性校正 :一半是 从| O - E |差异。


另外,如果你有问题的两个事件的实际观察,你可以使用CROSSTAB函数计算列联表并返回χ2和p值的措施:

X = randi([1 2],[1000 1]); 
Y = randi([1 2],[1000 1]); 
[t X2 pVal] = crosstab(X,Y) 

t = 
    229 247 
    257 267 
X2 = 
    0.087581 
pVal = 
     0.76728 

在R中的等价物将是:

chisq.test(X, Y, correct = FALSE) 

注意:上述两种(MATLAB)方法都需要统计工具箱

+1

啊,ninja'd。代码+1! – Jonas 2010-07-28 20:07:38

+0

@Amro,你会如何为matlab实现'correct = true'? – Elpezmuerto 2010-07-28 20:29:12

+0

以及根据R文档只需从| OE |中减去一半,所以用下面的代替:'X2 = sum(sum((abs(oe)-0.5)。^ 2 ./ e));'但是你会有要手动检查此更正仅适用于2x2表格:http://en.wikipedia.org/wiki/Yates%27_correction_for_continuity – Amro 2010-07-28 20:37:33

0

此函数将使用Pearson卡方统计量和似然比统计量以及计算残差来检验独立性。我知道这可以进一步向量化,但我试图展示每一步的数学。

function independenceTest(data) 
df = (size(data,1)-1)*(size(data,2)-1); % Mean Degrees of Freedom 
sd = sqrt(2*df);      % Standard Deviation 

u   = nan(size(data)); % Estimated expected frequencies 
p   = nan(size(data)); % Values used to calculate chi-square 
lr  = nan(size(data)); % Values used to calculate likelihood-ratio 
residuals = nan(size(data)); % Residuals 

rowTotals = sum(data,1); 
colTotals = sum(data,2); 
overallTotal = sum(rowTotals); 

%% Calculate estimated expected frequencies 
for r=1:1:size(data,1) 
    for c=1:1:size(data,2) 
     u(r,c) = (rowTotals(c) * colTotals(r))/overallTotal; 
    end 
end 

%% Calculate chi-squared statistic 
for r=1:1:size(data,1) 
    for c=1:1:size(data,2) 
     p(r,c) = (data(r,c) - u(r,c))^2/u(r,c); 
    end 
end 
chi = sum(sum(p)); % Chi-square statistic 

%% Calculate likelihood-ratio statistic 
for r=1:1:size(data,1) 
    for c=1:1:size(data,2) 
     lr(r,c) = data(r,c) * log(data(r,c)/u(r,c)); 
    end 
end 
G = 2 * sum(sum(lr)); % Likelihood-Ratio statisitc 

%% Calculate residuals 
for r=1:1:size(data,1) 
    for c=1:1:size(data,2) 
     numerator = data(r,c) - u(r,c); 
     denominator = sqrt(u(r,c) * (1 - colTotals(r)/overallTotal) * (1 - rowTotals(c)/overallTotal)); 
     residuals(r,c) = numerator/denominator; 
    end 
end 
+0

查看@ Amro的代码。他没有循环进行相同的计算,因此更简洁。 – Jonas 2010-07-28 20:19:57