2017-08-14 86 views

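Python: implementing a 95% confidence interval around a mean of means?

How can this solution be implemented using pandas/Python? The question concerns implementing this stats.stackexchange solution for finding a 95% CI around a mean of means.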

import pandas as pd 
from IPython.display import display 
import scipy 
import scipy.stats as st 
import scikits.bootstrap as bootstraps 

data = pd.DataFrame({ 
    "exp1":[34, 41, 39] 
    ,"exp2":[45, 51, 52] 
    ,"exp3":[29, 31, 35] 
}).T 

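data.loc[:,"row_mean"] = data.mean(axis=1) 
# note: this std also runs over the new row_mean column; with ddof=1 that happens to
# equal the population std (ddof=0) of the three raw values, as shown in the table below
data.loc[:,"row_std"] = data.std(axis=1) 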
display(data) 

       0   1   2   row_mean   row_std
exp1  34  41  39  38.000000  2.943920
exp2  45  51  52  49.333333  3.091206
exp3  29  31  35  31.666667  2.494438

mean_of_means = data.row_mean.mean() 
std_of_means = data.row_mean.std() 
confidence = 0.95 
print("mean(means): {}\nstd(means):{}".format(mean_of_means,std_of_means)) 
  • mean(means): 39.66666666666667
  • std(means):8.950481054731702

First incorrect attempt (z-score):

zscore = st.norm.ppf(1-(1-confidence)/2) 
lower_bound = mean_of_means - (zscore*std_of_means) 
upper_bound = mean_of_means + (zscore*std_of_means) 
print("95% CI = [{},{}]".format(lower_bound,upper_bound)) 
  • 95% CI = [22.1, 57.2] (incorrect solution)
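Two things go wrong in this attempt: the z quantile is multiplied by the standard deviation of the means rather than by the standard error s/√n (which makes the interval √3 times too wide for n = 3), and with only three means the normal quantile 1.96 is much smaller than the Student-t quantile for 2 degrees of freedom. The two errors partly cancel. A minimal sketch (not part of the original thread; the three means are hard-coded from the table above) that separates them:

```python
import numpy as np
import scipy.stats as st

row_means = np.array([38.0, 148 / 3, 95 / 3])   # per-experiment means from the table
n = len(row_means)
m = row_means.mean()
se = row_means.std(ddof=1) / np.sqrt(n)         # standard error of the mean

z = st.norm.ppf(0.975)                           # ~1.96, large-sample normal quantile
t = st.t.ppf(0.975, n - 1)                       # Student-t quantile, n - 1 = 2 df

print("z with SE: [{:.2f}, {:.2f}]".format(m - z * se, m + z * se))
print("t with SE: [{:.2f}, {:.2f}]".format(m - t * se, m + t * se))
```

Dividing by √n alone is not enough here: with three observations the t quantile roughly doubles the half-width compared with z.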

Second incorrect attempt (t-score):

tscore = st.t.ppf(1-0.05, data.shape[0]) 
lower_bound = mean_of_means - (tscore*std_of_means) 
upper_bound = mean_of_means + (tscore*std_of_means) 
print("95% CI = [{},{}]".format(lower_bound,upper_bound)) 
  • 95% CI = [18.60, 60.73] (incorrect solution)
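Compared with a two-sided 95% interval, this attempt uses the one-sided quantile `1-0.05` instead of `1-0.05/2`, uses `data.shape[0]` (n = 3) degrees of freedom instead of n - 1 = 2, and, like the z attempt, omits the division by √n. A small sketch of the quantile difference alone, assuming n = 3 experiment means:

```python
import scipy.stats as st

n = 3            # number of experiment means
alpha = 0.05     # 1 - confidence level, split across both tails for a two-sided CI

used = st.t.ppf(1 - alpha, n)            # one-sided quantile with df = n (as in the attempt)
needed = st.t.ppf(1 - alpha / 2, n - 1)  # two-sided quantile with df = n - 1

print("used: {:.4f}  needed: {:.4f}".format(used, needed))
```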

Third incorrect attempt (bootstrap):

import numpy as np  # scipy.mean was only an alias of numpy.mean and is deprecated in newer SciPy
CIs = bootstraps.ci(data=data.row_mean, statfunction=np.mean, alpha=0.05) 
  • 95% CI = [31.67, 49.33] (incorrect solution)
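With only three means, every bootstrap resample mean lies between the smallest and largest observed mean (31.67 and 49.33), and a bootstrap interval is read off those resample means, so it can never reach the wider t-interval. For a two-sided 95% percentile interval, alpha = 0.05 is the total error, split as 2.5% in each tail. A minimal sketch that enumerates all 27 equally likely resamples instead of drawing random ones (an assumption made for determinism; scikits.bootstrap draws random resamples and may use a different interval method than plain percentiles):

```python
import itertools
import numpy as np

row_means = np.array([38.0, 148 / 3, 95 / 3])  # per-experiment means
alpha = 0.05                                   # two-sided: alpha/2 in each tail

# Every possible bootstrap resample of size 3 (3**3 = 27, all equally likely)
boot_means = np.array([np.mean(s) for s in itertools.product(row_means, repeat=3)])

lo, hi = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
print("resample means span [{:.2f}, {:.2f}]".format(boot_means.min(), boot_means.max()))
print("percentile interval: [{:.2f}, {:.2f}]".format(lo, hi))
```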

How can this solution be implemented using pandas/Python to get the correct result below?

  • 95% CI = [17.4, 61.9] (correct solution)
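For reference, the correct interval is the Student-t interval for a mean, applied to the n = 3 experiment means, where x̄ is the mean of the means and s their sample standard deviation (values from the output above, rounded):

```latex
\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
  = 39.667 \pm 4.303 \times \frac{8.950}{\sqrt{3}}
  = 39.667 \pm 22.234
  \;\Rightarrow\; [17.43,\ 61.90]
```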

Maybe 'scikits-bootstrap' does what you want? – xaav


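@xaav, I just added an example using this suggestion. Unfortunately it does not give me the correct solution, although I may be using it incorrectly. I'm not sure whether alpha should be set to 0.05 or 0.025, but either way the result is wrong. – blehman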

Answer


Thanks to Jon Bates.

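import pandas as pd 
import scipy.stats as st 

data = pd.DataFrame({ 
    "exp1":[34, 41, 39] 
    ,"exp2":[45, 51, 52] 
    ,"exp3":[29, 31, 35] 
}).T 

raw = [0, 1, 2]  # the three original measurement columns
data.loc[:,"row_mean"] = data[raw].mean(axis=1) 
data.loc[:,"row_std"] = data[raw].std(axis=1)   # std of the raw values only

# mean and sample std (ddof=1) of the per-experiment means
n = data.shape[0]                               # number of experiments
mean_of_means = data.row_mean.mean() 
std_of_means = data.row_mean.std() 

# two-sided 95%: upper 2.5% quantile of Student's t with n-1 degrees of freedom
tscore = st.t.ppf(1-0.025, n-1) 

print("mean(means): {}\nstd(means): {}\ntscore: {}".format(mean_of_means,std_of_means,tscore)) 

# scale by the standard error of the mean, std/sqrt(n), not by the raw std
lower_bound = mean_of_means - (tscore*std_of_means/(n**0.5)) 
upper_bound = mean_of_means + (tscore*std_of_means/(n**0.5)) 

print("95% CI = [{},{}]".format(lower_bound,upper_bound)) 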

mean(means): 39.66666666666667
std(means): 8.950481054731702
tscore: 4.302652729911275
95% CI = [17.432439139464606,61.90089419386874]
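The same interval can be cross-checked in a single call. A minimal sketch, assuming SciPy's `scipy.stats.t.interval` and `scipy.stats.sem` (whose default ddof=1 matches pandas' `Series.std()`); the confidence level is passed positionally because the keyword name of that first parameter has changed across SciPy versions:

```python
import numpy as np
import scipy.stats as st

row_means = np.array([38.0, 148 / 3, 95 / 3])  # per-experiment means
n = len(row_means)

# Two-sided interval of a t distribution with n - 1 df, shifted to the mean
# and scaled by the standard error of the mean
lower, upper = st.t.interval(0.95, n - 1, loc=row_means.mean(), scale=st.sem(row_means))
print("95% CI = [{:.2f}, {:.2f}]".format(lower, upper))
```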