2016-10-28 38 views
1

Data analysis using Python pandas

I am new to the pandas library and need some help. I have two columns like this:

Test Result  Risk Rating 
    Fail    Low     
    Pass    Medium 
    Skip    High 
    Pass    Low     
    Fail    Medium 
    Pass    High 
    Skip    Low     
    Fail    Medium 
    Fail    High 

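Now I need to find the total number of Fail, Pass, and Skip values in the "Test Result" column, which I am able to do. But I also need to count how many "Fail" rows in the "Test Result" column have "Low" in the "Risk Rating" column, and likewise Fail with Medium, and so on. My final result should look like this: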

Fail (Low Risk Rating) = 1 
Fail (Medium Risk Rating) = 2 
Fail (High Risk Rating) = 1 
Pass (Low Risk Rating) = 1 
Pass (Medium Risk Rating) = 1 
Pass (High Risk Rating) = 1 
Skip (Low Risk Rating) = 1 
Skip (Medium Risk Rating) = 0 
Skip (High Risk Rating) = 1 

How can I do this? Any help would be appreciated.

Answers

3

I think you need groupby by both columns and then aggregate with size:

df = df.groupby(['Test Result', 'Risk Rating']).size().reset_index(name='counts') 
print (df) 
    Test Result Risk Rating counts 
0  Fail  High  1 
1  Fail   Low  1 
2  Fail  Medium  2 
3  Pass  High  1 
4  Pass   Low  1 
5  Pass  Medium  1 
6  Skip  High  1 
7  Skip   Low  1 
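
To try the snippets in this answer end to end, the sample data from the question can be rebuilt as a DataFrame first. This is only a minimal sketch: the nine rows are copied from the question, while the variable names `raw` and `counts` are my own and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; `raw` is an illustrative name.
raw = pd.DataFrame({
    'Test Result': ['Fail', 'Pass', 'Skip', 'Pass', 'Fail',
                    'Pass', 'Skip', 'Fail', 'Fail'],
    'Risk Rating': ['Low', 'Medium', 'High', 'Low', 'Medium',
                    'High', 'Low', 'Medium', 'High'],
})

# Count the rows for every (Test Result, Risk Rating) pair that occurs.
counts = (raw.groupby(['Test Result', 'Risk Rating'])
             .size()
             .reset_index(name='counts'))
print(counts)
```

Keeping the source data in its own variable, instead of reassigning `df`, means every later snippet can start from the same rows. Note that groupby only reports pairs that actually appear in the data, which is why there is no Skip/Medium row here.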

A pivot-style table built with unstack may be better:

df = df.groupby(['Test Result', 'Risk Rating']).size().unstack(fill_value=0) 
print (df) 
Risk Rating High Low Medium 
Test Result     
Fail   1 1  2 
Pass   1 1  1 
Skip   1 1  0 

Or a slower solution with crosstab:

df = pd.crosstab(df['Test Result'], df['Risk Rating']) 
print (df) 
Risk Rating High Low Medium 
Test Result     
Fail   1 1  2 
Pass   1 1  1 
Skip   1 1  0 

If you need the missing combinations shown with a count of 0, add stack:

# Parentheses let the method chain span several lines.
df = (df.groupby(['Test Result', 'Risk Rating'])
        .size()
        .unstack(fill_value=0)
        .stack()
        .reset_index(name='counts'))
print (df) 
    Test Result Risk Rating counts 
0  Fail  High  1 
1  Fail   Low  1 
2  Fail  Medium  2 
3  Pass  High  1 
4  Pass   Low  1 
5  Pass  Medium  1 
6  Skip  High  1 
7  Skip   Low  1 
8  Skip  Medium  0 
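
If the goal is the exact text layout shown in the question, the zero-filled table can be turned into strings. The following is a sketch under my own assumptions: the names `raw`, `risk_order`, `table`, and `lines` are illustrative, and the Low, Medium, High ordering is taken from the question's desired output rather than from the answer.

```python
import pandas as pd

# Sample data copied from the question; `raw` is an illustrative name.
raw = pd.DataFrame({
    'Test Result': ['Fail', 'Pass', 'Skip', 'Pass', 'Fail',
                    'Pass', 'Skip', 'Fail', 'Fail'],
    'Risk Rating': ['Low', 'Medium', 'High', 'Low', 'Medium',
                    'High', 'Low', 'Medium', 'High'],
})

# Column order as it appears in the question's desired output.
risk_order = ['Low', 'Medium', 'High']

# Wide table of counts; absent pairs become 0, columns follow risk_order.
table = (raw.groupby(['Test Result', 'Risk Rating'])
            .size()
            .unstack(fill_value=0)
            .reindex(columns=risk_order, fill_value=0))

# One line per (Test Result, Risk Rating) pair.
lines = [f'{result} ({risk} Risk Rating) = {table.loc[result, risk]}'
         for result in table.index
         for risk in risk_order]
print('\n'.join(lines))
```

The reindex step only reorders the columns here, since all three ratings occur in the sample. It would also add a zero-filled column for a rating that never appears in the data.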
+0

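Thanks. I am using df = df.groupby(['Test Result', 'Risk Rating']).size().unstack(fill_value=0), but I am not able to get specific values out of the resulting df. For example, I only need the 'High', 'Low', and 'Medium' values for 'Fail'. –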

+0

I think you need [`boolean indexing`](http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing) – jezrael
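
As a rough illustration of that suggestion, here are two ways to pull out only the 'Fail' counts. This is a sketch, not the commenter's code: the variable names are mine, and the label-based `.loc` lookup on the unstacked table is an alternative I have added alongside boolean indexing.

```python
import pandas as pd

# Sample data copied from the question; `raw` is an illustrative name.
raw = pd.DataFrame({
    'Test Result': ['Fail', 'Pass', 'Skip', 'Pass', 'Fail',
                    'Pass', 'Skip', 'Fail', 'Fail'],
    'Risk Rating': ['Low', 'Medium', 'High', 'Low', 'Medium',
                    'High', 'Low', 'Medium', 'High'],
})

grouped = raw.groupby(['Test Result', 'Risk Rating']).size()

# Wide table: 'Test Result' is the index, so rows can be looked up by label.
table = grouped.unstack(fill_value=0)
fail_row = table.loc['Fail']               # Series indexed by Risk Rating
fail_medium = table.loc['Fail', 'Medium']  # a single cell

# Long table: boolean indexing keeps the rows where the mask is True.
long_counts = grouped.reset_index(name='counts')
fail_long = long_counts[long_counts['Test Result'] == 'Fail']

print(fail_row)
print(fail_medium)
print(fail_long)
```

Label lookup fits the unstacked table because 'Test Result' is its index. Boolean indexing fits the long format, where 'Test Result' is an ordinary column.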