2017-05-07 53 views

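Python - Pandas - value_counts for all columns in a grouped data frame

I have a survey dataset in which every question is answered on a 7-point scale. I want to get the value_counts of the shared answer values across all columns, with the data frame grouped by two columns. Let me show you a sample dataset and where I have got to so far.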

| col1   | col2   | col3   | Building  | Levels_Name   | 
|---------------|---------------|---------------|---------------|------------------------| 
| Not Satisfied | Not Satisfied | Not Satisfied | San Francisco | Individual Contributor | 
| Satisfied  | Satisfied  | NA   | Basingstoke | Individual Contributor | 
| Not Satisfied | Satisfied  | Not Satisfied | San Francisco | Middle Management  | 
| Not Satisfied | Satisfied  | Not Satisfied | Miami   | Senior Leadership  | 
| Not Satisfied | Not Satisfied | Not Satisfied | Foster City | Senior Leadership  | 
| NA   | NA   | NA   | Foster City | Other     | 
| Not Satisfied | Not Satisfied | NA   | Foster City | Senior Leadership  | 
| Not Satisfied | Satisfied  | Not Satisfied | Austin  | Middle Management  | 
| Satisfied  | Satisfied  | Satisfied  | San Francisco | Senior Leadership  | 
| Not Satisfied | Not Satisfied | Not Satisfied | Foster City | Individual Contributor | 
| Satisfied  | Satisfied  | NA   | Miami   | Middle Management  | 
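
To reproduce the sample, the table above can be rebuilt as a DataFrame. This is a minimal sketch that assumes the `NA` cells are real missing values (`np.nan`) rather than the literal string `"NA"`; the shorthand names `NS`, `S`, `NA` and `rows` are only for illustration.

```python
import numpy as np
import pandas as pd

# Shorthand for the three possible cell values (NA is assumed to mean "missing").
NS, S, NA = "Not Satisfied", "Satisfied", np.nan

# One tuple per row of the sample table: col1, col2, col3, Building, Levels_Name.
rows = [
    (NS, NS, NS, "San Francisco", "Individual Contributor"),
    (S, S, NA, "Basingstoke", "Individual Contributor"),
    (NS, S, NS, "San Francisco", "Middle Management"),
    (NS, S, NS, "Miami", "Senior Leadership"),
    (NS, NS, NS, "Foster City", "Senior Leadership"),
    (NA, NA, NA, "Foster City", "Other"),
    (NS, NS, NA, "Foster City", "Senior Leadership"),
    (NS, S, NS, "Austin", "Middle Management"),
    (S, S, S, "San Francisco", "Senior Leadership"),
    (NS, NS, NS, "Foster City", "Individual Contributor"),
    (S, S, NA, "Miami", "Middle Management"),
]
df = pd.DataFrame(
    rows, columns=["col1", "col2", "col3", "Building", "Levels_Name"]
)
print(df)
```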

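Now I want to group this data by "Building" and "Levels_Name", add a further grouping level with the values "Satisfied", "Not Satisfied" and "NA", and get the value counts for each column.

So the result should look like this: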

| Building  | Levels_Name   | Sentiment  | col1 | col2 | col3 | 
|---------------|------------------------|---------------|------|------|------| 
| Foster City | Individual Contributor | Not Satisfied | 1 | 1 | 1 | 
| Foster City | Individual Contributor | NA   | 0 | 0 | 0 | 
| Foster City | Individual Contributor | Satisfied  | 0 | 0 | 0 | 
| Foster City | Senior Leadership  | Not Satisfied | 2 | 2 | 0 | 
| Foster City | Senior Leadership  | NA   | 0 | 0 | 1 | 
| Foster City | Senior Leadership  | Satisfied  | 0 | 0 | 0 | 
| San Francisco | Individual Contributor | Not Satisfied | 1 | 1 | 1 | 
| San Francisco | Individual Contributor | NA   | 0 | 0 | 0 | 
| San Francisco | Individual Contributor | Satisfied  | 0 | 0 | 0 | 

Thanks!

Answer


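First, melt the data frame into long form, then group by every column of the melted frame and take the size of each group: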

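import numpy as np
import pandas as pd

# Long form: one row per (Building, Levels_Name, question, answer).
# Missing answers become the string 'NaN' so that groupby does not drop them.
d1 = pd.melt(
    df, ['Building', 'Levels_Name'], value_name='Sentiment'
).replace(np.nan, 'NaN')

# Count each combination, then pivot the question names back into columns.
d1.groupby(
    d1.columns.tolist()
).size().unstack('variable', fill_value=0).reset_index()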

variable       Building             Levels_Name      Sentiment  col1  col2  col3
0                Austin       Middle Management  Not Satisfied     1     0     1
1                Austin       Middle Management      Satisfied     0     1     0
2           Basingstoke  Individual Contributor            NaN     0     0     1
3           Basingstoke  Individual Contributor      Satisfied     1     1     0
4           Foster City  Individual Contributor  Not Satisfied     1     1     1
5           Foster City                   Other            NaN     1     1     1
6           Foster City       Senior Leadership            NaN     0     0     1
7           Foster City       Senior Leadership  Not Satisfied     2     2     1
8                 Miami       Middle Management            NaN     0     0     1
9                 Miami       Middle Management      Satisfied     1     1     0
10                Miami       Senior Leadership  Not Satisfied     1     0     1
11                Miami       Senior Leadership      Satisfied     0     1     0
12        San Francisco  Individual Contributor  Not Satisfied     1     1     1
13        San Francisco       Middle Management  Not Satisfied     1     0     1
14        San Francisco       Middle Management      Satisfied     0     1     0
15        San Francisco       Senior Leadership      Satisfied     1     1     1
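
Note that the output above only lists the Sentiment values that actually occur in each group, while the desired result in the question has a row for every Sentiment in every group, filled with zeros. One possible way to get those rows is to reindex the counts against every (Building, Levels_Name, Sentiment) combination. The sketch below assumes that the three labels are `Not Satisfied`, `NA` and `Satisfied` and that missing answers are stored as NaN; the names `keys`, `sentiments`, `full_index` and `result` are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NA is assumed to mean a missing value.
NS, S, NA = "Not Satisfied", "Satisfied", np.nan
rows = [
    (NS, NS, NS, "San Francisco", "Individual Contributor"),
    (S, S, NA, "Basingstoke", "Individual Contributor"),
    (NS, S, NS, "San Francisco", "Middle Management"),
    (NS, S, NS, "Miami", "Senior Leadership"),
    (NS, NS, NS, "Foster City", "Senior Leadership"),
    (NA, NA, NA, "Foster City", "Other"),
    (NS, NS, NA, "Foster City", "Senior Leadership"),
    (NS, S, NS, "Austin", "Middle Management"),
    (S, S, S, "San Francisco", "Senior Leadership"),
    (NS, NS, NS, "Foster City", "Individual Contributor"),
    (S, S, NA, "Miami", "Middle Management"),
]
df = pd.DataFrame(
    rows, columns=["col1", "col2", "col3", "Building", "Levels_Name"]
)

keys = ["Building", "Levels_Name"]

# Melt to long form and label missing answers so groupby keeps them.
d1 = pd.melt(df, id_vars=keys, value_name="Sentiment")
d1["Sentiment"] = d1["Sentiment"].fillna("NA")

# Counts per (Building, Levels_Name, Sentiment), one column per question.
counts = (
    d1.groupby(keys + ["Sentiment", "variable"])
    .size()
    .unstack("variable", fill_value=0)
)

# Every observed (Building, Levels_Name) pair crossed with every label.
sentiments = ["Not Satisfied", "NA", "Satisfied"]
groups = df[keys].drop_duplicates().sort_values(keys)
full_index = pd.MultiIndex.from_tuples(
    [
        (building, level, sentiment)
        for building, level in groups.itertuples(index=False, name=None)
        for sentiment in sentiments
    ],
    names=keys + ["Sentiment"],
)

# Combinations that never occur are added with a count of 0.
result = counts.reindex(full_index, fill_value=0).reset_index()
print(result)
```

Only pairs of Building and Levels_Name that appear in the data are expanded, so combinations such as Austin with Senior Leadership are still left out; crossing all buildings with all levels would need a larger index.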

Awesome, that did the trick! Thank you so much for your help! :D Much appreciated. – NinjaElvis
