2016-01-10

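Creating a dictionary of frequency dictionaries from a DataFrame

I have a large dataset like the one below, and I am trying to build a dictionary of dictionaries from the DataFrame that maps each crime to the frequencies of the other columns.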

train_data

   23  Wednesday  BAYVIEW  CENTRAL  INGLESIDE  NORTHERN  PARK  RICHMOND  crime
0   1          1        0        0          0         1     0         0      3
1   1          1        0        0          0         1     0         0      1
2   1          1        0        0          0         1     0         0      1
3   1          1        0        0          0         1     0         0      0
4   1          1        0        0          0         0     1         0      0
5   1          1        0        0          1         0     0         0      0
6   1          1        0        0          1         0     0         0      2
7   1          1        1        0          0         0     0         0      2
8   1          1        0        0          0         0     0         1      0
9   1          1        0        1          0         0     0         0      0

So I decided to start by grouping the DataFrame by the 'crime' column and summing:

train_data=train_data.groupby(['crime']).sum() 


       23  Wednesday  BAYVIEW  CENTRAL  INGLESIDE  NORTHERN  PARK  RICHMOND
crime
0       5          5        0        1          1         1     1         1
1       2          2        0        0          0         2     0         0
2       2          2        1        0          1         0     0         0
3       1          1        0        0          0         1     0         0

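Then I tried to organize the result into a dictionary of dictionaries, but I couldn't get it to work. I tried iterating in a few ways, but kept running into problems with the DataFrame.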

The result should look like this:

{0: {23: 5, Wednesday: 5, BAYVIEW: 0, CENTRAL: 1, ...}, 
1: {23: 2, Wednesday: 2, BAYVIEW: 0, ...}, 
2: {...}, 3: {...}} 

Answer

If you are on pandas 0.17.0 or newer, as MaxNoe posted:

train_data.groupby('crime').sum().to_dict(orient='index') 

Otherwise:

train_data.groupby('crime').sum().T.to_dict()
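
As a quick check that the two forms agree, here is a minimal sketch on a cut-down copy of the sample data. It keeps only three of the question's columns to stay short, and the names `summed`, `by_index`, and `by_transpose` are illustrative, not from the original post.

```python
import pandas as pd

# Cut-down copy of the question's sample rows (only three feature columns kept).
train_data = pd.DataFrame({
    "Wednesday": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "NORTHERN":  [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "PARK":      [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    "crime":     [3, 1, 1, 0, 0, 0, 2, 2, 0, 0],
})

# One row per crime value, with every other column summed.
summed = train_data.groupby("crime").sum()

# pandas >= 0.17.0: outer keys are the index labels (crime values).
by_index = summed.to_dict(orient="index")

# Older pandas: transpose so crime values become columns, then use the
# default orient, which maps column -> {index -> value}.
by_transpose = summed.T.to_dict()

print(by_index == by_transpose)
print(by_index[0])
```

Both calls produce a mapping of crime value to `{column name: summed count}`, so the transpose is only needed where `orient='index'` is unavailable.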