2017-02-14 90 views
2

Transpose a pandas DataFrame into data ranges

I want to transpose the following table:

Name  | State  | Value
~~~~~~~~~~~~~~~~~~~~~~
nameA | state1 | 1
nameA | state2 | 5
nameA | state1 | 9
nameA | state1 | 2
nameB | state2 | 3
nameB | state1 | 1

into a table like this:

Name  | range1_state1 | range1_state2 | range2_state1 | range2_state2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nameA | 2             | 1             | 0             | 1
nameB | 1             | 0             | 1             | 0

where range1 = [0,5) and range2 = (5,10], and the values in the second table are the number of occurrences in the first table.

Thanks

+0

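[Welcome](http://stackoverflow.com/tour) to Stack Overflow. Have you tried any code? Showing your attempt makes the question easier to answer. See [How to ask a good question](http://stackoverflow.com/help/how-to-ask) – Irfan

Answer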

2

I think you need cut for binning, together with crosstab:

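import pandas as pd

# Sample data taken from the table in the question
df = pd.DataFrame({
    'Name': ['nameA', 'nameA', 'nameA', 'nameA', 'nameB', 'nameB'],
    'State': ['state1', 'state2', 'state1', 'state1', 'state2', 'state1'],
    'Value': [1, 5, 9, 2, 3, 1]})

print(pd.cut(df['Value'], bins=[0, 5, 10], include_lowest=True))
0     [0, 5]
1     [0, 5]
2    (5, 10]
3     [0, 5]
4     [0, 5]
5     [0, 5]
Name: Value, dtype: category
Categories (2, object): [[0, 5] < (5, 10]]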

df['rng'] = pd.cut(df['Value'], bins=[0, 5, 10],
                   labels=['range1', 'range2'], include_lowest=True)
df['State'] = df['rng'].astype(str) + '_' + df['State']
print(df)
    Name          State  Value     rng
0  nameA  range1_state1      1  range1
1  nameA  range1_state2      5  range1
2  nameA  range2_state1      9  range2
3  nameA  range1_state1      2  range1
4  nameB  range1_state2      3  range1
5  nameB  range1_state1      1  range1

df = pd.crosstab(df.Name, df.State)
print(df)
State  range1_state1  range1_state2  range2_state1
Name
nameA              2              1              1
nameB              1              1              0
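Note that crosstab only creates columns for combinations that actually occur, so range2_state2 is missing from the output above. If a fixed set of columns is wanted, as in the question, one possible approach is to reindex the result. The following is a minimal sketch rather than part of the original answer; the names `key`, `all_columns` and `counts` are made up for illustration, and the sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["nameA", "nameA", "nameA", "nameA", "nameB", "nameB"],
    "State": ["state1", "state2", "state1", "state1", "state2", "state1"],
    "Value": [1, 5, 9, 2, 3, 1],
})

range_labels = ["range1", "range2"]

# Bin the values, then build a combined "<range>_<state>" key per row
rng = pd.cut(df["Value"], bins=[0, 5, 10], labels=range_labels, include_lowest=True)
key = rng.astype(str) + "_" + df["State"]

# Every range/state combination, including ones absent from the data
all_columns = [f"{r}_{s}" for r in range_labels for s in sorted(df["State"].unique())]

# Count occurrences, then add the missing combinations as zero-filled columns
counts = pd.crosstab(df["Name"], key, colnames=["State"]).reindex(
    columns=all_columns, fill_value=0
)
print(counts)
```

With this data the extra range2_state2 column is all zeros, because no state2 row has a value above 5.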

EDIT:

You can check which bin each value falls into in this example:

import numpy as np

df1 = pd.DataFrame({'Value': np.arange(11)})
df1['bins'] = pd.cut(df1['Value'], bins=[0, 5, 10], include_lowest=True)
print(df1)
    Value     bins
0       0   [0, 5]
1       1   [0, 5]
2       2   [0, 5]
3       3   [0, 5]
4       4   [0, 5]
5       5   [0, 5]
6       6  (5, 10]
7       7  (5, 10]
8       8  (5, 10]
9       9  (5, 10]
10     10  (5, 10]
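The bins above are closed on the right, so the value 5 lands in the first bin, whereas the question writes range1 as [0,5). If left-closed intervals are what is actually wanted (an assumption on my part about the asker's intent), `right=False` is one way to get them. This is a small sketch; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

values = pd.Series(np.arange(11))

# right=False makes each interval closed on the left: [0, 5) and [5, 10)
left_closed = pd.cut(values, bins=[0, 5, 10], right=False,
                     labels=["range1", "range2"])

print(pd.DataFrame({"Value": values, "bins": left_closed}))
```

Here 5 moves into range2, and 10 falls outside both intervals and becomes NaN, so the last edge would need to be widened if 10 should be counted.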
+0

Thank you very much! This approach cut my execution time from 6 seconds to 0.08 seconds :D – Stefan

+0

Yes, 'cut' is really fast. Thanks for accepting! – jezrael