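How to group by a range of values in pandas?

I have a DataFrame that I want to group by categorical variables and by a range of values. You can think of it as grouping rows with similar values (clusters?). For example: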

import pandas as pd

df = pd.DataFrame({'symbol' : ['IP', 'IP', 'IP', 'IP', 'IP', 'IP', 'IP'],
        'serie' : ['A', 'B', 'A', 'B', 'A', 'B', 'B'],
        'strike' : [10, 10, 12, 13, 12, 13, 14],
        'last' : [1, 2, 2.5, 3, 4.5, 5, 6],
        'price' : [11, 11, 11, 11, 11, 11, 11],
        'type' : ['call', 'put', 'put', 'put', 'call', 'put', 'call']})

If I use

grouped = df.groupby(['symbol', 'serie', 'strike'])

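I have solved part of my problem, but I would also like to combine strike values that are close to each other, such as 10 and 11, 12 and 13, and so on. Preferably within a % range.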

Source

2016-03-20 rmantovani

Looks like a duplicate: http://stackoverflow.com/questions/21441259/pandas-groupby-值范围

Can you show the expected output?

You need a well-defined criterion for clustering/grouping the strike values first. – Goyo

I am guessing the OP wants to group by the categorical variables and then by numeric intervals. In that case, you can use np.digitize().

import numpy as np

smallest = np.min(df['strike'])
largest = np.max(df['strike'])
num_edges = 3
# np.digitize(input_array, bin_edges)
ind = np.digitize(df['strike'], np.linspace(smallest, largest, num_edges))

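For whichever column you bin this way, ind holds the corresponding bin index. For the strikes

[10, 10, 12, 13, 12, 13, 14]

the bins are

array([1, 1, 2, 2, 2, 2, 3], dtype=int64)

with the edges

array([ 10., 12., 14.]) # == np.linspace(smallest, largest, num_edges)

Finally, group as before, but with this extra bin column: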

df['binned_strike'] = ind
# Iterating over a GroupBy yields (key, sub-DataFrame) pairs
for key, group in df.groupby(['symbol', 'serie', 'binned_strike']):
    print("group key")
    print(key)
    print("group content")
    print(group)
    print("=============")

This should print

group key
('IP', 'A', 1)
group content
   last  price serie  strike symbol  type  binned_strike
0   1.0     11     A      10     IP  call              1
=============
group key
('IP', 'A', 2)
group content
   last  price serie  strike symbol  type  binned_strike
2   2.5     11     A      12     IP   put              2
4   4.5     11     A      12     IP  call              2
=============
group key
('IP', 'B', 1)
group content
   last  price serie  strike symbol  type  binned_strike
1   2.0     11     B      10     IP   put              1
=============
group key
('IP', 'B', 2)
group content
   last  price serie  strike symbol  type  binned_strike
3   3.0     11     B      13     IP   put              2
5   5.0     11     B      13     IP   put              2
=============
group key
('IP', 'B', 3)
group content
   last  price serie  strike symbol  type  binned_strike
6   6.0     11     B      14     IP  call              3
=============
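Since the question asks for bins "within a % range", here is a minimal sketch of one way to do that with np.digitize, assuming geometrically spaced edges where each edge is a fixed percentage above the previous one. The 15% width (pct) and the column name pct_bin are hypothetical choices for illustration, not something the OP specified.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'serie': ['A', 'B', 'A', 'B', 'A', 'B', 'B'],
    'strike': [10, 10, 12, 13, 12, 13, 14],
})

pct = 0.15  # hypothetical bin width: each edge is 15% above the previous one

lo, hi = df['strike'].min(), df['strike'].max()
# Number of multiplicative steps needed so the last edge lies at or above the maximum
n_steps = int(np.ceil(np.log(hi / lo) / np.log(1 + pct)))
edges = lo * (1 + pct) ** np.arange(n_steps + 1)

# Bin i collects strikes with edges[i-1] <= strike < edges[i]
df['pct_bin'] = np.digitize(df['strike'], edges)

sizes = df.groupby(['serie', 'pct_bin']).size()
print(edges)
print(sizes)
```

With these strikes the edges come out at roughly 10, 11.5, 13.2 and 15.2, so 12 and 13 share a bin while 14 gets its own. A different pct shifts the edges, so it is worth printing them before grouping.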

Source

2016-03-20 19:49:06 Mai

Do a `groupby()` on binned `strike`

Create bins of the strike data with pd.cut, then group by that information:

import pandas as pd

# Create DataFrame
df = pd.DataFrame({
    'symbol' : ['IP', 'IP', 'IP', 'IP', 'IP', 'IP', 'IP'],
    'serie' : ['A', 'B', 'A', 'B', 'A', 'B', 'B'],
    'strike' : [10, 10, 12, 13, 12, 13, 14],
    'last' : [1, 2, 2.5, 3, 4.5, 5, 6],
    'price' : [11, 11, 11, 11, 11, 11, 11],
    'type' : ['call', 'put', 'put', 'put', 'call', 'put', 'call']
})
# Create bins (for example, three equal-width bins across the data)
df['strikebins'] = pd.cut(df['strike'], bins=3)

print('Binned DataFrame:')
print(df)
print()

# Group the DataFrame by the categorical columns and the bins
grouped = df.groupby(['symbol', 'serie', 'strikebins'])

# Do something with the groups, for example sum the numeric columns
gp_sum = grouped.sum(numeric_only=True)

print('Grouped Sum (for example):')
print(gp_sum)
print()

Binned DataFrame:
   last  price serie  strike symbol  type        strikebins
0   1.0     11     A      10     IP  call   (9.996, 11.333]
1   2.0     11     B      10     IP   put   (9.996, 11.333]
2   2.5     11     A      12     IP   put  (11.333, 12.667]
3   3.0     11     B      13     IP   put      (12.667, 14]
4   4.5     11     A      12     IP  call  (11.333, 12.667]
5   5.0     11     B      13     IP   put      (12.667, 14]
6   6.0     11     B      14     IP  call      (12.667, 14]

Grouped Sum (for example):
                               last  price  strike
symbol serie strikebins
IP     A     (9.996, 11.333]      1     11      10
             (11.333, 12.667]     7     22      24
             (12.667, 14]       NaN    NaN     NaN
       B     (9.996, 11.333]      2     11      10
             (11.333, 12.667]   NaN    NaN     NaN
             (12.667, 14]        14     33      40

You can drop() strike if you want, or replace strikebins with the mean of each range ...
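As a rough sketch of that last suggestion: each bin produced by pd.cut is a pandas Interval, whose mid attribute gives the midpoint of the range. The column name strike_mid is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'serie': ['A', 'B', 'A', 'B', 'A', 'B', 'B'],
    'strike': [10, 10, 12, 13, 12, 13, 14],
    'last': [1, 2, 2.5, 3, 4.5, 5, 6],
})
df['strikebins'] = pd.cut(df['strike'], bins=3)

# Replace each Interval with its midpoint (hypothetical column name)
df['strike_mid'] = [interval.mid for interval in df['strikebins']]

# Drop the raw strike and the interval column, then group on the midpoint
result = (
    df.drop(columns=['strike', 'strikebins'])
      .groupby(['serie', 'strike_mid'])
      .sum()
)
print(result)
```

Because strike_mid is a plain float column rather than a categorical, only the combinations that actually occur show up in the result.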

Source

2016-03-20 20:00:00 tmthydvnprt
