Slicing a pandas DataFrame by groups of consecutive values

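I have a DataFrame containing runs of consecutive values that eventually "skip" (i.e. increase by more than 1). I would like to split the DataFrame at those skips, similar to what groupby does (the letter index is just for illustration):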

A 
a 1 
b 2 
c 3 
d 6 
e 7 
f 8 
g 11 
h 12 
i 13 

# would return 

a 1 
b 2 
c 3 
----- 
d 6 
e 7 
f 8 
----- 
g 11 
h 12 
i 13
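If you want to run the answers below yourself, you can rebuild the example frame like this. It is a minimal reconstruction that assumes a single integer column `A` and the letters `a` to `i` as the index, as shown above.

```python
import pandas as pd

# Example frame from the question: column "A", letters as the index
df = pd.DataFrame({"A": [1, 2, 3, 6, 7, 8, 11, 12, 13]},
                  index=list("abcdefghi"))
print(df)
```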


2014-09-30 heltonbiker

An answer with a slight speed improvement:

import numpy as np

for k, g in df.groupby(df['A'] - np.arange(df.shape[0])):
    print(g)
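The trick works because of how values and positions move together. Inside a run of consecutive values, `A` and the row position both increase by 1 from one row to the next, so `A - position` stays constant and can serve as the group key. The sketch below prints that key for the example data and collects the groups into a list instead of printing them. It assumes `A` is increasing. If the values can go down, two separate runs can end up with the same key and would be merged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 6, 7, 8, 11, 12, 13]},
                  index=list("abcdefghi"))

# Within a run of consecutive values, A and the row position both grow by 1,
# so their difference stays constant and can serve as a group key.
key = df["A"] - np.arange(len(df))
print(key.tolist())  # [1, 1, 1, 3, 3, 3, 5, 5, 5]

# Collect each run as its own DataFrame instead of printing it
groups = [g for _, g in df.groupby(key)]
for g in groups:
    print(g)
```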


2014-09-30 14:58:03 ZJS

Very, very clever... thanks – heltonbiker 2014-09-30 16:12:41

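We can use shift to test whether the difference between consecutive rows is greater than 1, and then build a list of tuple pairs holding the index labels we need: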

In [128]: 
# labels where the value jumps by more than 1 from the previous row; the first row's label has to be added as well
index_list = [df.iloc[0].name] + list(df[(df.value - df.value.shift()) > 1].index) 
index_list 
Out[128]: 
['a', 'd', 'g']
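The same start labels can also be found with a single boolean mask. The sketch below rebuilds the frame this answer appears to use: a `value` column and a letter index named `index`, inferred from the printed output further down. That reconstruction is an assumption. `Series.diff()` computes `value - value.shift()`. The first row produces NaN, and `NaN > 1` is False, so that row is marked as a start explicitly.

```python
import pandas as pd

# Hypothetical reconstruction of the frame used in this answer:
# a "value" column and a letter index named "index"
df = pd.DataFrame({"value": [1, 2, 3, 6, 7, 8, 11, 12, 13]},
                  index=pd.Index(list("abcdefghi"), name="index"))

# diff() is value - value.shift(); row 0 yields NaN, which compares False,
# so the first row is marked as the start of the first run explicitly.
is_start = df["value"].diff() > 1
is_start.iloc[0] = True
index_list = list(df.index[is_start.to_numpy()])
print(index_list)  # ['a', 'd', 'g']
```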

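Next we build a list of (start, end) tuples for the ranges we are interested in. Note that pandas label-based slicing includes both the start and the end label, so for each range's end we have to find the label of the row just before the next range's start: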

In [170]: 

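import numpy as np

final_range = []
for i in range(len(index_list)):
    if i == len(index_list) - 1:
        # handle the last range: it runs to the final row of the frame
        final_range.append((index_list[i], df.index[-1]))
    else:
        # position of the next range's start label, minus 1 -> label of the row before it
        end_pos = np.searchsorted(df.index, index_list[i + 1]) - 1
        final_range.append((index_list[i], df.index[end_pos]))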

final_range 

Out[170]: 
[('a', 'c'), ('d', 'f'), ('g', 'i')]

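I use numpy's searchsorted to find the integer position at which the next start label would be inserted into the (sorted) index, then subtract 1 to get the label of the previous row.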
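The two facts this answer relies on can be checked directly. The first is that `.loc` label slices include their end label while `.iloc` position slices exclude their stop position. The second is that `np.searchsorted` on a sorted index returns the integer position of a label. The frame below is the same hypothetical reconstruction as before. `np.searchsorted` only gives correct positions because the letter index is sorted. For a unique index, `df.index.get_loc("d")` would return the same position without that requirement.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3, 6, 7, 8, 11, 12, 13]},
                  index=pd.Index(list("abcdefghi"), name="index"))

# Label slicing with .loc includes BOTH endpoints, so slicing up to the next
# run's start label 'd' would wrongly pull that row in ...
print(df.loc["a":"d"].index.tolist())   # ['a', 'b', 'c', 'd']
# ... whereas positional slicing with .iloc excludes the stop position.
print(df.iloc[0:3].index.tolist())      # ['a', 'b', 'c']

# searchsorted gives the integer position of 'd' in the sorted index;
# subtracting 1 yields the position, and hence the label, of the row before it.
pos = np.searchsorted(df.index, "d")
print(pos, df.index[pos - 1])           # 3 c
```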

In [171]: 
# now print 
for r in final_range: 
    print(df[r[0]:r[1]]) 
     value 
index  
a   1 
b   2 
c   3 
     value 
index  
d   6 
e   7 
f   8 
     value 
index  
g   11 
h   12 
i   13
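Another option is to work with integer positions from the start, which avoids the "previous row" lookup. Because `.iloc` slices are half-open, each run can simply end where the next one begins. This is an alternative to the answer above, not part of it. The frame is again a hypothetical reconstruction.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3, 6, 7, 8, 11, 12, 13]},
                  index=pd.Index(list("abcdefghi"), name="index"))

# Integer positions where a new run starts: row 0, plus every row whose
# step from the previous value is larger than 1
values = df["value"].to_numpy()
starts = [0] + (np.flatnonzero(np.diff(values) > 1) + 1).tolist()

# Each run ends (exclusively) where the next one starts, or at len(df)
bounds = list(zip(starts, starts[1:] + [len(df)]))
print(bounds)  # [(0, 3), (3, 6), (6, 9)]

# .iloc slices are half-open, so no "previous row" lookup is needed
pieces = [df.iloc[s:e] for s, e in bounds]
for piece in pieces:
    print(piece)
```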


2014-09-30 13:21:09 EdChum

My two cents, just for fun.

In [15]: 

for grp, val in df.groupby((df.diff() - 1).fillna(0).cumsum().A):
    print(val)
    A 
a 1 
b 2 
c 3 
    A 
d 6 
e 7 
f 8 
    A 
g 11 
h 12 
i 13
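It helps to look at the intermediate key in this answer. The step between rows minus 1 is 0 inside a run and positive at a jump, so the running sum only changes where a new run begins. The sketch also shows a commonly used variant that does not come from this thread: it counts the rows whose step is not exactly 1. Both keys assume the runs are increasing. With the cumulative-sum key in particular, a later decrease in `A` can bring the running sum back to an earlier value and merge non-adjacent rows.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 6, 7, 8, 11, 12, 13]},
                  index=list("abcdefghi"))

# Key from this answer: (step - 1) is 0 inside a run and positive at a jump,
# so the running sum only changes where a new run begins.
key = (df["A"].diff() - 1).fillna(0).cumsum()
print(key.tolist())  # [0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 4.0]

# Common variant (not from this thread): count rows whose step is not exactly 1.
# The first row's NaN step compares as "not equal to 1", so it opens run 1.
run_id = df["A"].diff().ne(1).cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 3]

for _, g in df.groupby(run_id):
    print(g)
```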


2014-09-30 14:52:45
