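Testing subsequent values in a DataFrame

I have a DataFrame with one column of positive and negative integers. For each row, I'd like to see how many consecutive rows (starting with and including the current row) have negative values.

So if a sequence was 2, -1, -3, 1, -1, the result would be 0, 2, 1, 0, 1.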
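
To pin down the expected result, here is a minimal plain-Python sketch of the specification. It is a reference to compare other solutions against, not a pandas solution; the function name `negatives_from_here` is made up for illustration. It walks the values from the end so that each position can reuse the count of the position after it.

```python
def negatives_from_here(values):
    """For each position, count the consecutive negatives starting there."""
    counts = [0] * len(values)
    run = 0
    # Walk backwards: a negative extends the run that starts right after it,
    # while any non-negative value resets the run to zero.
    for i in range(len(values) - 1, -1, -1):
        run = run + 1 if values[i] < 0 else 0
        counts[i] = run
    return counts


print(negatives_from_here([2, -1, -3, 1, -1]))  # [0, 2, 1, 0, 1]
```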

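I can do this by iterating over all the indices, using .iloc to split the column and next() to find where the next positive value is. But I feel this isn't taking advantage of pandas' capabilities, and I imagine there's a better way to do it. I've experimented with .shift() and expanding_window, but without success.

Is there a more "pandastic" way of finding out how many consecutive rows after the current one meet some logical condition?

Here's what's working now: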

import pandas as pd 

df = pd.DataFrame({"a": [2, -1, -3, -1, 1, 1, -1, 1, -1]}) 

df["b"] = 0
for i in df.index:
    sub = df.iloc[i:].a.tolist()
    # Offset of the first non-negative value from row i; falls back to 1 when none follows.
    df.loc[i, "b"] = next((sub.index(n) for n in sub if n >= 0), 1)

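Edit: I realize that even my own example doesn't work when there's more than one negative value at the end. So a better solution is even more necessary.

Edit 2: I stated the problem in terms of integers, but originally only put 1 and -1 in my example. I need to solve for positive and negative integers in general.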
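
For reference, the trailing-negatives problem mentioned in the first edit comes from the fallback value of 1 passed to next(). A minimal sketch of one possible fix, assuming the loop-based approach is kept, falls back to the number of remaining rows instead. The sample data is taken from the answers in this thread.

```python
import pandas as pd

df = pd.DataFrame({"a": [2, -1, -3, -1, 1, 1, -1, 1, -1, -2]})

counts = []
for i in range(len(df)):
    sub = df["a"].iloc[i:].tolist()
    # Offset of the first non-negative value; if there is none,
    # every remaining row is negative, so use the remaining length.
    counts.append(next((k for k, n in enumerate(sub) if n >= 0), len(sub)))
df["b"] = counts

print(df["b"].tolist())  # [0, 3, 2, 1, 0, 0, 1, 0, 2, 1]
```

This is still quadratic in the number of rows, so it only addresses correctness, not speed.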

Source

2015-04-07 ASGM

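FWIW, here's a fairly pandastic answer that doesn't require functions or applies. It borrows from here (and, I'm sure, other answers), and thanks to @DSM for mentioning the ascending=False option: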

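import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [2, -1, -3, -1, 1, 1, -1, 1, -1, -2]})

df['pos'] = df.a > 0
# A new group id starts wherever the sign flag changes from the previous row.
df['grp'] = (df['pos'] != df['pos'].shift()).cumsum()
dfg = df.groupby('grp')
# Count down within each run; keep the count only for negative rows.
df['c'] = np.where(df['a'] < 0, dfg.cumcount(ascending=False)+1, 0)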

   a  b    pos  grp  c
0  2  0   True    1  0
1 -1  3  False    2  3
2 -3  2  False    2  2
3 -1  1  False    2  1
4  1  0   True    3  0
5  1  0   True    3  0
6 -1  1  False    4  1
7  1  0   True    5  0
8 -1  1  False    6  2
9 -2  1  False    6  1

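I think the nice thing about this method is that once you've set up the 'grp' variable, you can do lots of things very easily with standard groupby methods.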
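
As an illustration of that last point, here is a small sketch that reuses the 'grp' column to attach per-run statistics to every row. The column names `run_len` and `run_sum` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"a": [2, -1, -3, -1, 1, 1, -1, 1, -1, -2]})
df["pos"] = df["a"] > 0
df["grp"] = (df["pos"] != df["pos"].shift()).cumsum()

runs = df.groupby("grp")["a"]
# Broadcast each run's length and total back onto its rows.
df["run_len"] = runs.transform("size")
df["run_sum"] = runs.transform("sum")

print(df[["a", "grp", "run_len", "run_sum"]])
```

Any other groupby aggregation (min, max, first, and so on) can be broadcast per run in the same way.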

Source

2015-04-07 19:53:51 JohnE

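This is closer to what I was going to write, but you can simplify by doing something like 'cumcount(ascending=False) + 1'. I'm too lazy to check the edge cases, though. :-) – DSM

@DSM Thanks, made the change. Simpler and much faster. – JohnE

This works well when the DataFrame contains only '1' and '-1', but not when they take other values. The mistake is mine, since I muddled my question: I described the problem in terms of integers, but only put '1' and '-1' in the example. (I still upvoted, though, because it solves the example.) – ASGM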
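
One edge case that can be checked with general integers is zero. The sketch below is a variation, not JohnE's exact code: it assumes a zero should end a run of negatives (the question does not say), and therefore groups on `a < 0` rather than `a > 0`. Grouping on `a > 0` would put a zero in the same run as the neighbouring negatives and inflate their counts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [5, -7, 0, -2, -9, 3]})

neg = df["a"] < 0
# New run id wherever the "is negative" flag changes.
grp = (neg != neg.shift()).cumsum()
# Count down within each run and zero out the non-negative rows.
df["c"] = np.where(neg, df.groupby(grp).cumcount(ascending=False) + 1, 0)

print(df["c"].tolist())  # [0, 1, 0, 2, 1, 0]
```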

This was an interesting puzzle. I found a way to do it using pandas tools, but I think you'll agree it's a lot more obscure :-). Here's an example:

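import numpy as np
import pandas

data = pandas.Series([1, -1, -1, -1, 1, -1, -1, 1, 1, -1, 1])
x = data[::-1] # reverse the data

# group_keys=False keeps the original index on the result in recent pandas versions
print(x.groupby(((x<0) != (x<0).shift()).cumsum(), group_keys=False).apply(lambda x: pandas.Series(
    np.arange(len(x))+1 if (x<0).all() else np.zeros(len(x)),
    index=x.index))[::-1])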

The output is correct:

0  0 
1  3 
2  2 
3  1 
4  0 
5  2 
6  1 
7  0 
8  0 
9  1 
10 0 
dtype: float64

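The basic idea is similar to what I described in my answer to this question, and you can find the same approach used in various answers to questions about how to make use of inter-row information in pandas. Your question is slightly trickier because your criterion goes in reverse (asking for the number of following negatives rather than the number of preceding negatives), and because you only want one side of the grouping (i.e., you only want the number of consecutive negatives, not the number of consecutive numbers with the same sign).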

Here is a more verbose version of the same code with some explanation that may make it easier to grasp:

def getNegativeCounts(x):
    # This function takes as input a sequence of numbers, all the same sign.
    # If they're negative, it returns an increasing count of how many there are.
    # If they're positive, it just returns the same number of zeros.
    # [-1, -2, -3] -> [1, 2, 3]
    # [1, 2, 3] -> [0, 0, 0]
    if (x<0).all():
        return pandas.Series(np.arange(len(x))+1, index=x.index)
    else:
        return pandas.Series(np.zeros(len(x)), index=x.index)

# we have to reverse the data because cumsum only works in the forward direction
x = data[::-1]

# flag each number whose sign differs from the previous one
# (despite the variable name, True marks the start of a new block)
sameSignAsPrevious = (x<0) != (x<0).shift()

# cumsum this to get an "ID" for each block of consecutive same-sign numbers
sameSignBlocks = sameSignAsPrevious.cumsum()

# group on these block IDs
# (group_keys=False keeps the original index on the result in recent pandas versions)
g = x.groupby(sameSignBlocks, group_keys=False)

# for each block, apply getNegativeCounts
# this will either give us the running total of negatives in the block,
# or a stretch of zeros if the block was positive
# the [::-1] at the end reverses the result
# (to compensate for our reversing the data initially)
g.apply(getNegativeCounts)[::-1]
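
The same reverse-then-group idea can also be sketched without apply, by combining it with cumcount. This is a variation on the code above rather than the original answer, and it returns integers instead of floats.

```python
import pandas as pd

data = pd.Series([1, -1, -1, -1, 1, -1, -1, 1, 1, -1, 1])

# Reverse the "is negative" flags so that counting forward in the
# reversed series means counting backward in the original.
rev = (data < 0)[::-1]
block = (rev != rev.shift()).cumsum()

# Running position within each block, kept only where the value is negative,
# then reversed back to the original order.
counts = (rev.groupby(block).cumcount() + 1).where(rev, 0)[::-1]

print(counts.tolist())  # [0, 3, 2, 1, 0, 2, 1, 0, 0, 1, 0]
```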

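As you can see, run-length-style operations are usually not simple in pandas. There is, however, an open issue for adding more grouping/partitioning abilities that would improve some of this. In any case, your particular use case has some specific quirks that make it a bit different from a typical run-length task.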
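
For comparison, outside pandas the run-length view of the problem is fairly direct. The following is only a sketch using the standard library's itertools.groupby; the function name `following_negatives` is invented for this example. Each run of negatives of length n contributes the countdown n, n-1, ..., 1, and every other run contributes zeros.

```python
from itertools import groupby


def following_negatives(values):
    out = []
    # groupby yields maximal runs of consecutive items with the same key,
    # here the "is negative" flag.
    for is_negative, run in groupby(values, key=lambda v: v < 0):
        n = len(list(run))
        out.extend(range(n, 0, -1) if is_negative else [0] * n)
    return out


print(following_negatives([2, -1, -3, 1, -1]))  # [0, 2, 1, 0, 1]
```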

Source

2015-04-07 19:36:44 BrenBarn

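Both of these answers are very helpful. I especially appreciate the detailed explanation you gave. I had a hard time accepting just one, but decided to go with @JohnE's because the solution is a bit simpler. I'd choose both if I could. – ASGM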
