I have a dataframe with finance data (33023 rows; here is the link to the data: https://mab.to/Ssy3TelRs); df.open is the opening price of the title and df.close is the closing price.
I have been trying to see how many times in a row the title closed with a gain and with a loss.
The result that I'm looking for should tell me that the title was positive 2 days in a row x times, 3 days in a row y times, 4 days in a row z times and so forth.
I started with a for loop:
for x in range(1, df.close.count()):
    y = df.close[x] - df.open[x]
and then an unsuccessful series of if statements...
Thank you for your help.
CronosVirus00
EDITS:
>>> df.head(7)
       data  ora     open      max      min    close  Unnamed: 6
0  20160801    0  1.11781  1.11781  1.11772  1.11773           0
1  20160801  100  1.11774  1.11779  1.11773  1.11777           0
2  20160801  200  1.11779  1.11800  1.11779  1.11795           0
3  20160801  300  1.11794  1.11801  1.11771  1.11771           0
4  20160801  400  1.11766  1.11772  1.11763  1.11772           0
5  20160801  500  1.11774  1.11798  1.11774  1.11796           0
6  20160801  600  1.11796  1.11796  1.11783  1.11783           0
Ifs:
for x in range(1,df.close.count()):
    y = df.close[x]-df.open[x]
    if y > 0 :
        green += 1
        y = df.close[x+1] - df.close[x+1]
        twotimes += 1
        if y > 0 :
            green += 1
            y = df.close[x+2] - df.close[x+2]
            threetimes += 1
            if y > 0 :
                green += 1
                y = df.close[x+3] - df.close[x+3]
                fourtimes += 1
FINAL SOLUTION
Thank you all! In the end I did this:
df['test'] = df.close - df.open > 0
green = df.test  # days that it was positive

def gg(z):
    tot = green.count()
    giorni = range(1, z+1)  # days in a row I want to check
    for x in giorni:
        y = (green.rolling(x).sum() > x-1).sum()
        print(x, " ", y, " ", round((y/tot)*100, 1), "%")

gg(5)
1 14850 45.0 %
2 6647 20.1 %
3 2980 9.0 %
4 1346 4.1 %
5 607 1.8 %
Answers
If I understand your question correctly, you can do it like this:
In [76]: df.groupby((df.close.diff() < 0).cumsum()).cumcount()
Out[76]:
0 0
1 1
2 2
3 0
4 1
5 2
6 0
7 0
dtype: int64
The result that I'm looking for should tell me that the title was positive 2 days in a row x times, 3 days in a row y times, 4 days in a row z times and so forth.
In [114]: df.groupby((df.close.diff() < 0).cumsum()).cumcount().value_counts().to_frame('count')
Out[114]:
count
0 4
2 2
1 2
Data set:
In [78]: df
Out[78]:
data ora open max min close
0 20160801 0 1.11781 1.11781 1.11772 1.11773
1 20160801 100 1.11774 1.11779 1.11773 1.11777
2 20160801 200 1.11779 1.11800 1.11779 1.11795
3 20160801 300 1.11794 1.11801 1.11771 1.11771
4 20160801 400 1.11766 1.11772 1.11763 1.11772
5 20160801 500 1.11774 1.11798 1.11774 1.11796
6 20160801 600 1.11796 1.11796 1.11783 1.11783
7 20160801 700 1.11783 1.11799 1.11783 1.11780
In [80]: df.close.diff()
Out[80]:
0 NaN
1 0.00004
2 0.00018
3 -0.00024
4 0.00001
5 0.00024
6 -0.00013
7 -0.00003
Name: close, dtype: float64
It works! Thank you – CronosVirus00
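It sounds like what you want to do is:
- Compute the difference between the two series (open & close), e.g.
diff = df.open - df.close
- Apply a condition to the result to get a boolean series
diff > 0
- Pass the resulting boolean series to the DataFrame to get the subset of the DataFrame where the condition is true
df[diff > 0]
- Find all consecutive subsequences, applying a column-wise function to identify and count them
I need to board a plane, but I will provide a sample of what the last step looks like when I can.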
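The last step in the list above can be sketched roughly as follows. This is only an illustration on made-up toy data, not the answerer's own sample. It uses close > open so that True marks a gain, and the names gain, run_id, run_lengths and streak_counts are chosen here for the example.

```python
import pandas as pd

# Toy data (hypothetical), chosen so the gain/loss pattern is easy to follow.
df = pd.DataFrame({
    "open":  [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "close": [1.1, 1.2, 0.9, 1.1, 1.3, 1.2, 0.8, 1.1],
})

# Steps 1-2: boolean series, True where the bar closed above its open.
gain = df["close"] > df["open"]          # T T F T T T F T

# A new run starts wherever the flag differs from the previous row.
# (The first row has no predecessor, so diff() is NaN there and counts as a change.)
run_id = gain.astype(int).diff().ne(0).cumsum()

# Length of every run of gains (runs of losses are filtered out first).
run_lengths = gain[gain].groupby(run_id[gain]).size()

# Number of runs of each length: index = days in a row, value = how many times.
streak_counts = run_lengths.value_counts().sort_index()
print(streak_counts)
```

On the toy data there is one run each of length 1, 2 and 3. Each run is counted once, under its full length, so a 3-day run is not also counted as a 2-day run.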
Your first 3 points are spot on! Now I need to figure out how to do your fourth suggestion. Will keep you updated – CronosVirus00
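If I understand you correctly, you want the number of days that are preceded by at least n consecutive positive days.
Similar to what @Thang suggested, you can use rolling: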
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10, 2), columns=["open", "close"])
# This just sets up random test data, for example:
# open close
# 0 0.997986 0.594789
# 1 0.052712 0.401275
# 2 0.895179 0.842259
# 3 0.747268 0.919169
# 4 0.113408 0.253440
# 5 0.199062 0.399003
# 6 0.436424 0.514781
# 7 0.180154 0.235816
# 8 0.750042 0.558278
# 9 0.840404 0.139869
positiveDays = df["close"]-df["open"] > 0
# This will give you a series that is True for positive days:
# 0 False
# 1 True
# 2 False
# 3 True
# 4 True
# 5 True
# 6 True
# 7 True
# 8 False
# 9 False
# dtype: bool
daysToCheck = 3
positiveDays.rolling(daysToCheck).sum()>daysToCheck-1
This now gives you a series that indicates, for every day, whether it has been positive for daysToCheck days in a row:
0 False
1 False
2 False
3 False
4 False
5 True
6 True
7 True
8 False
9 False
dtype: bool
You can use (positiveDays.rolling(daysToCheck).sum()>daysToCheck-1).sum() to get the number of days (3 in this example) that satisfy this, which is what you want, as far as I understand.
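To see what this rolling count measures for several window lengths at once, here is a small sketch on hand-made flags. The series positive and the helper days_in_streak are made up for this illustration. The flags are cast to int before rolling so the window sum is an ordinary numeric sum.

```python
import pandas as pd

# Hypothetical gain/loss flags: a run of 5 gains, one loss, then a run of 2 gains.
positive = pd.Series([True, True, True, True, True, False, True, True])

def days_in_streak(flags, n):
    """Count the days that close a window of n consecutive True values."""
    window_sum = flags.astype(int).rolling(n).sum()
    # Same test as in the answer: the window sum must exceed n - 1.
    return int((window_sum > n - 1).sum())

counts = {n: days_in_streak(positive, n) for n in range(1, 6)}
print(counts)  # {1: 7, 2: 5, 3: 3, 4: 2, 5: 1}
```

Note that the windows overlap: the single 5-day run contributes three days to the n = 3 count. The result is therefore the number of days that are at least n days into a streak, not the number of separate streaks of length n.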
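I am updating pandas now, because rolling is available from version 0.18 onwards (I have 0.17). I will let you know whether it works. – CronosVirus00
It works! Thank you – CronosVirus00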
This should work:
import pandas as pd
import numpy as np
test = pd.DataFrame(np.random.randn(100,2), columns = ['open','close'])
test['gain?'] = (test['open']-test['close'] < 0)
test['cumulative'] = 0
for i in test.index[1:]:
    if test['gain?'][i]:
        test['cumulative'][i] = test['cumulative'][i-1] + 1
        test['cumulative'][i-1] = 0
results = test['cumulative'].value_counts()
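Ignore the '0' row in the results. If you want, for example, a two-day run to be counted as a one-day run as well, this can be modified without much trouble.
Edit: without the warning -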
import pandas as pd
import numpy as np
test = pd.DataFrame(np.random.randn(100,2), columns = ['open','close'])
test['gain?'] = (test['open']-test['close'] < 0)
test['cumulative'] = 0
for i in test.index[1:]:
    if test['gain?'][i]:
        test.loc[i,'cumulative'] = test.loc[i-1,'cumulative'] + 1
        test.loc[i-1,'cumulative'] = 0
results = test['cumulative'].value_counts()
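If the aim is only the table of run lengths, the per-cell assignments can be avoided altogether. The sketch below is an alternative, not the code above: it uses itertools.groupby from the standard library, and the helper streak_histogram and the fixed random seed are assumptions made for this example. Unlike the loop above, it also counts a gain on the very first row.

```python
from collections import Counter
from itertools import groupby

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only so the example is reproducible
test = pd.DataFrame(rng.standard_normal((100, 2)), columns=["open", "close"])

# Plain Python list of booleans, True where the row closed above its open.
gain = (test["close"] > test["open"]).tolist()

def streak_histogram(flags):
    """Map run length -> number of maximal runs of True with that length."""
    return Counter(
        sum(1 for _ in run)            # length of this run
        for is_gain, run in groupby(flags)
        if is_gain                     # skip the runs of losses
    )

results = streak_histogram(gain)
print(sorted(results.items()))
```

Because nothing is written back into the DataFrame, no SettingWithCopyWarning can arise, and there is no '0' row to ignore.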
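It gave me this error: test['cumulative'][i] = test['cumulative'][i-1] + 1 SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame – CronosVirus00
I don't /think/ the warning makes any difference? But I have edited the answer to remove it. –
Yes, you are right, it works. Thank you – CronosVirus00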
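Please include your unsuccessful if statements. Also, Python depends on indentation, so please make sure the code in your post is formatted exactly as it is *in your code*. – dckuehn
Do you want the number of days that have at least n positive days in a row, the day itself included, or the number of runs with at least 'n' positive days in a row? – jotasi
Could you also provide the desired data set / DF? – MaxU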