2017-02-17 53 views

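Pandas time series: slicing date ranges based on values in a column

I created a pandas DataFrame with stock data pulled from Yahoo. I added a percent-change column and filtered the DataFrame to the rows where the percent change is > 0.02. No problem so far. Now I would like to add an extra selection parameter that outputs a DataFrame in which I can see not only the dates where the condition (pct_change > 0.02) is True, but also the 10 days before and the 10 days after each of those dates. I can't really figure out how to start. Any help would be appreciated. My code so far: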

import pandas_datareader.data as web 
import datetime 
start = datetime.datetime(2010, 1, 1) 
end = datetime.datetime(2017, 1, 27) 
gspc2 = web.DataReader("^GSPC", 'yahoo', start, end) 
gspc2.rename(columns={'Adj Close': 'Adj_Close'}, inplace=True)
gspc2['pct_change'] = gspc2['Adj_Close'].pct_change()
# .ix has been removed from pandas; boolean selection works with .loc
gspc2 = gspc2.loc[gspc2['pct_change'] > 0.02]

Answers


One idea is:

  1. Get the rows that meet the criterion
  2. Expand them based on your range
  3. Remove the duplicate row indices

Here is an example; I hope it helps:

import pandas as pd 
import numpy as np 

data = { 'a' : range(10, 24) } 
df = pd.DataFrame(data) 
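df['b'] = (df.a % 5 == 0)  # marks rows 0, 5 and 10

# number of rows to look back and forward
n = 1

# find the positions of the rows that meet the criterion: 0, 5 and 10
rows = np.where(df.b)[0]

# expand each position to the window [row - n, row + n], staying inside the frame
rows = [x for row in rows for x in range(row - n, row + n + 1) if 0 <= x < len(df)]

# drop duplicate positions and restore their order
rows = sorted(set(rows))

print(df.iloc[rows])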

The output is:

 a  b 
0 10 True 
1 11 False 
4 14 False 
5 15 True 
6 16 False 
9 19 False 
10 20 True 
11 21 False 
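
The same selection can also be expressed without an explicit loop. The following is a minimal sketch, not part of the original answer, that assumes the window is counted in rows: a centered rolling maximum over `2 * n + 1` rows marks every row that has a flagged row within `n` positions. The names `near_hit` and `result` are made up for illustration.

```python
import pandas as pd

# Same toy data as in the answer above
df = pd.DataFrame({"a": range(10, 24)})
df["b"] = df["a"] % 5 == 0  # True at rows 0, 5 and 10

n = 1  # rows to look back and forward

# A centered rolling maximum over 2n+1 rows is 1.0 wherever any row in the
# window is flagged; min_periods=1 keeps the shorter windows at both edges.
near_hit = (
    df["b"].astype(float)
    .rolling(window=2 * n + 1, center=True, min_periods=1)
    .max()
    .astype(bool)
)

result = df[near_hit]
print(result)
```

Because the mask is built from row positions inside the rolling window, this form does not depend on the index type, so it should behave the same way on a date index.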

Thanks a lot for the idea, Xin Huang. I haven't tested it yet, but it looks interesting. I'll let you know how it goes. Cheers – kuatroka


It works with a normal index, but fails with a time series index. I adapted it a bit, but I still get errors. rows = gspc2.index[gspc2['pct_change'] == 0.000000] rows = [x for row in rows for x in range(row - pd.DateOffset(days=1), row + pd.DateOffset(days=1)) if x >= 0] print(rows) – kuatroka
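
The adaptation in the comment above fails because Python's `range()` accepts only integers, so it cannot step between Timestamps. One way around this, sketched here under the assumption that the window is meant in trading days (rows) rather than calendar days, is to keep working with integer positions and select with `.iloc`, which ignores the index type. The price series is synthetic and only stands in for the Yahoo data; `hits`, `positions` and `window_df` are illustrative names.

```python
import numpy as np
import pandas as pd

# Synthetic business-day prices standing in for the downloaded data (assumption)
idx = pd.bdate_range("2017-01-02", periods=12)
df = pd.DataFrame(
    {"Adj_Close": [100, 101, 104, 104, 103, 103, 102, 102, 105, 105, 104, 104]},
    index=idx,
    dtype=float,
)
df["pct_change"] = df["Adj_Close"].pct_change()

n = 1  # trading days to look back and forward

# Integer positions (not dates) of the rows that meet the condition
hits = np.flatnonzero((df["pct_change"] > 0.02).to_numpy()).tolist()

# Expand every hit to [hit - n, hit + n], clipped to the valid row range
positions = sorted(
    {p for h in hits for p in range(max(h - n, 0), min(h + n, len(df) - 1) + 1)}
)

# .iloc selects by position, so the DatetimeIndex is carried along untouched
window_df = df.iloc[positions]
print(window_df)
```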


I got it working, with Xin Huang's code serving as a base:

import pandas as pd
import pandas_datareader.data as web
import datetime
import itertools

# bringing stock data 
start = datetime.datetime(2010, 1, 1) 
end = datetime.datetime(2017, 3, 27) 
gspc2 = web.DataReader("UNG", 'yahoo', start, end) 
gspc2.rename(columns={'Adj Close' :'Adj_Close'}, inplace=True) 
gspc2['pct_change'] = gspc2['Adj_Close'].pct_change() 


# gspc2['std_dev2'] = gspc2['pct_change'].std()*2 
# gspc2['pct_change_mean'] = gspc2['pct_change'].mean() 

# set the filter condition: daily drops of 7% or more
condition = -0.07
gspc2['row_filter'] = gspc2['pct_change'] <= condition
row_filter = gspc2.index[gspc2['row_filter']]

# window of days before and after the selected date 
n = 3 

selected_rows = [(pd.date_range(i - pd.DateOffset(days=n), periods=n*2+1)) for i in row_filter] 
selected_rows = list(itertools.chain.from_iterable(selected_rows)) 

# cumulative return n-2 days after the day on which the condition occurred, not counting the return on the day itself
gspc2['cum_pct_change_ndays_after'] = gspc2.Adj_Close.shift(-(n-2))/gspc2.Adj_Close - 1 
gspc2['n_days_avg_return'] = gspc2.cum_pct_change_ndays_after.mean() 

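# keep only the selected dates that are actual market days (calendar days without
# trading are skipped; recent pandas raises KeyError if .loc gets missing labels),
# drop rows containing NaN and sort newest first
final_df = gspc2[gspc2.index.isin(selected_rows)].dropna().sort_index(ascending=False)
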

print(final_df) 
print(final_df[final_df.row_filter])
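
If the window is meant in calendar days, as in the answer above, label-based slicing on a sorted DatetimeIndex is another option, because slice endpoints do not have to exist in the index. The following is a rough sketch under that assumption, using synthetic business-day data in place of the download; the threshold `0.02`, the prices, and the names `hit_dates` and `pieces` are illustrative only.

```python
import pandas as pd

# Synthetic business-day prices (assumption); 2017-01-02 is a Monday
idx = pd.bdate_range("2017-01-02", periods=12)
df = pd.DataFrame(
    {"Adj_Close": [100, 101, 104, 104, 103, 103, 102, 102, 105, 105, 104, 104]},
    index=idx,
    dtype=float,
)
df["pct_change"] = df["Adj_Close"].pct_change()
df["row_filter"] = df["pct_change"] > 0.02

n = 3  # calendar days before and after each hit
window = pd.Timedelta(days=n)

# Dates on which the condition holds
hit_dates = df.index[df["row_filter"]]

# Slicing by label on a sorted DatetimeIndex tolerates endpoints that fall on
# weekends or holidays, so no reindexing or dropna is needed afterwards.
pieces = [df.loc[d - window : d + window] for d in hit_dates]

# Overlapping windows can repeat dates, so keep the first copy of each one
final_df = pd.concat(pieces)
final_df = final_df[~final_df.index.duplicated()].sort_index()

print(final_df)
```

Note that `pd.concat` raises on an empty list, so real code would need to handle the case where no date meets the condition.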