2017-04-04 57 views
1

Event study in pandas

Suppose I have a time series like this:

s = pd.Series(np.random.rand(20), index=pd.date_range("1990-01-01", periods=20))

Printing s gives:

1990-01-01 0.018363 
1990-01-02 0.288625 
1990-01-03 0.460708 
1990-01-04 0.663063 
1990-01-05 0.434250 
1990-01-06 0.504893 
1990-01-07 0.587743 
1990-01-08 0.412223 
1990-01-09 0.604656 
1990-01-10 0.960338 
1990-01-11 0.606765 
1990-01-12 0.110480 
1990-01-13 0.671683 
1990-01-14 0.178488 
1990-01-15 0.458074 
1990-01-16 0.219303 
1990-01-17 0.172665 
1990-01-18 0.429534 
1990-01-19 0.505891 
1990-01-20 0.242567 
Freq: D, dtype: float64 

Suppose the event dates are 1990-01-05 and 1990-01-15. I want to subset the data down to a window of (-2, +2) days around each event, like this:

1990-01-03 0.460708 
1990-01-04 0.663063 
1990-01-05 0.434250 
1990-01-06 0.504893 
1990-01-07 0.587743 
1990-01-13 0.671683 
1990-01-14 0.178488 
1990-01-15 0.458074 
1990-01-16 0.219303 
1990-01-17 0.172665 
Freq: D, dtype: float64 

How should I go about doing this?

Answers

1

I think you can use concat on all the Series created by a list comprehension with loc:

date1 = pd.to_datetime('1990-01-05') 
date2 = pd.to_datetime('1990-01-15') 
window = 2 

dates = [date1, date2] 

# Label-based slicing with loc includes both endpoints of each window.
s1 = pd.concat([s.loc[date - pd.Timedelta(window, unit='d'):
                      date + pd.Timedelta(window, unit='d')] for date in dates])
print(s1)
1990-01-03 0.284356 
1990-01-04 0.997019 
1990-01-05 0.293225 
1990-01-06 0.451379 
1990-01-07 0.743209 
1990-01-13 0.254926 
1990-01-14 0.339728 
1990-01-15 0.793124 
1990-01-16 0.121002 
1990-01-17 0.930924 
dtype: float64 
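
One thing the concat approach above does not address is what happens when two event windows overlap: the shared rows are then concatenated twice. The following is a minimal sketch of one way to handle that, not part of the original answer. The event dates here are hypothetical and chosen only so that the windows overlap, and the series holds 0..19 instead of random values so the result is easy to check.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the series in the question.
s = pd.Series(np.arange(20.0), index=pd.date_range("1990-01-01", periods=20))

# Hypothetical event dates, two days apart, so the (-2, +2) windows overlap.
dates = pd.to_datetime(["1990-01-05", "1990-01-07"])
window = pd.Timedelta(2, unit="d")

# Same technique as the answer: one loc slice per event, then concat.
s1 = pd.concat([s.loc[d - window:d + window] for d in dates])

# Dates covered by both windows appear twice; keep the first occurrence.
s1_unique = s1[~s1.index.duplicated(keep="first")]
print(s1_unique)
```

Whether duplicates should be dropped depends on the analysis: if each event's window is to be treated separately, keeping the repeated rows may be what you want.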
+0

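Thanks for your help, but these two dates are the dates of two separate events. Your approach handles one at a time; are you suggesting I write a for loop over the two event dates? – zsljulius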

+0

I think so; 'iloc' could be problematic at the start date '1990-01-01' and the end date '1990-01-17'. – jezrael
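
The comment above is brief, so the following is only one possible reading of it: if windows are taken by position with iloc instead of by label, an event near the start of the series produces a negative start position, which iloc interprets as counting from the end. A minimal sketch of a position-based loop that clips the window at the edges, assuming a window of two rows on each side and a hypothetical event on the very first date:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the series in the question.
s = pd.Series(np.arange(20.0), index=pd.date_range("1990-01-01", periods=20))

# Hypothetical events; the first one sits on the left edge of the series.
dates = pd.to_datetime(["1990-01-01", "1990-01-15"])
window = 2  # number of rows on each side of the event

pieces = []
for pos in s.index.get_indexer(dates):
    if pos == -1:
        continue  # event date is not present in the index
    start = max(pos - window, 0)          # avoid a negative start position
    stop = min(pos + window + 1, len(s))  # iloc's stop is exclusive
    pieces.append(s.iloc[start:stop])

s_pos = pd.concat(pieces)
print(s_pos)
```

Label-based slicing with loc, as in the answer, does not need this clipping because a slice bound outside a sorted index simply selects nothing beyond the edge.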

1

Try this:

In [23]: df['A'] 
Out[23]: 
2013-01-01 0.469112 
2013-01-02 1.212112 
2013-01-03 -0.861849 
2013-01-04 0.721555 
2013-01-05 -0.424972 
2013-01-06 -0.673690 
Freq: D, Name: A, dtype: float64 

In [25]: df['20130102':'20130104'] 
Out[25]: 
                   A         B         C         D
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860

[3 rows x 4 columns] 

From the pandas documentation ("10 Minutes to pandas", "Selection" section): http://pandas.pydata.org/pandas-docs/version/0.13.1/10min.html?highlight=select%20where
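
The documentation excerpt above slices a DataFrame by date strings. Applied to the series from the question, the same idea might look like the sketch below. The window bounds are written out by hand here, which is an assumption made for illustration; the answer itself does not show how to derive them from the event dates.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the series in the question.
s = pd.Series(np.arange(20.0), index=pd.date_range("1990-01-01", periods=20))

# Date strings are parsed against the DatetimeIndex; both endpoints are included.
first_window = s.loc["1990-01-03":"1990-01-07"]
second_window = s.loc["1990-01-13":"1990-01-17"]

s_windows = pd.concat([first_window, second_window])
print(s_windows)
```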

1

I would build a boolean mask to select the values of interest:

import numpy as np 
import pandas as pd 

s = pd.Series(np.random.rand(20), index=pd.date_range("1990-01-01",periods=20)) 
events = [pd.to_datetime('1990-01-05'), pd.to_datetime('1990-01-15')] 
max_delta = pd.Timedelta(2, unit='d') 

# Start from an all-False mask, then switch on every timestamp within max_delta of an event.
mask = np.zeros_like(s, dtype=bool)
for event in events:
    mask |= np.abs(s.index - event) <= max_delta
s_events = s[mask]

print(s_events) 

Output:

1990-01-03 0.877271 
1990-01-04 0.770214 
1990-01-05 0.427380 
1990-01-06 0.971676 
1990-01-07 0.533582 
1990-01-13 0.060556 
1990-01-14 0.932072 
1990-01-15 0.501966 
1990-01-16 0.081177 
1990-01-17 0.167775 
dtype: float64
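
The loop in the mask answer can also be written without an explicit for loop by using NumPy broadcasting. This is a sketch of an alternative, not the answer author's code; it uses 0..19 as values instead of random numbers so the result is reproducible. Because it builds a single mask, each row is selected at most once even if windows overlap.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the series in the question.
s = pd.Series(np.arange(20.0), index=pd.date_range("1990-01-01", periods=20))
events = pd.to_datetime(["1990-01-05", "1990-01-15"])
max_delta = np.timedelta64(2, "D")

# dist has shape (len(s), len(events)): the absolute time difference
# between every timestamp in the series and every event date.
dist = np.abs(s.index.values[:, None] - events.values[None, :])

# A row is kept if it lies within max_delta of at least one event.
mask = (dist <= max_delta).any(axis=1)
s_events = s[mask]
print(s_events)
```

The intermediate array grows with the number of timestamps times the number of events, so for very long series with many events the loop version may use less memory.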