2017-09-16 38 views

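I have two datetime indexes: one is a date_range of business days, the other is a list of holidays. How do I filter one datetime index by the other?

I have already filtered the holiday list by start date and end date. Now I need to combine the two and drop any duplicates (dates that appear in both the holidays and the trading days).

Finally, I need to convert the date range into a list of formatted strings, i.e. yyyy_mm_dd, that I can iterate over later.

Here is my code so far: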

import datetime 
import pandas as pd 
from pandas.tseries.holiday import (
    AbstractHolidayCalendar, Holiday, nearest_workday,
    USMartinLutherKingJr, USPresidentsDay, GoodFriday, USMemorialDay,
    USLaborDay, USThanksgivingDay,
)

class USTradingCalendar(AbstractHolidayCalendar): 
    rules = [ 
     Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday), 
     USMartinLutherKingJr, 
     USPresidentsDay, 
     GoodFriday, 
     USMemorialDay, 
     Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday), 
     USLaborDay, 
     USThanksgivingDay, 
     Holiday('Christmas', month=12, day=25, observance=nearest_workday) 
    ] 

def get_trading_close_holidays(year): 
    inst = USTradingCalendar() 
    return inst.holidays(datetime.datetime(year-1, 12, 31), 
         datetime.datetime(year, 12, 31)) 

start_date = "2017_07_01" 
end_date = "2017_08_31" 

start_date = datetime.datetime.strptime(start_date,"%Y_%m_%d").date() 
end_date = datetime.datetime.strptime(end_date,"%Y_%m_%d").date() 

date_range = pd.bdate_range(start = start_date, end = end_date, name = 
          "trading_days") 
holidays = get_trading_close_holidays(start_date.year) 
holidays = holidays.where((holidays.date > start_date) & 
          (holidays.date < end_date)) 
holidays = holidays.dropna(how = 'any') 
date_range = date_range.where(~(date_range.trading_days.isin(holidays))) 

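Although the title asks a question, the post does not identify a problem, let alone ask a specific question. You only describe requirements. Any errors? Undesired results? Please show the desired output. – Parfait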


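Apologies - my last line of code does not work. My question here is twofold: 1) what is a best-practice way to combine two datetime indexes so that any duplicates are removed; 2) how to take these datetime objects and format them as strings – cifc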

Answer


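Consider filtering with a boolean condition (the elementwise comparison below only works as intended when a single holiday falls in the range, as is the case here):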

date_range = date_range[date_range.date != holidays.date] 
print(date_range) # ONE HOLIDAY 2017-07-04 DOES NOT APPEAR 

# DatetimeIndex(['2017-07-03', '2017-07-05', '2017-07-06', '2017-07-07', 
#    '2017-07-10', '2017-07-11', '2017-07-12', '2017-07-13', 
#    '2017-07-14', '2017-07-17', '2017-07-18', '2017-07-19', 
#    '2017-07-20', '2017-07-21', '2017-07-24', '2017-07-25', 
#    '2017-07-26', '2017-07-27', '2017-07-28', '2017-07-31', 
#    '2017-08-01', '2017-08-02', '2017-08-03', '2017-08-04', 
#    '2017-08-07', '2017-08-08', '2017-08-09', '2017-08-10', 
#    '2017-08-11', '2017-08-14', '2017-08-15', '2017-08-16', 
#    '2017-08-17', '2017-08-18', '2017-08-21', '2017-08-22', 
#    '2017-08-23', '2017-08-24', '2017-08-25', '2017-08-28', 
#    '2017-08-29', '2017-08-30', '2017-08-31'], 
#    dtype='datetime64[ns]', name='trading_days', freq=None) 
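
For ranges that contain several holidays, a membership test may be more robust than comparing the two date arrays element by element. The following is a minimal sketch, not part of the original answer: the two-entry `holidays` index is a stand-in for the output of `get_trading_close_holidays`, and the names `trading_days`, `filtered` and `filtered_alt` are chosen here for illustration only.

```python
import pandas as pd

# Business days in the question's range (weekends are already excluded).
trading_days = pd.bdate_range("2017-07-01", "2017-08-31", name="trading_days")

# Stand-in holiday list (assumption): in the question this would come from
# USTradingCalendar. The second date lies outside the range on purpose.
holidays = pd.DatetimeIndex(["2017-07-04", "2017-09-04"])

# Option 1: boolean mask built with isin(); works for any number of holidays.
filtered = trading_days[~trading_days.isin(holidays)]

# Option 2: set difference between the two indexes.
filtered_alt = trading_days.difference(holidays)

print(len(trading_days), len(filtered))
```

Both options drop every date that appears in `holidays`, so the holiday list does not need to be trimmed to the start and end dates beforehand.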

And use astype() to get a string-typed array from the date index, followed by tolist() for the list conversion:

strdates = date_range.date.astype('str').tolist() 
print(strdates) 

# ['2017-07-03', '2017-07-05', '2017-07-06', '2017-07-07', '2017-07-10', 
# '2017-07-11', '2017-07-12', '2017-07-13', '2017-07-14', '2017-07-17', 
# '2017-07-18', '2017-07-19', '2017-07-20', '2017-07-21', '2017-07-24', 
# '2017-07-25', '2017-07-26', '2017-07-27', '2017-07-28', '2017-07-31', 
# '2017-08-01', '2017-08-02', '2017-08-03', '2017-08-04', '2017-08-07', 
# '2017-08-08', '2017-08-09', '2017-08-10', '2017-08-11', '2017-08-14', 
# '2017-08-15', '2017-08-16', '2017-08-17', '2017-08-18', '2017-08-21', 
# '2017-08-22', '2017-08-23', '2017-08-24', '2017-08-25', '2017-08-28', 
# '2017-08-29', '2017-08-30', '2017-08-31'] 
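
Note that the strings above use hyphens, while the question asks for yyyy_mm_dd. One way to get that format, sketched here on a short hypothetical range rather than the full index, is `DatetimeIndex.strftime` with an explicit format string:

```python
import pandas as pd

# Short illustrative range (assumption); substitute the filtered index.
trading_days = pd.bdate_range("2017-07-03", "2017-07-07")

# strftime() returns an Index of strings; tolist() gives a plain Python list.
strdates = trading_days.strftime("%Y_%m_%d").tolist()

for day in strdates:
    print(day)
```

The same format string, "%Y_%m_%d", is the one the question already passes to strptime, so the strings round-trip with the original start and end dates.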

This is exactly what I was looking for - thanks for your help – cifc


Thanks very much! Glad to help. – Parfait