Inner join of DataFrames based on datetime

I have two DataFrames, df1 and df2.

df1.index 
DatetimeIndex(['2001-09-06', '2002-08-04', '2000-01-22', '2000-12-19', 
       '2008-02-09', '2010-07-07', '2011-06-04', '2007-03-14', 
       '2003-05-17', '2016-02-27',..dtype='datetime64[ns]', name=u'DateTime', length=6131, freq=None) 

df2.index 
DatetimeIndex(['2002-01-01 01:00:00', '2002-01-01 10:00:00', 
       '2002-01-01 11:00:00', '2002-01-01 12:00:00', 
       '2002-01-01 13:00:00', '2002-01-01 14:00:00',..dtype='datetime64[ns]', length=129273, freq=None)

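That is, df1 is indexed by day and df2 by datetime. I want to perform an inner join of df1 and df2 on the index, such that an hourly timestamp in df2 counts as a match if its date is present in df1, and does not match otherwise.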

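I want two outputs, df11 and df22. df11 should contain the dates common to both frames, with the corresponding columns from df1. df22 should contain the common datetimes, with the corresponding columns from df2.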

E.g. '2002-08-04' in df1 and '2002-08-04 01:00:00' in df2 are considered present in both.

But if '1802-08-04' in df1 has no hours in df2, it does not appear in df11.

Likewise, if '2045-08-04 01:00:00' in df2 has no matching date in df1, it does not appear in df22.
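The matching rule in these examples can be sketched with two boolean masks built from `DatetimeIndex.normalize()`, which floors each timestamp to midnight of its day. This is a minimal sketch on a toy pair of frames that reuses the example dates above; the column names `a` and `b` and their values are made up for illustration.

```python
import pandas as pd

# Toy frames using the example dates from the question (column values are made up)
df1 = pd.DataFrame({"a": [1, 2]},
                   index=pd.DatetimeIndex(["2002-08-04", "1802-08-04"]))
df2 = pd.DataFrame({"b": [10, 20]},
                   index=pd.DatetimeIndex(["2002-08-04 01:00:00",
                                           "2045-08-04 01:00:00"]))

# normalize() truncates each hourly timestamp to its calendar day
df2_days = df2.index.normalize()

# Days of df1 that have at least one hour in df2
df11 = df1[df1.index.isin(df2_days)]
# Hours of df2 whose day exists in df1
df22 = df2[df2_days.isin(df1.index)]
```

Here '1802-08-04' drops out of df11 and '2045-08-04 01:00:00' drops out of df22, as described above.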

Right now I do this in a verbose way with numpy in1d and the pandas normalize function. I am looking for a more Pythonic way to achieve it.
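If a single joined frame is acceptable instead of two filtered ones, an inner `merge` on a day key is a literal inner join. A sketch under assumptions: the frames, their values, and the column names `ts`, `day`, `a` and `b` are hypothetical.

```python
import pandas as pd

# Hypothetical frames: df1 is indexed by day, df2 by hourly timestamp
df1 = pd.DataFrame({"a": [1, 2]},
                   index=pd.DatetimeIndex(["2002-08-04", "2002-08-05"]))
df2 = pd.DataFrame({"b": [10, 20, 30]},
                   index=pd.DatetimeIndex(["2002-08-04 01:00:00",
                                           "2002-08-04 13:00:00",
                                           "2003-01-01 02:00:00"]))

# Move the timestamps into a column and derive each row's calendar day
left = df2.rename_axis("ts").reset_index()
left["day"] = left["ts"].dt.normalize()

# Expose df1's daily index as a "day" column to merge on
right = df1.rename_axis("day").reset_index()

# Inner merge: only hours whose day exists in df1 survive, paired with df1's columns
joined = left.merge(right, on="day", how="inner").set_index("ts")
```

Each df1 row is repeated once per matching hour, so this suits a side-by-side view; the two separate outputs asked for above are simpler with boolean masks.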


2016-11-29 Zanam

Can you post your code? It would make what you are trying to do clearer. – sangrey

Consider dummy DataFrames built as follows.

import numpy as np
import pandas as pd

idx1 = pd.date_range(start='2000/1/1', periods=100, freq='12D')
idx2 = pd.date_range(start='2000/1/1', periods=100, freq='300H')
np.random.seed([42, 314])

DataFrame with a DatetimeIndex holding dates only:

df1 = pd.DataFrame(np.random.randint(0,10,(100,2)), idx1) 
df1.head()

DataFrame with a DatetimeIndex holding date + time values:

df2 = pd.DataFrame(np.random.randint(0,10,(100,2)), idx2) 
df2.head()

Get the common index, using only the matching date as the criterion:

# normalize() floors each df2 timestamp to midnight while keeping the datetime64 dtype
intersect = df1.index.intersection(df2.index.normalize().unique())
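With the dummy indexes, steps of 12 days versus 300 hours (12.5 days) mean the calendar days coincide only periodically, which makes the intersection easy to check by hand. A sketch assuming the same indexes as above; `freq=pd.Timedelta(hours=300)` stands in for `'300H'` to sidestep version differences in the hourly alias.

```python
import pandas as pd

idx1 = pd.date_range(start="2000-01-01", periods=100, freq="12D")
# 300 hours = 12.5 days between consecutive timestamps
idx2 = pd.date_range(start="2000-01-01", periods=100, freq=pd.Timedelta(hours=300))

# Truncate each timestamp to its day, then intersect with the daily index
intersect = idx1.intersection(idx2.normalize().unique())

# Offset in whole days of each shared day from the start date
offsets = (intersect - pd.Timestamp("2000-01-01")).days
print(sorted(offsets.tolist()))
```

The shared days sit 0, 12, 300, 312, 600, 612, 900 and 912 days after 2000-01-01, so `df1.loc[intersect]` has 8 rows.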

The first DataFrame, on the common index, containing the columns of the original df1:

df11 = df1.loc[intersect] 
df11

The second DataFrame, on the common index, containing the columns of the original df2:
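A minimal sketch of this step, assuming the dummy frames from above and a `normalize()`-based `intersect`: keep every df2 row whose calendar day is one of the shared days.

```python
import numpy as np
import pandas as pd

# Rebuild the dummy frames (Timedelta stands in for the '300H' alias)
idx1 = pd.date_range(start="2000-01-01", periods=100, freq="12D")
idx2 = pd.date_range(start="2000-01-01", periods=100, freq=pd.Timedelta(hours=300))
np.random.seed([42, 314])
df1 = pd.DataFrame(np.random.randint(0, 10, (100, 2)), idx1)
df2 = pd.DataFrame(np.random.randint(0, 10, (100, 2)), idx2)

# Shared calendar days
intersect = df1.index.intersection(df2.index.normalize().unique())

# Keep every hourly row of df2 whose day is one of the shared days
df22 = df2[df2.index.normalize().isin(intersect)]
```

Because consecutive df2 timestamps are 12.5 days apart, each shared day holds exactly one df2 row here, so df22 also has 8 rows.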


2016-11-29 16:04:45
