pandas DataFrame: create a new DataFrame by copying each row of an existing DataFrame n times and incrementing the date

I have a DataFrame with about 9k rows and 57 columns; call it 'df'.

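I need a new DataFrame, 'df_final': for each row of 'df', I have to copy the row 'x' times and increment the date by one day on each copy, also 'x' times. This works for a few iterations, but when I ran the loop over all len(df) rows of 'df' it ran for so long (> 3 hours) that I cancelled it. I never saw it finish. Here is the current code: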

df.shape 
output: (9454, 57) 

df_int = df[0:0] 
df_final = df_int[0:0] 
range_df = len(df) 
for x in range(0,2): 
    df_int = df.iloc[0+x:x+1] 
    if abs(df_int.iat[-1,3]) > 0: 
     df_int = pd.concat([df_int]*abs(df_int.iat[-1,3]), ignore_index=True) 
     for i in range(1, abs(df_int.iat[-1,3])): 
      df_int['Consumption Date'][i] = df_int['Consumption Date'][i-1] + datetime.timedelta(days = 1) 
      i += 1 
     df_final = df_final.append(df_int, ignore_index=True) 
    x += 1

The result of the loop for the first two rows of 'df' is shown below.

First two rows of df:

Desired result:

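Is there another way to get the desired output? It seems pandas does not handle loops well. In Excel VBA the same loop takes about 3/4 minutes... I am trying to replace a process that currently lives in Excel, but if there is no way to make this work, I guess I will stick with the old way...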

Source

2017-09-23 dapaz

Use repeat and cumcount:

In [2972]: dff = df.loc[df.index.repeat(3)] 

In [2973]: dff 
Out[2973]: 
     date name 
0 2017-05-03 bob 
0 2017-05-03 bob 
0 2017-05-03 bob 
1 2017-06-13 sally 
1 2017-06-13 sally 
1 2017-06-13 sally 

In [2974]: dff.loc[:, 'date'] += pd.to_timedelta(dff.groupby(level=0).cumcount(), 'D') 

In [2975]: dff 
Out[2975]: 
     date name 
0 2017-05-03 bob 
0 2017-05-04 bob 
0 2017-05-05 bob 
1 2017-06-13 sally 
1 2017-06-14 sally 
1 2017-06-15 sally

Details

In [2976]: df 
Out[2976]: 
     date name 
0 2017-05-03 bob 
1 2017-06-13 sally 

In [2977]: dff.groupby(level=0).cumcount() 
Out[2977]: 
0 0 
0 1 
0 2 
1 0 
1 1 
1 2 
dtype: int64
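The question's loop repeats each row a different number of times, taken from the absolute value of the row's fourth column, while the example above uses a fixed count of 3. A minimal sketch of the same repeat-and-cumcount idea with per-row counts follows. The `days` column name and the toy values are assumptions made for illustration; they are not from the original data.

```python
import pandas as pd

# Toy data. 'days' is a hypothetical stand-in for the per-row repeat
# count that the question reads with df_int.iat[-1, 3].
df = pd.DataFrame({
    'date': pd.to_datetime(['2017-05-03', '2017-06-13']),
    'name': ['bob', 'sally'],
    'days': [2, -3],
})

# The loop in the question takes abs() of the count and skips zeros;
# repeating a row 0 times drops it, which matches that behaviour.
counts = df['days'].abs()
dff = df.loc[df.index.repeat(counts)].copy()

# Position of each copy within its original row: 0, 1, 2, ...
offsets = dff.groupby(level=0).cumcount().to_numpy()
dff['date'] = dff['date'] + pd.to_timedelta(offsets, unit='D')

# Replace the duplicated index labels with a fresh 0..n-1 index.
dff = dff.reset_index(drop=True)
print(dff)
```

Passing a Series to `Index.repeat` gives one count per row, so no Python-level loop over the rows is needed.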

Source

2017-09-23 15:45:55 Zero

Very nice solution. – Dark

Thank you so much! This fits like a glove! :) – dapaz

Let's use this toy DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2017-05-03', '2017-06-13']),
    'name': ['bob', 'sally'],
})

It looks like this:

 date name 
0 2017-05-03 bob 
1 2017-06-13 sally

Then:

x = 3 # repeat count 
ind = np.repeat(np.arange(len(df)), x) # 0,0,0,1,1,1 
df_final = df.iloc[ind].copy()

This gives you the repeated rows:

 date name 
0 2017-05-03 bob 
0 2017-05-03 bob 
0 2017-05-03 bob 
1 2017-06-13 sally 
1 2017-06-13 sally 
1 2017-06-13 sally

Now you just need to increment the dates:

inc = np.tile(np.arange(x), len(df)) # 0,1,2,0,1,2 
df_final.date += pd.to_timedelta(inc, 'D')

This gives:

 date name 
0 2017-05-03 bob 
0 2017-05-04 bob 
0 2017-05-05 bob 
1 2017-06-13 sally 
1 2017-06-14 sally 
1 2017-06-15 sally
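The two steps above can be wrapped into one reusable function. This is only a sketch under stated assumptions: the function name `expand_dates` and the `date_col` parameter are hypothetical, and the result index is reset to 0..n-1 rather than keeping the duplicated labels shown above.

```python
import numpy as np
import pandas as pd


def expand_dates(df, x, date_col='date'):
    """Repeat every row x times and add 0..x-1 days to date_col.

    Hypothetical helper; assumes date_col holds datetime64 values.
    """
    ind = np.repeat(np.arange(len(df)), x)   # 0,0,0,1,1,1 for x = 3
    inc = np.tile(np.arange(x), len(df))     # 0,1,2,0,1,2 for x = 3
    out = df.iloc[ind].reset_index(drop=True)
    out[date_col] = out[date_col] + pd.to_timedelta(inc, unit='D')
    return out


toy = pd.DataFrame({
    'date': pd.to_datetime(['2017-05-03', '2017-06-13']),
    'name': ['bob', 'sally'],
})
result = expand_dates(toy, 3)
print(result)
```

Because `np.repeat` and `np.tile` build the whole row selection and the day offsets as arrays, the work is done in a few vectorized calls instead of one `append` per source row.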

Source

2017-09-23 15:34:23

Thanks for your answer. – dapaz

Here is a solution:

df1 = (df.reset_index().set_index('date').groupby('index')
         .apply(lambda x: x.reindex(pd.date_range(start=x.index[0], periods=3, freq='D')))
         .ffill())
df1 
Out[202]: 
        index name 
index       
0  2017-05-03 0.0 bob 
     2017-05-04 0.0 bob 
     2017-05-05 0.0 bob 
1  2017-06-13 1.0 sally 
     2017-06-14 1.0 sally 
     2017-06-15 1.0 sally

Then:

df1.drop('index', axis=1).reset_index().rename(columns={'level_1': 'date'}).drop('index', axis=1)

Out[212]: 
     date name 
0 2017-05-03 bob 
1 2017-05-04 bob 
2 2017-05-05 bob 
3 2017-06-13 sally 
4 2017-06-14 sally 
5 2017-06-15 sally

Source

2017-09-23 15:57:42 Wen
