2017-10-16 26 views

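Distribute month's quantity equally into weeks (2)

I want to explode df.Month into weeks and distribute the quantity equally across those weeks. Weeks start on Monday.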

DF

   Country      Item    Month       Qty
   ------------------------------------
0  New Zealand  Apple   2017-10-31  100
1  Puerto Rico  Banana  2017-11-30  200
2  France       Apple   2017-10-31  400
...

The desired output is:

    Country      Item    Week        Qty
    ------------------------------------
0   New Zealand  Apple   2017-10-01  20
1   New Zealand  Apple   2017-10-08  20
2   New Zealand  Apple   2017-10-15  20
3   New Zealand  Apple   2017-10-22  20
4   New Zealand  Apple   2017-10-29  20
5   Puerto Rico  Banana  2017-11-05  50
6   Puerto Rico  Banana  2017-11-12  50
7   Puerto Rico  Banana  2017-11-19  50
8   Puerto Rico  Banana  2017-11-26  50
9   France       Apple   2017-10-01  80
10  France       Apple   2017-10-08  80
...

The weeks DataFrame is built with:

mondays = pd.Series(pd.date_range(first_day, last_day, freq='W-MON'))
weeks = pd.DataFrame({'Week': mondays})

2) weeks

Week 
    ---------- 
0 2017-10-01 
1 2017-10-08 
2 2017-10-15 
3 2017-10-22 
4 2017-10-29 
5 2017-11-05 
6 2017-11-12 
7 2017-11-19 
8 2017-11-26 

... 
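The construction of the weeks frame can be sketched as a runnable snippet. The values of first_day and last_day below are assumptions, since the question does not state them. Note that freq='W-MON' yields Mondays, so the first generated date is 2017-10-02; the dates listed in the table above (2017-10-01, 2017-10-08, ...) fall on Sundays.

```python
import pandas as pd

# Assumed bounds; the question does not give first_day / last_day.
first_day = "2017-10-01"
last_day = "2017-11-30"

# 'W-MON' anchors the weekly frequency on Mondays.
mondays = pd.Series(pd.date_range(first_day, last_day, freq="W-MON"))
weeks = pd.DataFrame({"Week": mondays})

# Sanity check: every generated date is a Monday (dayofweek == 0).
all_mondays = bool((weeks["Week"].dt.dayofweek == 0).all())
print(weeks)
print("all Mondays:", all_mondays)
```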

This is an extension of the question: Distribute month's quantity equally into weeks


Does "df1 = df1.drop_duplicates('Month')" combined with the previous solution not work? – jezrael

Answer


You can use:

import pandas as pd

# all Mondays in the period of interest
mondays = pd.Series(pd.date_range('2017-10-01', '2017-11-26', freq='W-MON'))
weeks = pd.DataFrame({'Week': mondays})

# month period for merge
df['Month'] = pd.to_datetime(df['Month']).dt.to_period('M')
weeks['Week'] = pd.to_datetime(weeks['Week'])
weeks['Month'] = weeks['Week'].dt.to_period('M')

# merge by Month: one row per original row and Monday in its month
df = pd.merge(df, weeks, on='Month')
# divide Qty by the number of Mondays in each month (counted with value_counts)
df['Qty'] = df['Qty'].div(df['Month'].map(weeks['Month'].value_counts()))
df = df.drop(columns='Month')
print (df)
    Country      Item    Qty        Week
0   New Zealand  Apple   20.000000  2017-10-02
1   New Zealand  Apple   20.000000  2017-10-09
2   New Zealand  Apple   20.000000  2017-10-16
3   New Zealand  Apple   20.000000  2017-10-23
4   New Zealand  Apple   20.000000  2017-10-30
5   France       Apple   80.000000  2017-10-02
6   France       Apple   80.000000  2017-10-09
7   France       Apple   80.000000  2017-10-16
8   France       Apple   80.000000  2017-10-23
9   France       Apple   80.000000  2017-10-30
10  Puerto Rico  Banana  66.666667  2017-11-06
11  Puerto Rico  Banana  66.666667  2017-11-13
12  Puerto Rico  Banana  66.666667  2017-11-20
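For reference, here is a self-contained sketch of the same merge-and-divide idea using the sample rows from the question. The helper name split_month_into_weeks and the n_weeks column are hypothetical, and the count of Mondays is attached with groupby/transform before the merge instead of map/value_counts, which is a small variation rather than the exact code above. The range here runs to 2017-11-30 (an assumption), so November gets four Mondays and 50 per week; with the range ending on 2017-11-26, the Monday 2017-11-27 is excluded, which is why the output above shows 66.666667.

```python
import pandas as pd


def split_month_into_weeks(df, first_day, last_day):
    """Hypothetical helper: spread each row's Qty evenly over the Mondays of its month."""
    weeks = pd.DataFrame({"Week": pd.date_range(first_day, last_day, freq="W-MON")})
    weeks["Month"] = weeks["Week"].dt.to_period("M")
    # Number of Mondays that fall in each month, repeated on every week row.
    weeks["n_weeks"] = weeks.groupby("Month")["Week"].transform("count")

    out = df.copy()
    out["Month"] = pd.to_datetime(out["Month"]).dt.to_period("M")

    # One output row per (input row, Monday of its month); other columns are carried along.
    out = out.merge(weeks, on="Month")
    out["Qty"] = out["Qty"] / out["n_weeks"]
    return out.drop(columns=["Month", "n_weeks"])


# Sample rows copied from the question.
df = pd.DataFrame({
    "Country": ["New Zealand", "Puerto Rico", "France"],
    "Item": ["Apple", "Banana", "Apple"],
    "Month": ["2017-10-31", "2017-11-30", "2017-10-31"],
    "Qty": [100, 200, 400],
})

# Assumed date range covering both months in full.
result = split_month_into_weeks(df, "2017-10-01", "2017-11-30")
totals = result.groupby(["Country", "Item"])["Qty"].sum()
print(result)
```

Summing Qty per Country and Item in the result should give back the original monthly quantities, which is a quick way to check that nothing was lost in the split.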

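Thanks @jezrael. The second solution works because it keeps the duplicate items. There are more fields after "Item", such as Country, Region, etc. In the second solution, how can I show all the fields? "print(df1)" only shows Month and Qty – reservoirinvest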


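Can you edit the question? You need to aggregate all the columns, otherwise the other columns are dropped... so you need "df1 = df.groupby('Month', as_index=False).agg({'Qty': sum, 'Country': 'first', 'Region': '_'.join})". I don't have the data, so this only shows possible aggregations for the columns. – jezrael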
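As a rough illustration of what that groupby/agg call does, here is a sketch on made-up data. The Region values are assumptions for illustration only, since the thread does not show that column. Note that the result has one row per Month, with Qty summed, the first Country kept, and the Region strings joined with "_".

```python
import pandas as pd

# Made-up sample data; the 'Region' values are assumptions, not from the thread.
df = pd.DataFrame({
    "Country": ["New Zealand", "Puerto Rico", "France"],
    "Region": ["Oceania", "Caribbean", "Europe"],
    "Month": ["2017-10-31", "2017-11-30", "2017-10-31"],
    "Qty": [100, 200, 400],
})

# Each column needs its own aggregation; columns left out of the dict are dropped.
df1 = df.groupby("Month", as_index=False).agg(
    {"Qty": "sum", "Country": "first", "Region": "_".join}
)
print(df1)
```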


I don't think aggregation will work. I am expecting the months to be exploded into weeks, not aggregated. – reservoirinvest