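Conditionally performing calculations on a pandas DataFrame

2014-03-03

time_period  total_cost  total_revenue
7days        150         250
14days       350         600
30days       900         750
7days        180         400
14days       430         620

Given this data, I want to convert the total_cost and total_revenue columns into per-day averages for the given time period. I thought this would work: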

df[['total_cost','total_revenue']][df.time_period=="7days"] = df[['total_cost','total_revenue']][df.time_period=="7days"]/7

But it returns the DataFrame unchanged.

Answers

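I believe you are operating on a copy of the DataFrame: the chained indexing df[[...]][...] produces a temporary object, so the assignment never reaches df. I think you should use apply: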

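from io import StringIO  # Python 3; Python 2 used "from StringIO import StringIO"
import pandas

datastring = StringIO("""\
time_period  total_cost  total_revenue
7days   150   250
14days   350   600
30days   900   750
7days   180   400
14days   430   620
""")

# Columns are separated by two or more whitespace characters.
# A regex separator requires the Python parsing engine.
data = pandas.read_table(datastring, sep=r'\s\s+', engine='python')

# Strip the trailing "days" to get the period length, then divide row by row.
data['total_cost_avg'] = data.apply(
    lambda row: row['total_cost'] / float(row['time_period'][:-4]),
    axis=1
)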

This gives me:

  time_period  total_cost  total_revenue  total_cost_avg
0       7days         150            250       21.428571
1      14days         350            600       25.000000
2      30days         900            750       30.000000
3       7days         180            400       25.714286
4      14days         430            620       30.714286
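
The copy problem described in this answer can also be avoided without apply. The following is a minimal sketch, not the answerer's method: it assumes the question's data is loaded into a DataFrame named df (rebuilt here from an inline CSV string for illustration) and uses a single .loc assignment so that rows and columns are selected in one step.

```python
from io import StringIO

import pandas as pd

# Illustrative copy of the question's data (comma-separated for simplicity).
raw = StringIO("""\
time_period,total_cost,total_revenue
7days,150,250
14days,350,600
30days,900,750
7days,180,400
14days,430,620
""")
df = pd.read_csv(raw)

cols = ["total_cost", "total_revenue"]
# Cast to float first so the divided values fit the column dtype.
df[cols] = df[cols].astype(float)

# One .loc call selects rows and columns together, so the assignment
# writes into df itself rather than into a temporary copy.
mask = df["time_period"] == "7days"
df.loc[mask, cols] = df.loc[mask, cols] / 7

print(df)
```

Only the rows whose time_period is "7days" are changed; the other rows keep their original totals.
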

You could also use str.extract to pull out the number of days :) It kind of feels like there should be a timedelta way of doing this, though.
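
One possible reading of the timedelta idea in this comment is sketched below. It is an assumption about what the commenter had in mind, not something stated in the thread: the period strings are rewritten as "7 days" and passed to pd.to_timedelta, and the day count is read back through the .dt.days accessor. The Series name periods is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the time_period column.
periods = pd.Series(["7days", "14days", "30days", "7days", "14days"])

# Insert a space between the number and the unit ("7days" -> "7 days")
# before handing the strings to pd.to_timedelta.
spaced = periods.str.replace(r"(\d+)days", r"\1 days", regex=True)
days = pd.to_timedelta(spaced).dt.days

print(days.tolist())
```

The resulting integer Series can be used as the divisor in the same way as the values obtained with str.extract.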

Excellent answer from Paul. Adding my approach here:

import pandas as pd

test_df = pd.read_csv("file1.csv")
test_df

  time_period  total_cost  total_revenue
0       7days         150            250
1      14days         350            600
2      30days         900            750
3       7days         180            400
4      14days         430            620

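# Extract the number of days; expand=False returns a Series rather than a DataFrame.
test_df['days'] = test_df.time_period.str.extract(r'(\d+)days', expand=False).astype(int)
test_df['total_cost'] = test_df.total_cost / test_df.days
test_df['total_revenue'] = test_df.total_revenue / test_df.days
del test_df['days']  # drop the helper column
test_df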


  time_period  total_cost  total_revenue
0       7days   21.428571      35.714286
1      14days   25.000000      42.857143
2      30days   30.000000      25.000000
3       7days   25.714286      57.142857
4      14days   30.714286      44.285714
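
The two per-column divisions above could also be collapsed into a single call. This is a small sketch under the assumption that the data matches the table in the question; it builds the frame inline instead of reading file1.csv, and uses DataFrame.div with axis=0 to divide every selected column by the per-row day count.

```python
import pandas as pd

# Inline copy of the question's data, used here instead of file1.csv.
df = pd.DataFrame({
    "time_period": ["7days", "14days", "30days", "7days", "14days"],
    "total_cost": [150, 350, 900, 180, 430],
    "total_revenue": [250, 600, 750, 400, 620],
})

# Number of days per row, parsed from strings such as "14days".
days = df["time_period"].str.extract(r"(\d+)days", expand=False).astype(int)

# axis=0 aligns the divisor with the row index, so each row is divided
# by its own period length across both columns at once.
cols = ["total_cost", "total_revenue"]
df[cols] = df[cols].div(days, axis=0)

print(df)
```

This avoids the temporary days column and scales to more cost or revenue columns by extending the cols list.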