2017-03-24 57 views
2

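Pandas: a more efficient way to update a column in a pandas DataFrame without a for loop

I have a pandas DataFrame in which I want to update the values of one column based on another column of the same DataFrame. I was previously using the following code for the update: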

for i1, col1 in dfMod.iterrows():
    if col1['day'] == "MONDAY":
        dfMod.loc[i1, 'weekIndex'] = 1
    elif col1['day'] == "TUESDAY":
        dfMod.loc[i1, 'weekIndex'] = 2
    elif col1['day'] == "WEDNESDAY":
        dfMod.loc[i1, 'weekIndex'] = 3
    elif col1['day'] == "THURSDAY":
        dfMod.loc[i1, 'weekIndex'] = 4
    elif col1['day'] == "FRIDAY":
        dfMod.loc[i1, 'weekIndex'] = 5
    elif col1['day'] == "SATURDAY":
        dfMod.loc[i1, 'weekIndex'] = 6
    else:
        dfMod.loc[i1, 'weekIndex'] = 7

However, the DataFrame has 300,000 rows and this takes forever to run. Is there a better way to update the column?

+1

Look at the Series 'map' method. – BrenBarn

+0

I just asked about this too. My question may be useful to you: http://stackoverflow.com/questions/42972081/updating-columns-in-dataframe-using-a-series – sdasdadas

Answers

3

You need map with a dict:

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, 
    "FRIDAY":5, "SATURDAY":6, "SUNDAY":7} 

dfMod["weekIndex"] = dfMod["day"].map(d) 

Sample:

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']}) 

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, 
    "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7} 

dfMod["weekIndex"] = dfMod["day"].map(d) 
print (dfMod) 
        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
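
One difference from the loop in the question is worth noting: the loop's else branch assigns 7 to any value that is not Monday through Saturday, whereas map leaves values missing from the dict as NaN. A minimal sketch, assuming that fallback should be kept (the 'HOLIDAY' value is a made-up example, not from the question):

```python
import pandas as pd

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# 'HOLIDAY' is a hypothetical value that is not a key of d
df = pd.DataFrame({'day': ['MONDAY', 'SUNDAY', 'HOLIDAY']})

# Values with no matching key become NaN, so this Series is float
mapped = df['day'].map(d)

# Reproduce the loop's else branch: fall back to 7, then restore ints
df['weekIndex'] = mapped.fillna(7).astype(int)
```

If unknown values should stay visible as missing data instead, the fillna step can simply be dropped.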

Timings on 300k rows: map is about 6 times faster than the apply solution:

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']}) 
#300k rows 
dfMod = pd.concat([dfMod]*50000).reset_index(drop=True) 

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, 
    "FRIDAY":5, "SATURDAY":6, "SUNDAY":7} 

In [92]: %timeit dfMod["weekIndex"] = dfMod["day"].map(d) 
10 loops, best of 3: 22.7 ms per loop 

In [93]: %timeit dfMod["weekIndex1"] = dfMod["day"].apply(lambda x: d[x]) 
10 loops, best of 3: 141 ms per loop 
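
The %timeit lines above only work inside IPython. A rough sketch of the same comparison as a plain script, using the standard-library timeit module; the helper names with_map and with_apply are made up for illustration, and the absolute numbers will differ by machine and pandas version:

```python
import timeit

import pandas as pd

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

days = ['TUESDAY', 'THURSDAY', 'FRIDAY', 'SATURDAY', 'MONDAY', 'SUNDAY']
df = pd.DataFrame({'day': days * 50000})  # 300k rows


def with_map():
    # Dict lookup handled by Series.map
    return df['day'].map(d)


def with_apply():
    # One Python-level lambda call per row
    return df['day'].apply(lambda x: d[x])


# Best of 3 single runs for each approach
t_map = min(timeit.repeat(with_map, number=1, repeat=3))
t_apply = min(timeit.repeat(with_apply, number=1, repeat=3))

print(f"map:   {t_map * 1000:.1f} ms")
print(f"apply: {t_apply * 1000:.1f} ms")
```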
+0

Thanks a lot, this works perfectly, and as you said it is way faster than apply – ayush

+0

I tested your original solution on 300k rows - '1 loop, best of 3: 21min 47s per loop' – jezrael

1

Try the apply method:

daysOfWeek = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7} 

dfMod["weekIndex"] = dfMod["day"].apply(lambda x: daysOfWeek[x]) 
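
Note that daysOfWeek[x] raises KeyError for any value that is not a key of the dict. A small sketch, assuming a default of 7 is acceptable (as in the else branch of the question's loop), using dict.get; the 'HOLIDAY' value is hypothetical:

```python
import pandas as pd

daysOfWeek = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
              "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# 'HOLIDAY' is a hypothetical value that is not a key of daysOfWeek
dfMod = pd.DataFrame({'day': ['FRIDAY', 'HOLIDAY']})

# dict.get returns the default (7 here) instead of raising KeyError
dfMod["weekIndex"] = dfMod["day"].apply(lambda x: daysOfWeek.get(x, 7))
```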
+0

This works, thanks! – ayush

1

Please use @jezrael's answer, as it is idiomatic.
This is purely for demonstration, to offer useful information about other pandas tools that could be used.

Setup
Using @jezrael's example

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']}) 

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, 
    "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7} 

Alternative solution

dfMod.join(pd.Series(d, name='weekIndex'), on='day') 

        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
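
Unlike the map assignment, join returns a new DataFrame and leaves dfMod unchanged, so the result has to be assigned back to keep the column. A minimal sketch of that step; the name lookup is introduced here only for illustration:

```python
import pandas as pd

dfMod = pd.DataFrame({'day': ['TUESDAY', 'THURSDAY', 'FRIDAY',
                              'SATURDAY', 'MONDAY', 'SUNDAY']})

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3,
     "THURSDAY": 4, "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# The dict keys become the index of the Series; join matches that
# index against the values of the 'day' column
lookup = pd.Series(d, name='weekIndex')

# join does not modify dfMod in place, so rebind the name
dfMod = dfMod.join(lookup, on='day')
```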