2017-10-18 38 views

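Suboptimal looping over a large dataset

I have a DataFrame with a few thousand rows of artificial foreign-exchange trade data. The first ten rows look like this: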

[Image: first ten rows of the DataFrame]

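I want to iterate over this set and, for each row, compute CommonCurrency, which in this case is USD. So for each row I take the CurrencyPair, DeskRate and OrderQty columns and calculate CommonCurrency: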

for i in range(len(order_data)):
    if order_data['CurrencyPair'][i] == 'GBP/USD':
        order_data.loc[i, 'CommonCurrency'] = order_data['DeskRate'][i] * order_data['OrderQty'][i]
    elif order_data['CurrencyPair'][i] == 'AUD/USD':
        order_data.loc[i, 'CommonCurrency'] = order_data['DeskRate'][i] * order_data['OrderQty'][i]
    elif order_data['CurrencyPair'][i] == 'EUR/USD':
        order_data.loc[i, 'CommonCurrency'] = order_data['DeskRate'][i] * order_data['OrderQty'][i]
    elif order_data['CurrencyPair'][i] == 'USD/CHF':
        order_data.loc[i, 'CommonCurrency'] = order_data['DeskRate'][i] / order_data['OrderQty'][i]
    elif order_data['CurrencyPair'][i] == 'EUR/GBP':
        pass  # different calculation

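This doesn't seem like the right way to do it, especially if there are a large number of different currency pairs. Another problem I run into is when I get to EUR/GBP, because I then also need the DeskRate for EUR/USD, and I can't see how to do that with this method.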

Any tips?

Answers


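One interesting feature of pandas is the concept of indexing. There are more Pythonic ways of doing this, but with loc you can use Series (columns) to assign values to part of a DataFrame: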

# Write the results to 'CommonCurrency'; assigning to 'CurrencyPair' would overwrite the pair labels
order_data.loc[order_data['CurrencyPair'].isin(('GBP/USD', 'AUD/USD', 'EUR/USD')), 'CommonCurrency'] = order_data['DeskRate'] * order_data['OrderQty']
order_data.loc[order_data['CurrencyPair'] == 'USD/CHF', 'CommonCurrency'] = order_data['DeskRate'] / order_data['OrderQty']
# some_func is a placeholder for whatever calculation the cross pair needs
order_data.loc[order_data['CurrencyPair'] == 'EUR/GBP', 'CommonCurrency'] = some_func(order_data['DeskRate'], order_data['OrderQty'])

This avoids any for loops.
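To avoid listing every pair by hand, the same loc-based idea can be driven by boolean masks built from the pair string itself. The following is only a minimal sketch under stated assumptions: the sample rows are made up, the formulas mirror the ones in the question (multiply when USD is the quote currency, divide when USD is the base), and the `usd_rate` table for cross pairs such as EUR/GBP is a hypothetical lookup, not something given in the question.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
order_data = pd.DataFrame({
    "CurrencyPair": ["GBP/USD", "USD/CHF", "EUR/USD", "EUR/GBP"],
    "DeskRate": [1.25, 0.80, 1.10, 0.88],
    "OrderQty": [1000.0, 2000.0, 500.0, 400.0],
})

# Split "BASE/QUOTE" into separate base and quote Series.
parts = order_data["CurrencyPair"].str.split("/", expand=True)
base, quote = parts[0], parts[1]

usd_is_quote = quote == "USD"              # e.g. GBP/USD, AUD/USD, EUR/USD
usd_is_base = base == "USD"                # e.g. USD/CHF
is_cross = ~(usd_is_quote | usd_is_base)   # e.g. EUR/GBP

# Start with NaN so any unhandled row is easy to spot afterwards.
order_data["CommonCurrency"] = float("nan")

# Same formulas as in the question, applied to whole groups of rows at once.
order_data.loc[usd_is_quote, "CommonCurrency"] = (
    order_data["DeskRate"] * order_data["OrderQty"]
)
order_data.loc[usd_is_base, "CommonCurrency"] = (
    order_data["DeskRate"] / order_data["OrderQty"]
)

# Assumption: cross pairs are converted with the base currency's USD rate,
# taken from a hypothetical lookup table (values are illustrative only).
usd_rate = {"EUR": 1.10, "GBP": 1.25}
order_data.loc[is_cross, "CommonCurrency"] = (
    base.map(usd_rate) * order_data["OrderQty"]
)
```

Because loc aligns the right-hand Series on the index, only the rows selected by each mask are written, so adding a new USD pair needs no extra branch.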