Finding pairwise differences within a DataFrame column

Initial DataFrame:

df =
   Index  Nature  Interval
0      0       1  0.000000
1      1       1  0.999627
2      2       1  1.000607
3      3       1  1.000612

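The total number of rows is about 700,000.

Is there a way to compute the difference between one element of the Interval column and every other element in the same column, and to repeat this for each remaining row of the DataFrame?

I found a workaround for this problem. The snippet (with math imported and numpy imported as np) is: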

df["Potential"] = df["Interval"].apply(lambda x: np.sum([math.exp(-4 * abs(x - val)) for val in df['Interval']]))

However, because of the for loop, it takes a very long time.

Is there any way to optimize this solution?

Source: Arun Pottekat, 2017-02-18

You can use apply:

b = df["Interval"].apply(lambda x: np.sum(np.exp(-4 * (x - df.Interval).abs()))) 
print (b) 
0 1.054885 
1 3.010498 
2 3.014339 
3 3.014319 
Name: Interval, dtype: float64

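A NumPy solution: subtract the Interval values reshaped into a column from the original values, so that broadcasting produces every pairwise difference, then apply abs, np.exp and np.sum: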

val = df.Interval.values 
arr = np.sum(np.exp(-4*abs(val-val.reshape(len(df.index),-1))), axis=0) 
print (arr) 
[ 1.05488507 3.01049841 3.0143389 3.01431861] 

df["Potential"] = arr 
print (df) 
   Index  Nature  Interval  Potential
0      0       1  0.000000   1.054885
1      1       1  0.999627   3.010498
2      2       1  1.000607   3.014339
3      3       1  1.000612   3.014319
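
One caveat with the broadcast above: it builds a len(df) x len(df) array. At roughly 700,000 rows that is about 4.9e11 float64 values, or close to 3.9 TB, which will not fit in memory. A minimal sketch of a blocked variant follows, assuming the same formula; the function name potential_chunked and the chunk_size parameter are hypothetical and not part of the original answer.

```python
import numpy as np

def potential_chunked(values, chunk_size=64):
    """Sum exp(-4 * |x_i - x_j|) over j for every i, one block of rows at a time.

    chunk_size is an illustrative knob: each temporary array holds
    chunk_size * len(values) floats, so lower it if memory is tight.
    """
    values = np.asarray(values, dtype=float)
    out = np.empty(len(values))
    for start in range(0, len(values), chunk_size):
        block = values[start:start + chunk_size]
        # (block_len, n) matrix of absolute differences via broadcasting
        diffs = np.abs(block[:, None] - values)
        out[start:start + chunk_size] = np.exp(-4 * diffs).sum(axis=1)
    return out

val = np.array([0.0, 0.999627, 1.000607, 1.000612])
print(potential_chunked(val, chunk_size=2))
```

On the four sample rows this should reproduce the Potential values shown above. The work is still quadratic in the number of rows; only the peak memory is bounded.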

Another NumPy solution, thanks to piRSquared:

i = df.Interval.values 
print (np.exp((np.abs(i[:, None] - i)) * -4).sum(1)) 
[ 1.05488507 3.01049841 3.0143389 3.01431861]
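
The versions above all evaluate every pair of rows. For this particular kernel there may be a cheaper route, sketched here as a suggestion rather than as part of the original answers: once the values are sorted, exp(-k * (x_i - x_j)) factors into a product of the decays between neighbouring values, so the sums over smaller and larger elements can each be built with one running pass. The function name potential_sorted and the parameter k are hypothetical.

```python
import numpy as np

def potential_sorted(values, k=4.0):
    """Sum exp(-k * |x_i - x_j|) over j for every i, using a sort plus two passes."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x)
    xs = x[order]
    n = len(xs)
    # decay[i] = exp(-k * (xs[i + 1] - xs[i])), always in (0, 1]
    decay = np.exp(-k * np.diff(xs))

    # left[i]: contributions from xs[0..i], including the element itself
    left = np.ones(n)
    for i in range(1, n):
        left[i] = 1.0 + decay[i - 1] * left[i - 1]

    # right[i]: contributions from xs[i..n-1], including the element itself
    right = np.ones(n)
    for i in range(n - 2, -1, -1):
        right[i] = 1.0 + decay[i] * right[i + 1]

    out = np.empty(n)
    # the element itself was counted in both passes, so subtract one copy
    out[order] = left + right - 1.0
    return out

i = np.array([0.0, 0.999627, 1.000607, 1.000612])
print(potential_sorted(i))
```

This needs only a few arrays of length n and one sort, so memory and time stay manageable at 700,000 rows. It relies on the kernel being an exponential of the absolute difference and would not carry over to an arbitrary function of the difference.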

Source: jezrael, 2017-02-18 07:35:43

Thanks @jezrael, it really did improve performance.
