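I need some help optimizing Python code

I'm writing a KNN classifier in Python, but I have a performance problem. The following piece of code takes 7.5 to 9.0 seconds to complete, and I have to run it 60,000 times. I need some help optimizing it.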

for fold in folds:
    for dot2 in fold:
        # distances[x][0] = class of dot2
        # distances[x][1] = distance between dot1 and dot2
        distances.append([dot2[0], calc_distance(dot1[1:], dot2[1:], method)])

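The folds variable is a list of 10 folds that together contain 60,000 inputs of images in .csv format. The first value of each dot is the class it belongs to. All values are integers. Is there a way to make this line run faster?

This is the calc_distance function: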

import math

def calc_distance(dot1, dot2, distance):
    if distance == "manhanttan":
        total = 0
        # for each coord, take the absolute difference
        for x in range(0, len(dot1)):
            total = total + abs(dot1[x] - dot2[x])
        return total

    elif distance == "euclidiana":
        total = 0
        # sum the squared differences, then take the square root
        for x in range(0, len(dot1)):
            total = total + (dot1[x] - dot2[x])**2
        return math.sqrt(total)

    elif distance == "supremum":
        total = 0
        # keep the largest absolute difference over all coords
        for x in range(0, len(dot1)):
            if abs(dot1[x] - dot2[x]) > total:
                total = abs(dot1[x] - dot2[x])
        return total

    elif distance == "cosseno":
        dist = 0
        p1_p2_mul = 0
        p1_sum = 0
        p2_sum = 0
        # accumulate the dot product and both squared norms in one pass
        for x in range(0, len(dot1)):
            p1_p2_mul = p1_p2_mul + dot1[x]*dot2[x]
            p1_sum = p1_sum + dot1[x]**2
            p2_sum = p2_sum + dot2[x]**2
        p1_sum = math.sqrt(p1_sum)
        p2_sum = math.sqrt(p2_sum)
        quociente = p1_sum*p2_sum
        dist = p1_p2_mul/quociente

        return dist

EDIT: I found a way to make it faster, at least for the "manhanttan" method. Instead of:

if distance == "manhanttan":
    total = 0
    # for each coord, take the absolute difference
    for x in range(0, len(dot1)):
        total = total + abs(dot1[x] - dot2[x])
    return total

I put:

if distance == "manhanttan":
    totalp1 = 0
    totalp2 = 0
    # sum the coords of each dot separately
    for x in range(0, len(dot1)):
        totalp1 += dot1[x]
        totalp2 += dot2[x]

    return abs(totalp1-totalp2)

The abs() call is very heavy.
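One caveat about the edit above: the sum of absolute differences is not, in general, equal to the absolute difference of the sums, so the faster version can return a different value. A minimal check, using made-up points that are not from the original data (the function names are hypothetical):

```python
def manhattan_original(dot1, dot2):
    # Sum of per-coordinate absolute differences (the first version above).
    total = 0
    for x in range(len(dot1)):
        total += abs(dot1[x] - dot2[x])
    return total


def manhattan_edited(dot1, dot2):
    # Absolute difference of the coordinate sums (the edited version above).
    return abs(sum(dot1) - sum(dot2))


# Hypothetical sample points, chosen only to illustrate the difference.
a = [1, 5]
b = [5, 1]
print(manhattan_original(a, b))  # 8
print(manhattan_edited(a, b))    # 0
```

The two agree only when every coordinate difference has the same sign, which is unlikely to hold for image data.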


2014-10-27 Victor

Here are some links that may help: https://wiki.python.org/moin/PythonSpeed/PerformanceTips http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/benchmarks/timeit_tests.ipynb?create=1#string_operations – Totem 2014-10-27 21:39:39

Please edit your question to include the whole code. Also include the input (or at least part of it). – 2014-10-27 21:40:04

*"Help to optimize Python code"* is not an on-topic question here. – jonrsharpe 2014-10-27 21:40:05

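There are many guides to "profiling Python"; you should search for some, read them, and walk through the profiling process to make sure you know which parts of your work are taking the most time.

But if this is really the core of your work, it's a fair bet that calc_distance is where the majority of the run time is being consumed.

Optimizing that deeply will probably require using NumPy accelerated math or a similar, lower-level approach.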
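As a rough sketch of what the NumPy route could look like, assuming the points of a fold are stored as rows of one integer array with the class label in column 0 (the array layout and the name `distances_to_fold` are assumptions, not part of the question):

```python
import numpy as np


def distances_to_fold(dot1, fold, method):
    """Return the distance from dot1 to every row of fold in one vectorized pass.

    Assumes column 0 holds the class label and the rest are coordinates.
    """
    # Convert to float so squaring large integer pixel values cannot overflow.
    diff = fold[:, 1:].astype(np.float64) - dot1[1:].astype(np.float64)
    if method == "manhanttan":
        return np.abs(diff).sum(axis=1)
    if method == "euclidiana":
        return np.sqrt((diff ** 2).sum(axis=1))
    if method == "supremum":
        return np.abs(diff).max(axis=1)
    raise ValueError("unknown method: %s" % method)


# Hypothetical toy data: label followed by two coordinates.
fold = np.array([[0, 1, 2], [1, 4, 6]])
dot1 = np.array([0, 1, 2])
print(distances_to_fold(dot1, fold, "euclidiana"))
```

The inner Python loop over coordinates disappears; each call computes the distances to a whole fold at once.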

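As a quick and dirty approach that requires less invasive profiling and rewriting, try installing the PyPy implementation of Python and running under it. I've seen easy 2x or greater speedups compared to the standard (CPython) implementation.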


2014-10-27 22:04:05

I'm confused. Have you tried a profiler?

python -m cProfile myscript.py

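It will tell you where most of the time is being consumed and give you hard data to work with. For example: refactor to reduce the number of calls, restructure the input data, substitute this function for that one, and so on.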

https://docs.python.org/3/library/profile.html
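The profiler can also be driven from inside a script, which is handy when only one section is of interest. A minimal sketch, where `slow_sum` is a hypothetical stand-in for the real workload:

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Stand-in for the real workload; hypothetical, for illustration only.
    total = 0
    for i in range(n):
        total += abs(i - n)
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100000)
profiler.disable()

# Sort by cumulative time and show the ten most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
report = stream.getvalue()
print(report)
```

The report lists call counts and times per function, so a function called millions of times from the inner loop stands out immediately.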


2014-10-27 22:18:06

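I talked to my teacher and he said the timing is right. It does take a lot of time. I'll use these parameters; they will help me a lot. The function "calc_distance" takes a long time to process. I will try to make it faster. – Victor 2014-10-28 02:10:59

You can improve this a lot by using numpy arrays. – badc0re 2014-10-28 07:34:53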

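First, you should avoid using a single calc_distance function that performs a linear search through a list of strings on every call. Define separate distance functions and call the right one. As Lee Daniel Crocker suggested, don't use slicing; just start your loop range at 1.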
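A minimal sketch of that restructuring, assuming each dot is a plain list with the class label at index 0 (the function names, the `DISTANCES` table, and `all_distances` are hypothetical):

```python
import math


def manhattan(dot1, dot2):
    # Index 0 is the class label, so the loop starts at 1 and no slice is made.
    total = 0
    for x in range(1, len(dot1)):
        total += abs(dot1[x] - dot2[x])
    return total


def euclidean(dot1, dot2):
    total = 0
    for x in range(1, len(dot1)):
        total += (dot1[x] - dot2[x]) ** 2
    return math.sqrt(total)


def supremum(dot1, dot2):
    best = 0
    for x in range(1, len(dot1)):
        d = abs(dot1[x] - dot2[x])
        if d > best:
            best = d
    return best


# Look the function up once, outside the loops, rather than comparing
# strings on every call.
DISTANCES = {"manhanttan": manhattan, "euclidiana": euclidean, "supremum": supremum}


def all_distances(dot1, folds, method):
    dist = DISTANCES[method]
    return [[dot2[0], dist(dot1, dot2)] for fold in folds for dot2 in fold]


# Hypothetical toy data: class label first, then coordinates.
folds = [[[0, 1, 2]], [[1, 4, 6]]]
print(all_distances([0, 1, 2], folds, "euclidiana"))
```

The string comparison now happens once per query point instead of once per pair, and no temporary lists are created by slicing.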

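For the cosine distance, I'd recommend normalizing all the dot vectors up front. That way the distance computation reduces to a dot product.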
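To illustrate the normalization idea, here is a small sketch; `normalize` and `cosine_of_unit_vectors` are hypothetical names, and the points are made up:

```python
import math


def normalize(vec):
    # Scale a vector to unit length; assumes it is not all zeros.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def cosine_of_unit_vectors(u, v):
    # For unit-length vectors the cosine is just the dot product.
    return sum(a * b for a, b in zip(u, v))


# Hypothetical points; normalize once up front, then reuse.
p = normalize([3, 4])
q = normalize([4, 3])
print(cosine_of_unit_vectors(p, q))
```

Each vector is normalized once rather than having its norm recomputed for every pair, which removes two of the three running sums and both square roots from the inner loop.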

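These micro-optimizations can give you some speedup. But you should get a better gain by switching to a better algorithm: the kNN classifier calls for a kD-tree, which will allow you to quickly remove a large fraction of the points from consideration.

This is harder to implement (you have to adapt it slightly for the different distances, and the cosine distance will make it tricky).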
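For a feel of how the pruning works, here is a bare-bones kD-tree sketch in pure Python. It handles only the squared Euclidean distance; the other metrics would need their own pruning test. The data layout of (label, coords) pairs and the names `build_kdtree` and `knn` are assumptions made for this example:

```python
import heapq


def build_kdtree(points, depth=0):
    """Build a kd-tree from (label, coords) pairs; returns nested tuples or None."""
    if not points:
        return None
    axis = depth % len(points[0][1])
    points = sorted(points, key=lambda p: p[1][axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1),
            axis)


def knn(node, target, k, heap=None):
    """Return up to k (negated squared distance, label) entries nearest to target."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    (label, coords), left, right, axis = node
    dist2 = sum((a - b) ** 2 for a, b in zip(coords, target))
    # Max-heap via negated distances: heap[0] is the worst of the current best k.
    if len(heap) < k:
        heapq.heappush(heap, (-dist2, label))
    elif dist2 < -heap[0][0]:
        heapq.heapreplace(heap, (-dist2, label))
    diff = target[axis] - coords[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn(near, target, k, heap)
    # Only visit the far side if the splitting plane is closer than the worst hit.
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn(far, target, k, heap)
    return heap


# Hypothetical toy data: (class label, coordinates).
data = [(0, (1, 2)), (0, (2, 3)), (1, (8, 9)), (1, (9, 8)), (0, (2, 1))]
tree = build_kdtree(data)
nearest = sorted((-d, label) for d, label in knn(tree, (1, 1), 3))
print(nearest)  # [(1, 0), (1, 0), (5, 0)]
```

The tree is built once from the training points and then queried for each test point, so whole subtrees are skipped whenever the splitting plane is farther away than the current k-th best neighbour.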


2014-10-28 08:50:31
