Distance between rankings

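I have two methods that each rank a list of strings differently, and I also have what we can consider the "correct" ranking of the list (i.e. a gold standard).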

In other words:

ranked_list_of_strings_1 = method_1(list_of_strings) 
ranked_list_of_strings_2 = method_2(list_of_strings)  
correctly_ranked_list_of_strings # Some permutation of list_of_strings

How can I determine which method is better, given that method_1 and method_2 are black boxes? Is there a way to measure this with SciPy, scikit-learn, or a similar library?

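In my specific case, I actually have a dataframe in which each method outputs a score. What matters is not the difference between a method's scores and the true scores, but whether the method gets the ranking right (in every column, a higher score means a higher rank).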

      strings  scores_method_1  scores_method_2  true_scores
5714  aeSeOg              0.54              0.1          0.8
5741  NQXACs              0.15              0.3          0.4
5768  zsFZQi              0.57              0.7          0.2
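
To make "getting the ranking right" concrete, here is a minimal sketch of one possible measure, Kendall's tau, computed by hand on the three rows above. It counts the pairs of items that a method orders the same way as the true scores (concordant) and the pairs it orders the opposite way (discordant). The helper name kendall_tau is hypothetical, and the sketch is the simple tau-a variant with no special handling of ties. scipy.stats.kendalltau and scipy.stats.spearmanr offer library versions of this kind of rank correlation.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall tau-a: (concordant - discordant) / total pairs.

    Hypothetical helper for illustration; tied pairs count as neither
    concordant nor discordant.
    """
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        # Positive product: both score lists order items i and j the same way.
        product = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# The three rows from the table above, in the same order.
true_scores = [0.8, 0.4, 0.2]
scores_method_1 = [0.54, 0.15, 0.57]
scores_method_2 = [0.1, 0.3, 0.7]

tau_1 = kendall_tau(scores_method_1, true_scores)
tau_2 = kendall_tau(scores_method_2, true_scores)
print(tau_1, tau_2)
```

A value of 1 means the method reproduces the true ordering exactly, and -1 means it reverses it. On these three rows method_1 agrees with the true order on one pair out of three, while method_2 reverses all three, so under this measure method_1 would be the better of the two.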

Source

2014-05-23 Amelio Vazquez-Reina