Using KNN to compare DataFrames
I have the following two datasets:
import pandas as pd
from scipy.spatial import distance
all = {'test' : [0.3, 0.9],
'call' : [0.2, 1.3],
'category': ["A", "B"]}
all = pd.DataFrame(all)
# DataFrame.append was removed in pandas 2.0, so build the one-row frame directly
df = pd.DataFrame([{'test': 0.2, 'call': 0.4}])
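Based on these DataFrames, I want to check which category df is closest to: A or B.
So I did the following:
Keep only the numeric columns of the all DataFrame: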
all_numeric = all[[ 'test', 'call']]
Calculate the Euclidean distance:
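# distance.euclidean expects two 1-D vectors, so pass the single row of df with its columns in the same order
euclidean_distances = all_numeric.apply(lambda row: distance.euclidean(row, df[['test', 'call']].iloc[0]), axis=1)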
distance_frame = pd.DataFrame(data={"dist": euclidean_distances, "idx": euclidean_distances.index})
distance_frame.sort_values("dist", inplace=True)
print(distance_frame)
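As a sanity check on the distance step, here is a minimal sketch of the same calculation done with NumPy broadcasting instead of a row-wise apply. The names `reference`, `query`, `diffs` and `dists` are my own and do not appear in the question; `reference` stands in for the all DataFrame so that the built-in all() is not shadowed. For the point (0.2, 0.4), the distances should come out at roughly 0.224 to row A and 1.140 to row B.

```python
import numpy as np
import pandas as pd

# Reference rows from the question (renamed here; an assumption for readability).
reference = pd.DataFrame({'test': [0.3, 0.9],
                          'call': [0.2, 1.3],
                          'category': ["A", "B"]})

# The point to classify, in (test, call) order.
query = np.array([0.2, 0.4])

# Subtract the query from every reference row, then take the row-wise L2 norm.
diffs = reference[['test', 'call']].to_numpy() - query
dists = np.linalg.norm(diffs, axis=1)
print(dists)
```

Row A is sqrt(0.1² + 0.2²) = sqrt(0.05) away and row B is sqrt(0.7² + 0.9²) = sqrt(1.30) away, so A is the nearer neighbour for this input.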
Next, I want to look up the value in the all DataFrame:
lookup_value = distance_frame.iloc[0]['idx']
question = all['category'][0]
print("This customer content is labeled as %s" % question)
However, if I try this with
# DataFrame.append was removed in pandas 2.0, so build the one-row frame directly
df = pd.DataFrame([{'test': 0.9, 'call': 1.3}])
it should print "labeled as B", so I think something went wrong. Can anyone tell me what I am doing wrong?
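One thing that stands out in the snippet: `question = all['category'][0]` always reads row 0, whatever `lookup_value` turns out to be, so "A" would be printed for any input. That looks like the likely cause, though I have not run the original code. Below is a minimal sketch of a 1-nearest-neighbour lookup that uses the index of the smallest distance instead. The function `nearest_category`, its `features` parameter and the name `reference` are hypothetical names of mine, not from the question.

```python
import numpy as np
import pandas as pd

def nearest_category(reference, query, features=('test', 'call')):
    """Return the category of the reference row closest to the one-row `query` frame."""
    cols = list(features)  # fix the column order for both frames
    # Broadcasting: (n_rows, n_features) minus (1, n_features)
    diffs = reference[cols].to_numpy() - query[cols].to_numpy()
    dists = pd.Series(np.linalg.norm(diffs, axis=1), index=reference.index)
    # idxmin gives the index label of the nearest row; use it for the lookup
    return reference.loc[dists.idxmin(), 'category']

reference = pd.DataFrame({'test': [0.3, 0.9],
                          'call': [0.2, 1.3],
                          'category': ["A", "B"]})

first = pd.DataFrame([{'test': 0.2, 'call': 0.4}])
second = pd.DataFrame([{'test': 0.9, 'call': 1.3}])

print("This customer content is labeled as %s" % nearest_category(reference, first))
print("This customer content is labeled as %s" % nearest_category(reference, second))
```

With the two inputs from the question, the first call should give A and the second B, since (0.9, 1.3) coincides with row B. In the original code the equivalent change would be to index the category column with `lookup_value` rather than the literal 0.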
Thanks for your answer. Yes, I would like an example.
Hope that helps! :) – Falcon9
Thanks, that really helped!