Mahout: normalizing the UserSimilarity distance

I have an in-memory (non-Hadoop) model:
DataModel data = new FileDataModel(new File("file.csv"));
UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(data);
userSimilarity.setPreferenceInferrer(new AveragingPreferenceInferrer(data));
UserNeighborhood userNeighborhood = new NearestNUserNeighborhood(1, userSimilarity, data);
The userSimilarity value is not normalized to a range such as [0, 100], so to show it to the end user I use the following workaround:
double maxSim = userSimilarity.userSimilarity(userId1, userNeighborhood.getUserNeighborhood(userId1)[0]);
int finalSimilarity = Math.min(100, Math.max((int) Math.ceil(100 * userSimilarity.userSimilarity(userId1, userId2) / maxSim), 0));
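I am seeing performance problems with this approach (several seconds per user). Is there another option, or what is the fastest way to get min(similarity) = 0 and max(similarity) = 100 for each given user?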
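
To make the scaling step concrete, here is a minimal sketch of two directions I am considering. It is plain Java with no Mahout dependency. The class SimilarityScaler and its maxSimLookup callback are hypothetical names; the callback stands in for the userSimilarity/getUserNeighborhood lookup shown earlier. The first variant caches each user's maximum similarity so the expensive neighborhood lookup runs at most once per user. The second relies on Pearson correlation being bounded in [-1, 1] and rescales linearly, which avoids the lookup entirely but does not guarantee that a user's nearest neighbor maps to exactly 100.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongToDoubleFunction;

public class SimilarityScaler {
    // Cache of the per-user maximum similarity, filled lazily.
    private final Map<Long, Double> maxSimCache = new HashMap<>();
    // Hypothetical callback: returns the similarity between a user and
    // that user's nearest neighbor (the expensive call in the question).
    private final LongToDoubleFunction maxSimLookup;

    public SimilarityScaler(LongToDoubleFunction maxSimLookup) {
        this.maxSimLookup = maxSimLookup;
    }

    // Scales a similarity for userId, looking up the user's maximum only once.
    public int toPercent(long userId, double similarity) {
        double maxSim = maxSimCache.computeIfAbsent(
                userId, id -> maxSimLookup.applyAsDouble(id));
        return scale(similarity, maxSim);
    }

    // Same formula as in the question, guarded against NaN and a non-positive maximum.
    public static int scale(double similarity, double maxSim) {
        if (Double.isNaN(similarity) || Double.isNaN(maxSim) || maxSim <= 0.0) {
            return 0;
        }
        long pct = (long) Math.ceil(100.0 * similarity / maxSim);
        return (int) Math.min(100, Math.max(0, pct));
    }

    // Alternative: fixed linear mapping of [-1, 1] onto [0, 100], no lookup needed.
    public static int pearsonToPercent(double similarity) {
        if (Double.isNaN(similarity)) {
            return 0;
        }
        return (int) Math.round((similarity + 1.0) / 2.0 * 100.0);
    }
}
```

With the cached variant, the cost of finding the nearest neighbor is paid once per user rather than once per displayed pair. The cache would need to be cleared whenever the data model is refreshed.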