2017-02-21

apache.commons.text cosine distance

I'm trying to use the CosineDistance class from Apache Commons Text, but it always returns 1.0. What am I missing? Here is my code:

import org.apache.commons.text.similarity.CosineDistance;

public class ComputeDistance {
    public static void main(String[] args) throws Exception {
        CosineDistance dist = new CosineDistance();
        CharSequence c1 = "example text1";
        CharSequence c2 = "another file";
        // Prints 1.0 for these inputs
        System.out.println(dist.apply(c1, c2));
    }
}

Answer


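CosineDistance returns 1 - cosineSimilarity(leftVector, rightVector), where leftVector and rightVector map each word of the corresponding character sequence to the number of times it occurs. Your two strings have no words in common ("example", "text1" versus "another", "file"), so cosineSimilarity(leftVector, rightVector) = 0 and the distance is 1.0. If you want to compare the strings by their characters instead of their words, you can build the vectors yourself and call CosineSimilarity directly: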

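import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

import org.apache.commons.text.similarity.CosineSimilarity;

public class ComputeDistance {
    public static void main(String[] args) throws Exception {
        CosineSimilarity dist = new CosineSimilarity();

        String c1 = "example text1";
        String c2 = "another file";

        // Count each character (spaces included) instead of each word;
        // split("") yields one single-character string per char
        Map<CharSequence, Integer> leftVector =
            Arrays.stream(c1.split(""))
                .collect(Collectors.toMap(c -> c, c -> 1, Integer::sum));
        Map<CharSequence, Integer> rightVector =
            Arrays.stream(c2.split(""))
                .collect(Collectors.toMap(c -> c, c -> 1, Integer::sum));

        // Cosine distance = 1 - cosine similarity
        System.out.println(1 - dist.cosineSimilarity(leftVector, rightVector));
    }
}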
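To see why the word-based version gives exactly 1.0 here, the following minimal, dependency-free sketch recomputes the distance by hand as 1 - (a . b) / (|a| |b|), where only keys present in both count vectors contribute to the dot product a . b. It assumes CosineDistance tokenizes its input into runs of word characters (\w+) and counts each token; the class CosineSketch and its methods wordVector, charVector and cosineDistance are hypothetical names for this illustration, not part of Commons Text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CosineSketch {

    // Assumption: approximates the word tokenizer behind CosineDistance (runs of \w characters)
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Term-frequency vector over words, e.g. "example text1" -> {example=1, text1=1}
    public static Map<String, Integer> wordVector(String text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            counts.merge(m.group(), 1, Integer::sum);
        }
        return counts;
    }

    // Term-frequency vector over single UTF-16 chars, spaces included
    public static Map<String, Integer> charVector(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < text.length(); i++) {
            counts.merge(String.valueOf(text.charAt(i)), 1, Integer::sum);
        }
        return counts;
    }

    // 1 - (a . b) / (|a| |b|); keys missing from either map contribute 0 to the dot product
    public static double cosineDistance(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = 0;
        for (int v : a.values()) {
            normA += (double) v * v;
        }
        double normB = 0;
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        if (normA == 0 || normB == 0) {
            // Hypothetical choice for this sketch: an empty vector has similarity 0
            return 1.0;
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        String c1 = "example text1";
        String c2 = "another file";
        // No shared words, so the dot product is 0 and the distance is 1.0
        System.out.println("words: " + cosineDistance(wordVector(c1), wordVector(c2)));
        // Shared characters (e, t, a, l, space) give a non-zero similarity
        System.out.println("chars: " + cosineDistance(charVector(c1), charVector(c2)));
        // One shared word is enough to bring the word-based distance below 1.0
        System.out.println("shared word: " + cosineDistance(wordVector(c1), wordVector("example file")));
    }
}
```

Running main prints 1.0 for the word vectors, about 0.387 for the character vectors, and roughly 0.5 when both strings contain "example". In other words, the word-based distance only drops below 1.0 once the two inputs share at least one token.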