Here is some code to access the terms in a Lucene document:
int docId = hits[i].doc;
TermFreqVector tfvector = reader.getTermFreqVector(docId, "contents");
TermPositionVector tpvector = (TermPositionV
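I need to build a similarity matrix by comparing the terms of the documents. For example, if Document1 and Document2 have two terms in common, I need to write a 2 at m[1,2] in my similarity matrix. My similarity matrix currently looks like this:
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]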
[1,] 0 NA NA NA NA NA NA NA NA
[2,] 0 0 NA NA NA NA NA
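
The counting step described above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: it assumes each document's terms have already been pulled out into a String[] (for example via the term vector's getTerms() method, if the Lucene version in use provides it). The class TermOverlap and its methods are hypothetical names, not part of Lucene, and the term lists in main are made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class; not part of Lucene.
class TermOverlap {

    // Counts the distinct terms that occur in both arrays.
    static int sharedTerms(String[] a, String[] b) {
        Set<String> common = new HashSet<>(Arrays.asList(a));
        common.retainAll(new HashSet<>(Arrays.asList(b)));
        return common.size();
    }

    // Builds a symmetric matrix where m[i][j] is the number of terms
    // that documents i and j share. The diagonal is left at 0.
    static int[][] similarityMatrix(String[][] docTerms) {
        int n = docTerms.length;
        int[][] m = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                int shared = sharedTerms(docTerms[i], docTerms[j]);
                m[i][j] = shared;
                m[j][i] = shared;
            }
        }
        return m;
    }

    public static void main(String[] args) {
        // Made-up term lists standing in for the terms of three documents.
        String[][] docTerms = {
            {"lucene", "index", "term"},
            {"lucene", "term", "vector"},
            {"matrix"}
        };
        for (int[] row : similarityMatrix(docTerms)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Note that Java arrays are 0-based, so m[0][1] here corresponds to m[1,2] in the matrix shown above. Each pair is compared only once and the result is mirrored, so the lower triangle does not need a second pass.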