2017-04-26 151 views
-5

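Similarity between two text documents in Python

You are given four documents, numbered 1 to 4, each containing a single sentence. Using TF-IDF scores, determine the identifier of the document most similar to the first document.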

1. My name is Ankit
2. Ankit name is very famous
3. Ankit like his name
4. India has a lot of beautiful cities

Output a single integer (2, 3, or 4) with no leading or trailing whitespace.

+2

What have you tried? Please show your code.

Answer

2
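from sklearn.feature_extraction.text import TfidfVectorizer

vect = TfidfVectorizer(min_df=1)

# Each row of tfidf is the L2-normalized TF-IDF vector of one document.
tfidf = vect.fit_transform(["My name is Ankit",
                            "Ankit name is very famous",
                            "Ankit like his name",
                            "India has a lot of beautiful cities"])

# Pairwise cosine similarities; row 0 compares document 1 with every document.
# .toarray() replaces the .A shorthand, which recent SciPy releases no longer provide.
print((tfidf @ tfidf.T).toarray())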
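The matrix printed above holds all pairwise similarities, while the question asks for a single integer. A minimal sketch of that last step follows, assuming the default `TfidfVectorizer` settings (lowercasing, smoothed IDF, L2 normalization) are the intended scoring. The names `docs`, `sims`, and `best_id` are illustrative and do not come from the original answer.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "My name is Ankit",
    "Ankit name is very famous",
    "Ankit like his name",
    "India has a lot of beautiful cities",
]

# Rows are L2-normalized by default, so a dot product of two rows
# is their cosine similarity.
tfidf = TfidfVectorizer().fit_transform(docs)

# Similarity of document 1 (row 0) to every document, as a flat array.
sims = (tfidf[:1] @ tfidf.T).toarray().ravel()

# Exclude document 1 itself, which would otherwise always win with 1.0.
sims[0] = -1.0

# argmax is 0-based; the question numbers documents from 1.
best_id = int(np.argmax(sims)) + 1
print(best_id)
```

Under these assumptions, document 2 shares three terms with document 1 ("name", "is", "Ankit"), document 3 shares two, and document 4 shares none, so the ranking follows the overlap.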