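I finally figured out how to do this:
After creating the sparse matrix and converting it to a DataFrame, I was able to merge it with the data in the original DataFrame, using the index as the join column. Here is my code sample: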
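import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# `stop` (the stop-word list) and `dfn` (the original DataFrame) are defined earlier.
tf_vect_final = CountVectorizer(max_df=0.90, min_df=5, stop_words=stop,
                                ngram_range=(5, 5), analyzer='word')
tf_vect_final.fit(dfn['Not Written Comments_clean_stop'].tolist())
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out() (1.0+).
print("There are {} grams found".format(len(tf_vect_final.get_feature_names_out())))
tff = tf_vect_final.transform(dfn['Not Written Comments_clean_stop'].tolist())
# Densify the sparse matrix and reuse dfn's index so the PK values line up row by row.
tff = pd.DataFrame(tff.toarray(), columns=tf_vect_final.get_feature_names_out(),
                   index=dfn.index)
# Name both indexes 'PK', turn them into regular columns, and join on that column.
dfn.index.names = ['PK']
tff.index.names = ['PK']
dfn = dfn.reset_index()
tff = tff.reset_index()
dfn_final = dfn.merge(tff, on='PK')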