I want to do the following n-gram calculation in R: compute word frequencies and the sum of the values.
Input:
Column_A                Column_B
Word_A                  10
Word_A Word_B           20
Word_B Word_A           30
Word_A Word_B Word_C    40
Output:
Column_A1               Column_B1
Word_A                  100 = 10+20+30+40
Word_B                  90 = 20+30+40
Word_C                  40 = 40
Word_A Word_B           90 = 20+30+40
Word_A Word_C           40 = 40
Word_B Word_C           40 = 40
Word_A Word_B Word_C    40 = 40
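The order of the words in the output does not matter, so Word_A Word_B = 90 = Word_B Word_A. Using the RWeka and tm libraries I can extract unigrams (single words only), but I need n-grams with n = 1, 2, 3 and the corresponding Column_B1 totals.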
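
For reference, here is a minimal base R sketch of the calculation described above. It does not use RWeka or tm. It also rests on one interpretation of the expected output: because Word_A Word_C appears in the result even though those two words are not adjacent in the last row, each row is treated as an unordered set of words and all combinations of 1 to 3 words are generated, rather than contiguous n-grams. The helper name word_combos and the intermediate names combos, long and result are made up for illustration.

```r
# Toy input copied from the question.
df <- data.frame(
  Column_A = c("Word_A", "Word_A Word_B", "Word_B Word_A", "Word_A Word_B Word_C"),
  Column_B = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# word_combos() is a hypothetical helper: it returns every combination of
# 1..max_n distinct words found in one string. Words are sorted first, so
# "Word_B Word_A" and "Word_A Word_B" produce the same key.
word_combos <- function(text, max_n = 3) {
  words <- sort(unique(strsplit(trimws(text), "\\s+")[[1]]))
  sizes <- seq_len(min(max_n, length(words)))
  unlist(lapply(sizes, function(k) {
    combn(words, k, FUN = paste, collapse = " ")
  }))
}

# One list element per input row, holding that row's word combinations.
combos <- lapply(df$Column_A, word_combos)

# Long format: one line per (combination, value) pair.
long <- data.frame(
  Column_A1 = unlist(combos),
  Column_B1 = rep(df$Column_B, lengths(combos)),
  stringsAsFactors = FALSE
)

# Sum the values for each distinct combination.
result <- aggregate(Column_B1 ~ Column_A1, data = long, FUN = sum)
print(result)
```

On the sample input this gives the seven combinations and totals listed in the expected output, although the row order may differ. If only contiguous n-grams are wanted, the combn() step would need to be replaced by a sliding window over the words, and the sorting step dropped or applied after the window is taken.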