Well, you can do this with a list comprehension (here s is the sentence 'I love the Python programming language'):
>>> [s1 + " " + s2 for s1, s2 in zip(s.split(), s.split()[1:])]
['I love', 'love the', 'the Python', 'Python programming', 'programming language']
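A minimal sketch of the same idea wrapped in a function, assuming Python 3. The name bigrams is a hypothetical helper, not part of the answer above. It splits the sentence only once and joins each adjacent pair with " ".join:

```python
def bigrams(sentence):
    # Split once and reuse the word list instead of calling split() twice
    words = sentence.split()
    # zip stops at the shorter input, so pairing words with words[1:]
    # yields each adjacent pair exactly once
    return [" ".join(pair) for pair in zip(words, words[1:])]

s = 'I love the Python programming language'
print(bigrams(s))
# ['I love', 'love the', 'the Python', 'Python programming', 'programming language']
```

A sentence with fewer than two words simply gives an empty list.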
You can also use str.format:
>>> ["{} {}".format(s1, s2) for s1, s2 in zip(s.split(), s.split()[1:])]
['I love', 'love the', 'the Python', 'Python programming', 'programming language']
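On Python 3.6 and later, an f-string is a shorter alternative to str.format. This sketch assumes such an interpreter and produces the same list:

```python
s = 'I love the Python programming language'
words = s.split()
# f-strings (Python 3.6+) interpolate s1 and s2 directly into the template
pairs = [f"{s1} {s2}" for s1, s2 in zip(words, words[1:])]
print(pairs)
# ['I love', 'love the', 'the Python', 'Python programming', 'programming language']
```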
The final version, as a function:
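from itertools import tee, islice

def ngram(n, s):
    # n independent iterators over the words; the i-th one skips its first i words
    var = [islice(it, i, None) for i, it in enumerate(tee(s.split(), n))]
    # zip yields n-word windows; "{} " puts a space after every word, so each result ends with a trailing space
    return [("{} " * n).format(*itt) for itt in zip(*var)]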
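If the trailing space is unwanted, one possible variant (a sketch, not the function above; ngram_join is a hypothetical name) zips n shifted slices of the word list and joins each window with " ".join. Because s.split() already returns a list, slicing works here without tee:

```python
def ngram_join(n, s):
    words = s.split()
    # words[i:] is the word list shifted left by i; zipping n of these
    # shifted lists yields every sliding window of length n
    windows = zip(*(words[i:] for i in range(n)))
    # " ".join puts spaces only between words, so there is no trailing space
    return [" ".join(w) for w in windows]

print(ngram_join(3, 'I love the Python programming language'))
# ['I love the', 'love the Python', 'the Python programming', 'Python programming language']
```

The trade-off is that each slice copies part of the list, whereas the tee/islice version advances iterators lazily; for sentence-sized input the difference is negligible.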
Demo:
>>> from splitting import ngram
>>> thing = 'I love the Python programming language'
>>> ngram(2, thing)
['I love ', 'love the ', 'the Python ', 'Python programming ', 'programming language ']
>>> ngram(3, thing)
['I love the ', 'love the Python ', 'the Python programming ', 'Python programming language ']
>>> ngram(4, thing)
['I love the Python ', 'love the Python programming ', 'the Python programming language ']
>>> ngram(1, thing)
['I ', 'love ', 'the ', 'Python ', 'programming ', 'language ']
What does k do? – inspectorG4dget
What should the output be for 'n = 3'? – thefourtheye
Have a look at the itertools 'pairwise' recipe: http://docs.python.org/2/library/itertools.html#recipes and the 'collections.Counter' data structure: http://docs.python.org/2/library/collections.html#collections.Counter – IceArdor
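Following that pointer, here is a sketch that combines the pairwise recipe with collections.Counter to count how often each bigram occurs. pairwise is defined locally, adapted from the itertools recipe for Python 3 (Python 3.10+ also provides itertools.pairwise directly), and the sample text is made up for illustration:

```python
from collections import Counter
from itertools import tee

def pairwise(iterable):
    # itertools recipe: s -> (s0, s1), (s1, s2), (s2, s3), ...
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one element
    return zip(a, b)

text = 'the cat saw the cat'
# Count each adjacent word pair, joined into a single string key
counts = Counter(" ".join(p) for p in pairwise(text.split()))
print(counts.most_common(1))  # [('the cat', 2)]
```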