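How to iterate through the words of a sentence string in Python?

Assume I have the string text = "A compiler translates code from a source language". I want to do two things:

I need to iterate through each word and stem it using the NLTK library. The stemming function is PorterStemmer().stem_word(word), which takes the word as its argument. How can I stem each word and get back the stemmed sentence?
I need to remove certain stop words from the text string. The list of stop words is stored in a text file (space-separated):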
```
stopwordsfile = open('c:/stopwordlist.txt','r+') 
stopwordslist=stopwordsfile.read() 
```
How do I remove those stop words from text and get a clean new string?

2012-05-08 ChamingaD

`for word in text.split(' '): stemmer.stem_word(word)`? – birryree

`stemmed = for word in text.split(' '): stemmer.stem_word(word)` Would this work? – ChamingaD

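Not quite. If you want a list of stems, you can do `stemmed = [stemmer.stem_word(w) for w in text.split(' ')]`. If you want a sentence, you can then do `sentence = ' '.join(stemmed)`, which returns a sentence made of all the stemmed words. Let me know if that helps. – birryree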

I posted this as a comment, but figured I might as well flesh it out into a full answer with some explanation:

You want to use str.split() to split the string into words, and then stem each word:

for word in text.split(" "): 
    PorterStemmer().stem_word(word)
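
Note that the loop above computes each stem but does not keep it. A minimal sketch of collecting the stems into a list follows. So that it runs without NLTK installed, `toy_stem` is a hypothetical stand-in that only strips a trailing "s"; it is not the Porter algorithm. With NLTK you would pass a stemmer method instead (as far as I know, recent NLTK versions name it `PorterStemmer().stem` rather than `stem_word`).

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def stem_words(text, stem=toy_stem):
    """Split text on whitespace and return the list of stemmed words."""
    return [stem(word) for word in text.split()]


text = "A compiler translates code from a source language"
stems = stem_words(text)
print(stems)
```

Passing the stemmer in as a callable also means a single stemmer object can be created once and reused, instead of constructing a new PorterStemmer() for every word.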

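Since you want a single string containing all the stemmed words, it is then trivial to join the stems back together. To do this easily and efficiently, we use str.join() and a generator expression: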

" ".join(PorterStemmer().stem_word(word) for word in text.split(" "))

Edit:

For your other problem:

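with open("/path/to/file.txt") as f:
    # Strip each line so the trailing newline is not stored as part of the word.
    words = set(line.strip() for line in f)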

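Here we open the file using the with statement (which is the best way to open files, because it closes them correctly even when an exception occurs, and is more readable) and read the contents into a set. We use a set because we don't care about the order of the words or about duplicates, and it will be more efficient later. I am assuming one word per line. If that isn't the case and the words are separated by commas or whitespace, then using str.split() (with the appropriate arguments) is probably a good plan.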
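
Since the question says the stop word file is space-separated, here is a minimal sketch of the str.split() variant. The function name `load_stopwords` and the temporary file contents are hypothetical, used only so the example runs on its own.

```python
import os
import tempfile


def load_stopwords(path):
    """Read a whitespace-separated stop word file into a set."""
    with open(path) as f:
        # split() with no arguments splits on spaces, tabs, and newlines,
        # so this handles both one-word-per-line and space-separated files.
        return set(f.read().split())


# Demonstration with a temporary file (hypothetical contents).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a from the\nof")
    tmp_path = tmp.name

stopwords = load_stopwords(tmp_path)
os.remove(tmp_path)
print(sorted(stopwords))
```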

stems = (PorterStemmer().stem_word(word) for word in text.split(" ")) 
" ".join(stem for stem in stems if stem not in words)

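Here we use the if clause of a generator expression to skip any word that is in the set of words we loaded from the file. A membership check on a set is O(1), so this should be relatively efficient.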

Edit 2:

To remove the words before they are stemmed, it's even simpler:

" ".join(PorterStemmer().stem_word(word) for word in text.split(" ") if word not in words)

Removing the given words is simply:

filtered_words = [word for word in unfiltered_words if word not in set_of_words_to_filter]
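
Putting the pieces together, here is a minimal sketch that drops stop words first and then stems what is left. Two things are my own assumptions and not part of the answers: the comparison is made case-insensitive with `word.lower()` (so that "A" matches the stop word "a"), and `toy_stem` is a hypothetical stand-in stemmer that only strips a trailing "s", used so the example runs without NLTK.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def remove_stopwords_and_stem(text, stopwords, stem):
    """Drop stop words (compared case-insensitively), then stem the rest."""
    kept = (word for word in text.split() if word.lower() not in stopwords)
    return " ".join(stem(word) for word in kept)


text = "A compiler translates code from a source language"
stopwords = {"a", "from"}  # example stop words, not from the original file
cleaned = remove_stopwords_and_stem(text, stopwords, toy_stem)
print(cleaned)
```

Filtering before stemming compares the original words against the stop word list, which matters because a stemmed word may no longer match its entry in the list.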


2012-05-08 20:22:50

I need to do one more thing: remove stop words from that string. `stopwordsfile = open('c:/stopwordlist.txt','r+')` `stopwordslist = stopwordsfile.read()` I need to remove those stop words from `text` and get a cleaned new string. – ChamingaD

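@ChamingaD I would suggest that this is a different question, and that you should open a new question for it. If you do, it will be more helpful to other people in the future, and easier for us to work with. –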

The problem is that I have to wait another 20 minutes before I can open a new question :/ – ChamingaD

To go through each word in the string:

for word in text.split(): 
    PorterStemmer().stem_word(word)

Use the string join method (as recommended by Lattyware) to concatenate the pieces into one big string.

" ".join(PorterStemmer().stem_word(word) for word in text.split(" "))


2012-05-08 20:12:37

The question does ask how to "get back the stemmed sentence"; the answer to that would be `" ".join(PorterStemmer().stem_word(word) for word in text.split(" "))`. –
