How to remove a list of words from a list of strings

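Sorry if the question is a bit confusing. This is similar to this question: How to remove a list of words from a list of strings

I think that question is close to what I want, but it is in Clojure.

I need something like this, but instead of the "[BR]" in that question, there is a list of strings that needs to be searched for and removed.

Hope I made myself clear.

I think this is due to the fact that strings in Python are immutable.

I have a list of noise words that need to be removed from a list of strings.

If I use a list comprehension, I end up searching the same string again and again. So only "of" gets removed and not "the". So my modified list looks like this: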

places = ['New York', 'the New York City', 'at Moscow']  # and many more

noise_words_list = ['of', 'the', 'in', 'for', 'at'] 

for place in places: 
    stuff = [place.replace(w, "").strip() for w in noise_words_list if place.startswith(w)]

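I would like to know what I am doing wrong.

Source

2010-08-18 prabhu

What is 'place'? – katrielalex 2010-08-18 09:58:27

You haven't made yourself clear; please state your problem *here*, and then, if you think it necessary, add links to similar questions and answers. – 2010-08-18 10:36:15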

Here is my stab at it. This uses regular expressions.

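import re
# Raw string so that \W reaches the regex engine unchanged
pattern = re.compile(r"(of|the|in|for|at)\W", re.I)
phrases = ['of New York', 'of the New York']
# list() is needed on Python 3, where map() returns an iterator
list(map(lambda phrase: pattern.sub("", phrase), phrases))  # ['New York', 'New York']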

Without the lambda:

[pattern.sub("", phrase) for phrase in phrases]

Update

Fixed the bug pointed out by gnibbler (thanks!):

pattern = re.compile("\\b(of|the|in|for|at)\\W", re.I) 
phrases = ['of New York', 'of the New York', 'Spain has rain'] 
[pattern.sub("", phrase) for phrase in phrases] # ['New York', 'New York', 'Spain has rain']

@prabhu: the change above avoids removing the trailing "in" from "Spain". To verify, run both versions of the regex against the phrase "Spain has rain".
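The check suggested above can be sketched as follows. The names `loose` and `strict` are made up for this illustration; the two patterns are the ones from the original answer and from the update.

```python
import re

# First version: nothing stops a noise word from matching inside a longer word.
loose = re.compile(r"(of|the|in|for|at)\W", re.I)
# Updated version: \b requires the noise word to start at a word boundary.
strict = re.compile(r"\b(of|the|in|for|at)\W", re.I)

phrase = "Spain has rain"
loose_result = loose.sub("", phrase)
strict_result = strict.sub("", phrase)

print(loose_result)   # Spahas rain
print(strict_result)  # Spain has rain
```

The first pattern eats the "in " at the end of "Spain"; the second leaves the phrase alone because there is no word boundary between "a" and "i".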

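Source

2010-08-18 09:58:58

Thanks. It works this way. Now that I have had the chance to implement this, I understand the concept of lambda more clearly. – prabhu 2010-08-18 10:17:29

This doesn't work for the phrase "Spain has rain". It's easy to fix, though – 2010-08-18 10:29:23

@Gnibbler: Thanks for pointing that out. I have changed my answer accordingly. – 2010-08-18 10:47:18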

>>> import re 
>>> noise_words_list = ['of', 'the', 'in', 'for', 'at'] 
>>> phrases = ['of New York', 'of the New York'] 
>>> noise_re = re.compile('\\b(%s)\\W'%('|'.join(map(re.escape,noise_words_list))),re.I) 
>>> [noise_re.sub('',p) for p in phrases] 
['New York', 'New York']
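A hedged variant of the same idea, not part of the original answer: it uses `\b` on both sides of the alternation instead of a trailing `\W`, so a noise word at the very end of a string is matched too. The helper name `remove_noise` is hypothetical, and collapsing the leftover whitespace with `split()` is an assumption about the desired output.

```python
import re

noise_words_list = ['of', 'the', 'in', 'for', 'at']

# \b on both sides: match whole words only, wherever they appear.
noise_re = re.compile(
    r'\b(?:%s)\b' % '|'.join(map(re.escape, noise_words_list)),
    re.I,
)

def remove_noise(phrase):
    """Delete noise words, then collapse the whitespace they leave behind."""
    return ' '.join(noise_re.sub('', phrase).split())

print(remove_noise('of the New York'))  # New York
print(remove_noise('New York of'))      # New York
print(remove_noise('Spain has rain'))   # Spain has rain
```

Note that this removes noise words from the middle of a phrase as well ('City of London' becomes 'City London'), which may or may not be what is wanted.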

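Source

2010-08-18 10:04:41

Wow! That is a very cool way of doing it, even though it strains my brain. :-) – prabhu 2010-08-18 10:21:30

This doesn't seem to catch every instance of a word. For example, "New York of" stays "New York of". – Namey 2014-05-05 00:38:41

@Namey, you could use something like '\\W?\\b(%s)\\W?'. Without the OP providing a comprehensive set of test cases, this is an awkward problem to pin down – 2014-05-05 01:12:32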

Since you would like to know what you are doing wrong, this line:

stuff = [place.replace(w, "").strip() for w in noise_words_list if place.startswith(w)]

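is evaluated, and the comprehension then starts looping over the words. First it checks for "of". Your place (e.g. "of the New York") is checked to see whether it starts with "of". It is transformed (the calls to replace and strip) and added to the result list. The crucial point is that the result is never examined again. For every word iterated over in the comprehension, a new result computed from the original place is added to the result list. So the next word is "the", and your place ("of the New York") does not start with "the", so no new result is added.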
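The behaviour described above can be checked directly. This is a minimal sketch: `per_word_results` is a hypothetical helper that wraps the comprehension from the question, and 'of the New York' is added to the sample data for illustration.

```python
places = ['New York', 'the New York City', 'at Moscow', 'of the New York']
noise_words_list = ['of', 'the', 'in', 'for', 'at']

def per_word_results(place, noise_words):
    """Hypothetical helper: the question's comprehension for a single place."""
    return [place.replace(w, "").strip() for w in noise_words if place.startswith(w)]

outcomes = {place: per_word_results(place, noise_words_list) for place in places}
for place, stuff in outcomes.items():
    print(repr(place), '->', stuff)
# 'New York' -> []
# 'the New York City' -> ['New York City']
# 'at Moscow' -> ['Moscow']
# 'of the New York' -> ['the New York']
```

Each entry is computed from the original `place`, never from an earlier result, which is why "the" survives in the last case. In the question's loop, `stuff` is also overwritten on every iteration, so only the list for the last place remains afterwards.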

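I assume the result you ended up with was a concatenation of your place variables. A procedural version that is simple to read and understand would be (untested):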

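# places and noise_words_list are the lists from the question
results = []
for place in places:
    for word in noise_words_list:
        if place.startswith(word):
            # reassigning place lets the next word see the stripped result
            place = place.replace(word, "").strip()
    results.append(place)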

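Keep in mind that replace() removes the word anywhere in the string, even where it occurs as a plain substring of a longer word. You can avoid this by using a regular expression with a pattern like ^the\b.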
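A minimal sketch of that anchored-pattern idea, assuming the goal is to strip any run of noise words from the start of a string only. The names `leading_noise` and `strip_leading_noise` are invented for this example.

```python
import re

noise_words_list = ['of', 'the', 'in', 'for', 'at']

# ^ anchors the match at the start of the string.
# \b stops 'the' from matching the beginning of 'theatre'.
# The outer (...)+ consumes several leading noise words in one pass.
leading_noise = re.compile(
    r'^(?:(?:%s)\b\s*)+' % '|'.join(map(re.escape, noise_words_list)),
    re.IGNORECASE,
)

def strip_leading_noise(place):
    """Remove noise words from the start of place; leave the rest untouched."""
    return leading_noise.sub('', place).strip()

print(strip_leading_noise('of the New York'))    # New York
print(strip_leading_noise('theatre district'))   # theatre district
print(strip_leading_noise('Spain has rain'))     # Spain has rain
```

Unlike replace(), this never touches text after the leading run of noise words, so a later "the" or an "in" inside "Spain" is left as it is.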

Source

2010-08-18 10:13:00 wds

Thanks. This was very helpful. – prabhu 2010-08-18 10:16:18

Without regular expressions, you could do it like this:

places = ['of New York', 'of the New York']

noise_words_set = {'of', 'the', 'at', 'for', 'in'}
# Split each place into words and keep only those that are not noise words
stuff = [' '.join(w for w in place.split() if w.lower() not in noise_words_set)
         for place in places]
print(stuff)  # ['New York', 'New York']
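To see how this split-based approach behaves on other inputs, here is the same expression wrapped in a function. The name `drop_noise_words` and the extra sample phrases are assumptions made for this sketch.

```python
noise_words_set = {'of', 'the', 'at', 'for', 'in'}

def drop_noise_words(place):
    """Keep only the whitespace-separated words that are not noise words."""
    return ' '.join(w for w in place.split() if w.lower() not in noise_words_set)

print(drop_noise_words('of the New York'))     # New York
print(drop_noise_words('Spain has rain'))      # Spain has rain
print(drop_noise_words('The City of London'))  # City London
print(drop_noise_words('the, New York'))       # the, New York
```

Because whole tokens are compared, "in" inside "Spain" is safe, and noise words are dropped from any position, not just the start. A token with punctuation attached, such as "the,", is not recognised as a noise word.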

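Source

2010-08-18 11:25:18

Excellent! Thanks! – prabhu 2010-08-19 11:47:51

I came across this and didn't know what was going on here. If anyone stumbles upon this and wonders what magic is happening, it's called a list comprehension, and here is a good article explaining it: http://carlgroner.me/Python/2011/11/09/An-Introduction-to-List-Comprehensions-in-Python.html – 2017-07-26 10:53:33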
