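Replacing specific words
This script scrapes headlines from several news sites and counts how many times each word appears in them.
I am getting words like "to", "for" and similar ones, which I do not want this script to pick up.
I tried using str.translate(None, "to") to remove the word, but it deletes "greedily": it stripped parts of "Washington" as well, when all I wanted to remove was the standalone word "to".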
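The behaviour described above follows from str.translate working on individual characters rather than on words. Here is a minimal sketch of the difference, assuming Python 3 (where the deletion table is built with str.maketrans instead of passing None as in the Python 2 call above). The sample headline is invented for illustration.

```python
import re

headline = "Washington to vote for new budget"  # made-up sample headline

# Character-level deletion: every "t" and every "o" is removed,
# including the ones inside "Washington" and "vote".
per_character = headline.translate(str.maketrans("", "", "to"))

# Word-level deletion: \b matches a word boundary, so only the
# standalone word "to" is removed.
whole_word = re.sub(r"\bto\b", "", headline)
whole_word = " ".join(whole_word.split())  # collapse the leftover double space

print(per_character)
print(whole_word)
```

The regular expression is only one option. Filtering the list of words after splitting avoids regular expressions altogether.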
import pprint
import feedparser
from collections import Counter
def feedGrabber(feed):
    parsed = feedparser.parse(feed)
    feed1 = []
    feed1.append(parsed.entries[0].title)
    feed1.append(parsed.entries[1].title)
    feed1.append(parsed.entries[3].title)
    feed1.append(parsed.entries[4].title)
    feed1.append(parsed.entries[5].title)
    feed1.append(parsed.entries[6].title)
    feed1.append(parsed.entries[7].title)
    feed1.append(parsed.entries[8].title)
    feed1.append(parsed.entries[9].title)
    feed1 = str(feed1)
    feedsplit = feed1
    feedsplit = feedsplit.translate(None, '\'')
    feedsplit = feedsplit.translate(None, 'u')
    feedsplit = feedsplit.translate(None, '[')
    feedsplit = feedsplit.translate(None, ']')
    feedsplit = str.lower(feedsplit)
    feedsplit = str.split(feedsplit)
    return(feedsplit)
reddit = feedGrabber("https://www.reddit.com/r/news/.rss")
cnn = feedGrabber('http://rss.cnn.com/rss/cnn_topstories.rss')
nyt = feedGrabber('http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml')
one = Counter(reddit)
two = Counter(cnn)
three = Counter(nyt)
pprint.pprint(one + two + three)
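Why not just delete the words you don't want, like "to", from the 'Counter' object you create? That is easier than building a regular expression. Also, you may want to learn about 'for' loops. – TigerhawkT3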
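A minimal sketch of what this comment suggests, assuming Python 3. The word lists are hypothetical stand-ins for what feedGrabber returns for each feed, and the `unwanted` list is only an example.

```python
from collections import Counter

# Hypothetical sample data standing in for the per-feed word lists.
feeds = [
    ["senate", "to", "vote", "for", "budget"],
    ["washington", "to", "host", "summit"],
]

# A loop replaces one Counter variable per feed.
totals = Counter()
for words in feeds:
    totals.update(words)

# Drop the unwanted words after counting.
unwanted = ["to", "for"]
for word in unwanted:
    totals.pop(word, None)  # the default avoids a KeyError for absent words

print(totals)
```

Because whole keys are removed, "washington" is left untouched even though it contains the letters "t" and "o".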
You are looking for *stopword filtering*, [see this post](http://stackoverflow.com/questions/5486337/how-to-remove-stop-words-using-nltk-or-python) – rebeling
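Stopword filtering could look roughly like the sketch below, which assumes Python 3. `STOP_WORDS` is a small hand-written set chosen for illustration, and `count_words` is a hypothetical helper, not part of the original script. The linked post discusses using a ready-made list such as the one shipped with NLTK instead.

```python
from collections import Counter

# Small hand-written stop word set (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "to", "for", "of", "in", "on", "and"}

def count_words(titles, stop_words=STOP_WORDS):
    """Count words across titles, skipping stop words."""
    counts = Counter()
    for title in titles:
        for word in title.lower().split():
            word = word.strip(".,:;!?'\"()[]")  # trim surrounding punctuation
            if word and word not in stop_words:
                counts[word] += 1
    return counts

# Made-up sample titles.
titles = ["Washington to vote on budget", "A budget for the ages"]
counts = count_words(titles)
print(counts.most_common())
```

Filtering before counting means the unwanted words never enter the Counter, so nothing has to be deleted afterwards.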