2016-12-01 27 views
-1

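Removing suffixes from list elements in Python

I have to create a program that reads in lines until a single "." is entered. I have to remove punctuation, change everything to lowercase, and remove stop words and suffixes. I have managed all of this except removing the suffixes. I tried .strip, as you can see, but it only accepts one argument and does not actually remove the suffix from the list elements. Any suggestions, pointers, or help? Thanks.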

stopWords = [ "a", "i", "it", "am", "at", "on", "in", "to", "too", "very",
      "of", "from", "here", "even", "the", "but", "and", "is", "my",
      "them", "then", "this", "that", "than", "though", "so", "are" ]

noStemWords = [ "feed", "sages", "yearling", "mass", "make", "sly", "ring" ] 


# -------- Replace with your code - e.g. delete line, add your code here ------------ 

Text = raw_input("Indexer: Type in lines, that finish with a . at start of line only: ").lower() 
while Text != ".": 
    LineNo = 0 
    x=0 
    y=0 
    i= 0 

#creates new string, cycles through string Text and removes punctuation
    PuncRemover = "" 
    for c in Text: 
     if c in ".,:;!?&'": 
      c="" 
     PuncRemover += c 

    SplitWords = PuncRemover.split() 

#loops through SplitWords list, removes value at x if found in stopWords list
    while x < len(SplitWords)-1: 
     if SplitWords[x] in stopWords: 
      del SplitWords[x] 
     else: 
      x=x+1 

    while y < len(SplitWords)-1: 
     if SplitWords[y] in noStemWords: 
      y=y+1 
     else: 
      SplitWords[y].strip("ed") 
      y=y+1 

    Text = raw_input().lower() 

print "lines with stopwords removed:" + str(SplitWords) 
print Text 
print LineNo 
print x 
print y 
print PuncRemover 
+0

You are only reading input once here; take a look at where 'raw_input' is called. – martianwars

+1

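A couple of things on code style first. You should look at the [Python naming conventions](https://www.python.org/dev/peps/pep-0008/#naming-conventions). Capitalized words are usually reserved for classes or type variables. Also, your 'while' loops should be 'for' loops, since you know how many iterations you are going to perform. As for your question, you need to actually assign the list element that is being changed. For stripping a sequence of characters, see [this question](http://stackoverflow.com/questions/3900054/python-strip-multiple-characters). – danielunderwood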
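To illustrate the last point of the comment above: 'str.strip' treats its argument as a set of characters to remove from both ends of the string, and because strings are immutable it returns a new string instead of changing the list element. A minimal sketch, written for Python 3, with sample words made up purely for illustration:

```python
words = ["jumped", "deeded", "edited"]

# strip("ed") removes any run of the characters 'e' and 'd' from BOTH ends,
# so it is not a suffix remover.
stripped = [w.strip("ed") for w in words]
print(stripped)  # ['jump', '', 'it']

# Calling strip() without assigning the result leaves the list unchanged.
words[0].strip("ed")
unchanged = list(words)
print(unchanged)  # ['jumped', 'deeded', 'edited']

# The result has to be assigned back to the list element.
words[0] = words[0].strip("ed")
print(words[0])  # 'jump'
```

Here "jumped" happens to come out right, but "edited" loses its leading "ed" as well, which is why a check on the end of the word is needed rather than 'strip'.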

+0

Reading in the lines is for adding them to a dictionary, which is why it only reads once for now. – Rydooo

Answer

0

The following function should remove suffixes from any given string.

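from itertools import groupby


def removeSuffixs(sentence):
    suffixList = ["ing", "ation"]  # add more as necessary

    for item in suffixList:
        if item in sentence:
            # Note: replace() removes every occurrence, not only at word ends
            sentence = sentence.replace(item, "")
            # True if any character appears twice in a row anywhere in the string
            repeatLetters = next((True for char, group in groupby(sentence)
                                  if sum(1 for _ in group) >= 2), False)

            if repeatLetters:
                # Drop the last character (e.g. "runn" -> "run")
                sentence = sentence[:-1]

    return sentence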

Example:

print(removeSuffixs("climbing running")) # 'climb run' 
print(removeSuffixs("summation")) # 'sum' 

In your code, replace SplitWords[y].strip("ed") with:

SplitWords[y] = removeSuffixs(SplitWords[y])
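
Because 'str.replace' removes a match anywhere in the string, a variant that only looks at the end of each word may be safer. The following is a sketch, not the answerer's method: 'remove_suffix', 'stem_words', and the suffix tuple are hypothetical names and values chosen for illustration, and the doubled-letter rule is a rough heuristic rather than a real stemming algorithm (for example, it would turn "passing" into "pas").

```python
# Words that must not be stemmed, taken from the question.
NO_STEM_WORDS = {"feed", "sages", "yearling", "mass", "make", "sly", "ring"}

# Assumed suffix list for illustration; extend as needed.
SUFFIXES = ("ation", "ing", "ed")


def remove_suffix(word, suffixes=SUFFIXES):
    """Return word without the first listed suffix found at its end."""
    for suffix in suffixes:
        # Only strip when something is left over after removing the suffix.
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[:-len(suffix)]
            # Collapse a doubled final letter, e.g. "runn" -> "run".
            if len(stem) >= 2 and stem[-1] == stem[-2]:
                stem = stem[:-1]
            return stem
    return word


def stem_words(words, no_stem=NO_STEM_WORDS):
    """Stem every word except those listed in no_stem."""
    return [w if w in no_stem else remove_suffix(w) for w in words]


result = stem_words(["climbing", "running", "summation", "ring", "jumped"])
print(result)  # ['climb', 'run', 'sum', 'ring', 'jump']
```

Building a new list with a comprehension also avoids the index bookkeeping of the 'while' loops in the question, which stop one element short because of the 'len(SplitWords)-1' bound.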