Is there a more elegant, Pythonic way to write this logic?

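I'm new to Python and have been playing with it on simple tasks. I have a bunch of CSVs that need to be manipulated in complex ways, but for the sake of learning Python I'm breaking that down into smaller tasks.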

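Right now, given a list of strings, I want to remove any user-defined title prefix from any names in the strings. Any string that contains a name will contain only the name, with or without a title prefix. I have the following, which works, but it feels unnecessarily complicated. Is there a more Pythonic way to do this? Thanks!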

# Return new list without title prefixes for strings in a list of strings.
def strip_titles(line, title_prefixes):
    new_csv_line = []
    for item in line:
        for title_prefix in title_prefixes:
            if item.startswith(title_prefix):
                new_csv_line.append(item[len(title_prefix)+1:])
                break
            else:
                if title_prefix == title_prefixes[len(title_prefixes)-1]:
                    new_csv_line.append(item)
                else:
                    continue
    return new_csv_line

if __name__ == "__main__": 
    test_csv_line = ['Mr. Richard Stallman', 'I like cake', 'Mrs. Margaret Thatcher', 'Jean-Claude Van Damme'] 
    test_prefixes = ['Mr.', 'Ms.', 'Mrs.'] 
    print(strip_titles(test_csv_line, test_prefixes))


2010-09-24 bsamek

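What about "Ms Jane Doe", "Mrs Betty Bloggs", and "Mr Fred Nerk"? There are plenty of pedants [abbreviation vs. contraction], keystroke-thrifty folk, and full-stop-abhorring folk out there. And "Miss Hildegarde Higgs"? – 2010-09-24 02:12:21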

@John Thankfully, that's not an issue, because the data comes from another source that follows a consistent scheme for this. – bsamek 2010-09-24 02:22:23

"A consistent data source"? I'll file that under "Famous Last Words" :-) – 2010-09-24 03:13:31

[re.sub(r'^(Mr|Ms|Mrs)\.\s+', '', s) for s in test_csv_line]
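For reference, here is a self-contained sketch of that one-liner run on the question's sample row. The names `TITLE_RE` and `strip_titles` are chosen here for illustration only; the pattern is the same one, just compiled once up front.

```python
import re

# Same pattern as the one-liner: an anchored title, a literal period,
# then one or more whitespace characters.
TITLE_RE = re.compile(r'^(Mr|Ms|Mrs)\.\s+')

def strip_titles(line):
    """Return a new list with any leading title removed from each string."""
    return [TITLE_RE.sub('', s) for s in line]

test_csv_line = ['Mr. Richard Stallman', 'I like cake',
                 'Mrs. Margaret Thatcher', 'Jean-Claude Van Damme']
print(strip_titles(test_csv_line))
# ['Richard Stallman', 'I like cake', 'Margaret Thatcher', 'Jean-Claude Van Damme']
```

Because the pattern ends in `\s+`, all whitespace after the title is consumed, so no leading space is left on the name.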


2010-09-24 02:04:34

Wow. Very cool. However, it leaves a space before the name when it removes the prefix. – bsamek 2010-09-24 02:07:28

I never tire of seeing the beauty of regular expressions. – 2010-09-24 02:27:36

@paracaudex: You probably saw my first version when you commented. The current version removes all whitespace after the prefix. – 2010-09-24 02:37:48

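Assuming the prefixes are variable, perhaps as an aspect of localization, or you'd rather not use regular expressions for some other reason, you could do something like this (untested code):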

def strip_title(string, prefixes):
    for prefix in prefixes:
        if string.startswith(prefix + ' '):
            return string[len(prefix) + 1:]
    return string

stripped = (list(strip_title(cell, prefixes) for cell in line)
            for line in lines)

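This isn't particularly efficient, since the algorithm ends up doing a lot of redundant checks (for example, if a string starts with M, that gets checked three times). This sort of thing is a major reason to use regular expressions.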
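As a quick check of the loop-based version, here is a sketch that runs it on a small sample. The `lines` and `prefixes` values are made up for illustration, borrowing strings from the question's test row, and the result is built as a list of lists rather than a lazy generator so it can be printed directly.

```python
def strip_title(string, prefixes):
    # Return the string without its title prefix, if it has one.
    for prefix in prefixes:
        if string.startswith(prefix + ' '):
            return string[len(prefix) + 1:]
    return string

# Hypothetical sample data: each inner list stands for one CSV row.
prefixes = ['Mr.', 'Ms.', 'Mrs.']
lines = [
    ['Mr. Richard Stallman', 'I like cake'],
    ['Mrs. Margaret Thatcher', 'Jean-Claude Van Damme'],
]

stripped = [[strip_title(cell, prefixes) for cell in line] for line in lines]
print(stripped)
# [['Richard Stallman', 'I like cake'], ['Margaret Thatcher', 'Jean-Claude Van Damme']]
```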

Alternatively, you can build a regular expression dynamically by escaping each prefix and joining them as | alternatives:

def TitleStripper(prefixes):
    import re
    # Escape each prefix so that characters like '.' match literally.
    escaped_titles = (re.escape(prefix) for prefix in prefixes)
    prefix_re = re.compile('^({0}) '.format('|'.join(escaped_titles)))
    def strip_title(string):
        return prefix_re.sub('', string, count=1)
    return strip_title

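The function TitleStripper creates a closure, strip_title, that works like the previous one but is specialized for a particular set of prefixes. After calling strip_title = TitleStripper(prefixes), you can call strip_title(string).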

Mainly because it uses a regular expression, this will be somewhat faster than the first approach, perhaps at the expense of clarity.
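A usage sketch of the closure approach, assuming the three prefixes from the question. The factory is repeated here so the snippet runs on its own; the sample strings are only for illustration.

```python
import re

def TitleStripper(prefixes):
    # Escape each prefix so that characters like '.' match literally,
    # then join them into one anchored alternation followed by a space.
    escaped_titles = (re.escape(prefix) for prefix in prefixes)
    prefix_re = re.compile('^({0}) '.format('|'.join(escaped_titles)))

    def strip_title(string):
        return prefix_re.sub('', string, count=1)
    return strip_title

# Build the specialized function once, then reuse it for every cell.
strip_title = TitleStripper(['Mr.', 'Ms.', 'Mrs.'])
print(strip_title('Mrs. Margaret Thatcher'))  # Margaret Thatcher
print(strip_title('I like cake'))             # I like cake
```

The regular expression is compiled a single time inside the factory, so repeated calls to `strip_title` only pay for the match itself.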

If you really only need to check three prefixes, either of these approaches is overkill, and you should just use a static RE as explained in another answer.


2010-09-24 02:16:22 intuited

Why do I need to escape each prefix? – bsamek 2010-09-24 02:26:23

For example, you need to escape '.', i.e. replace it with '\.', so that it doesn't match any character. You can do this with [re.escape](http://docs.python.org/library/re.html#re.escape). – intuited 2010-09-24 02:50:04

Ah, I see. I thought you meant escaping the whole thing, like \Mr. I didn't realize there was an escape function. – bsamek 2010-09-24 03:10:34
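To make the escaping point from these comments concrete, here is a small sketch showing what goes wrong without it. The sample strings are hypothetical.

```python
import re

# Unescaped, '.' is a wildcard, so the pattern 'Mr.' also matches 'Mrs'.
unescaped_hit = bool(re.match('Mr.', 'Mrs Smith'))
print(unescaped_hit)   # True

# re.escape makes the period literal, so 'Mrs' no longer matches.
escaped = re.escape('Mr.')
escaped_miss = bool(re.match(escaped, 'Mrs Smith'))
escaped_hit = bool(re.match(escaped, 'Mr. Smith'))
print(escaped_miss)    # False
print(escaped_hit)     # True
```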

A more Pythonic approach is to replace the "end of list" check inside the for item in line: loop with an else clause on the inner for loop.

# Return new list without title prefixes for strings in a list of strings.
def strip_titles(line, title_prefixes):
    new_csv_line = []
    for item in line:
        for title_prefix in title_prefixes:
            if item.startswith(title_prefix):
                new_csv_line.append(item[len(title_prefix)+1:])
                break
        else:
            new_csv_line.append(item)
    return new_csv_line

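The logic is otherwise the same as yours: the else block is executed if the for loop runs to completion without being interrupted by break.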
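If the for/else construct is unfamiliar, this minimal sketch isolates it. The helper name `first_matching_prefix` is made up for illustration and is not part of the answer's code.

```python
def first_matching_prefix(item, prefixes):
    """Return the first prefix that item starts with, or None."""
    for prefix in prefixes:
        if item.startswith(prefix):
            result = prefix
            break
    else:
        # Reached only when the loop finished without hitting break.
        result = None
    return result

prefixes = ['Mr.', 'Ms.', 'Mrs.']
print(first_matching_prefix('Mrs. Margaret Thatcher', prefixes))  # Mrs.
print(first_matching_prefix('I like cake', prefixes))             # None
```

The else branch also runs when the sequence is empty, since the loop then completes without ever executing break.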


2010-09-24 02:24:04
