Single-letter swaps on a string

I need a faster way to generate all single-letter swaps of an 8-11 character string, as follows:

Given the string 'STDILGNLYE', find all single-letter swaps using the letters in:

list_AA = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 
      'F', 'P', 'S', 'T', 'W', 'Y', 'V']

That is, for each position in the string, replace the original letter with each letter in list_AA. The output would be:

ATDILGNLYE 
RTDILGNLYE 
NTDILGNLYE 
... 
SADILGNLYE 
SRDILGNLYE 
SNDILGNLYE 
... 
... 
STDILGNLYV

That gives a total of 200 new strings (20 swaps for each position in the string). What I have so far:

def _create_swaps(original_str):
    list_peps = []
    for i in range(len(original_str)):
        for k in range(len(list_AA)):
            list_peps.append(_insert_aa(original_str, i, list_AA[k]))

    # remove original string
    return [i for i in list_peps if i != original_str]


def _insert_aa(string, index, aa):
    # replace the character at `index` by deleting it and inserting `aa`
    list_string_elements = list(string)
    del list_string_elements[index]
    list_string_elements.insert(index, aa)
    return "".join(list_string_elements)
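Side note on the count (a standalone check, not part of the original post): 10 positions × 20 letters gives 200 candidate strings, but every letter of 'STDILGNLYE' also appears in list_AA, so exactly one candidate per position is identical to the original. The final filter in _create_swaps therefore leaves 190 strings.

```python
list_AA = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M',
           'F', 'P', 'S', 'T', 'W', 'Y', 'V']
s = 'STDILGNLYE'

# Every single-position substitution, including the no-op ones.
candidates = [s[:i] + aa + s[i + 1:] for i in range(len(s)) for aa in list_AA]
# Drop the candidates that reproduce the original string.
new_strings = [c for c in candidates if c != s]

print(len(candidates))   # 200
print(len(new_strings))  # 190
```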

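Because this has to be repeated ~10**6 times, it is the slowest step of a larger project. Is there a faster way to find these swaps (by eliminating the "".join and insert steps, or by generating the swaps some other way)?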

For reference, the profiler output:

ncalls tottime percall cumtime percall filename:lineno(function) 
185275200 330.286 0.000 429.295 0.000 models.py:233(_insert_aa) 
975240  147.322 0.000 616.979 0.001 models.py:225(_create_swaps) 
185280201/185280197 59.137 0.000 59.138 0.000 {method 'join' of 'str' objects} 
185275208 39.875 0.000 39.875 0.000 {method 'insert' of 'list' objects} 
975240  21.027 0.000 21.027 0.000 models.py:231(<listcomp>) 
186746064 18.516 0.000 18.516 0.000 {method 'append' of 'list' objects}


2016-12-02 Carlo Mazzaferro

Do you need to emit all the generated strings, or just count them? – Steve

@Steve I need all the strings. As you can see from the return statement of '_create_swaps', it returns every created string except the original one. –

You might want to try replacing one of the operations with 'map()'... see [this essay](https://www.python.org/doc/essays/list2str/) on loop efficiency... of course, actual measurements always beat theory. –
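One way the 'map()' idea could look (an assumption about what the comment means, since it gives no code; 'swaps_map' is a hypothetical name): slice head and tail once per position, then let map() apply the concatenation to every replacement letter.

```python
def swaps_map(s, lst):
    """Yield every string obtained by replacing one position of s with a letter from lst."""
    for i in range(len(s)):
        head, tail = s[:i], s[i + 1:]
        # The for loop does not advance until `yield from` has exhausted map(),
        # so the lambda always sees this position's head and tail.
        yield from map(lambda c: head + c + tail, lst)


print(list(swaps_map('abc', ['X', 'Y'])))
# ['Xbc', 'Ybc', 'aXc', 'aYc', 'abX', 'abY']
```

Whether this is actually faster than a plain inner loop should be measured rather than assumed.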

Even though you've already accepted an answer (and it's not the most pythonic), here is a clearer version of what you're looking for.

You shouldn't use range to get the indices you iterate over; if you want to be pythonic about it, use enumerate.

>>> def swaps(s, lst):
...     for index, _ in enumerate(s):
...         for letter in lst:
...             temp = list(s)
...             temp[index] = letter
...             yield ''.join(temp)
...
>>> list_AA = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V']
>>> s = 'STDILGNLYE'
>>>
>>> for _ in swaps(s, list_AA):
...     print(_)
...
ATDILGNLYE 
RTDILGNLYE 
NTDILGNLYE 
.......... 
GTDILGNLYE 
HTDILGNLYE 
ITDILGNLYE

Also, a simpler way in Python 3:

>>> def swaps(s, lst):
...     for i, _ in enumerate(s):
...         yield from ['%s%s%s' % (s[:i], x, s[i+1:]) for x in lst]
... 
>>> swaps(s,list_AA) 
<generator object swaps at 0x10c9205c8> 
>>> a=_ 
>>> next(a) 
'ATDILGNLYE' 
>>> next(a) 
'RTDILGNLYE' 
>>> next(a) 
'NTDILGNLYE' 
>>> next(a) 
'DTDILGNLYE'

Edit: a solution that trades off speed and readability

def swap3(s, lst):
    for i, _ in enumerate(s):
        head, tail = s[:i], s[i+1:]
        yield from ['%s%s%s' % (head, c, tail) for c in lst]

And here's the benchmark of all three:

s='STDILGNLYE' 
list_AA=['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 
     'P', 'S', 'T', 'W', 'Y', 'V'] 

# the correct sample size 
list_new = list_AA * (10**6 // len(list_AA)) 

def swaps0(string, replacements):
    for i in range(len(string)):
        head = string[:i]
        tail = string[i+1:]
        for letter in replacements:
            yield head + letter + tail

def swaps1(s, lst):
    for i, _ in enumerate(s):
        yield from ['%s%s%s' % (s[:i], x, s[i+1:]) for x in lst]

def swaps2(s, lst):
    for index, _ in enumerate(s):
        for letter in lst:
            temp = list(s)
            temp[index] = letter
            yield ''.join(temp)

timeit [_ for _ in swaps0(s, list_new)] 
timeit [_ for _ in swaps1(s, list_new)] 
timeit [_ for _ in swaps2(s, list_new)] 


In [9]: timeit [_ for _ in swaps0(s, list_new)] 
1 loop, best of 3: 2.61 s per loop 
In [10]: timeit [_ for _ in swaps1(s, list_new)] 
1 loop, best of 3: 6.57 s per loop 
In [11]: timeit [_ for _ in swaps2(s, list_new)] 
1 loop, best of 3: 8.61 s per loop
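The 'timeit ...' lines above are IPython magics. Outside IPython, a comparable measurement can be taken with the standard-library timeit module. This is a sketch: the three functions are repeated so the block runs on its own, 'bench' is a hypothetical helper, and the sample is 100 times smaller than the 10**6 used above, so its numbers are not directly comparable to those figures.

```python
import timeit

s = 'STDILGNLYE'
list_AA = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F',
           'P', 'S', 'T', 'W', 'Y', 'V']

def swaps0(string, replacements):
    # Slice head/tail once per position, then concatenate.
    for i in range(len(string)):
        head = string[:i]
        tail = string[i+1:]
        for letter in replacements:
            yield head + letter + tail

def swaps1(s, lst):
    # Re-slice and %-format for every replacement letter.
    for i, _ in enumerate(s):
        yield from ['%s%s%s' % (s[:i], x, s[i+1:]) for x in lst]

def swaps2(s, lst):
    # Build a fresh list of characters for every replacement letter.
    for index, _ in enumerate(s):
        for letter in lst:
            temp = list(s)
            temp[index] = letter
            yield ''.join(temp)

def bench(func, replacements, number=3):
    """Best of 3 repeats; each repeat fully consumes func's output `number` times."""
    timer = timeit.Timer(lambda: list(func(s, replacements)))
    return min(timer.repeat(repeat=3, number=number)) / number

if __name__ == '__main__':
    list_new = list_AA * (10**4 // len(list_AA))  # smaller than the 10**6 used above
    for func in (swaps0, swaps1, swaps2):
        print(f'{func.__name__}: {bench(func, list_new):.4f} s per loop')
```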

Is it worth it? I'd say it depends on how much you expect the sample size to grow and how often you run the code.

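If the code won't run often (say, a few hundred times an hour) and the sample size won't grow exponentially (to something like 10**50 or 10**100), then I'd say go for readability.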

If this is going to be computed often with a growing sample size, profile it.
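A minimal profiling sketch with the standard-library cProfile and pstats modules, which produce the same kind of 'ncalls tottime percall cumtime' table shown in the question. 'generate' and 'workload' are placeholder names; substitute the implementation and call pattern you actually want to measure.

```python
import cProfile
import io
import pstats

def generate(s, lst):
    # Placeholder implementation under test: slice once per position.
    for i in range(len(s)):
        head, tail = s[:i], s[i + 1:]
        for c in lst:
            yield head + c + tail

def workload():
    # Placeholder call pattern: generate all swaps for one string 1000 times.
    list_AA = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M',
               'F', 'P', 'S', 'T', 'W', 'Y', 'V']
    total = 0
    for _ in range(1000):
        total += sum(1 for _ in generate('STDILGNLYE', list_AA))
    return total

profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out).sort_stats('cumulative')
stats.print_stats(10)  # top 10 entries by cumulative time
print(out.getvalue())
```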

Finally, that leaves a compromise solution combining the head/tail split with enumerate:

def swap3(s, lst):
    for i, _ in enumerate(s):
        head, tail = s[:i], s[i+1:]
        yield from ['%s%s%s' % (head, c, tail) for c in lst]

In [16]: timeit [_ for _ in swap3(s, list_new)] 
1 loop, best of 3: 3.99 s per loop


2016-12-02 18:27:30

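I like the enumerate idea. However, slicing and concatenating is faster: 'timeit variants = [v for v in generate_all_variants(s, list_AA)]' gives 10000 loops, best of 3: 34.3 µs per loop, while 'timeit variants = [v for v in swaps(s, list_AA)]' gives 1000 loops, best of 3: 271 µs per loop. – Steve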

@Steve I updated my answer with a simpler approach using Python 3. –

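Also, the 'zen' of Python is that simple, readable code beats ugly code full of micro-optimizations, and this optimization is exactly that kind of micro-optimization. You'd need to replace a huge number of characters for it to be significantly faster. –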

This should be faster:

def _insert_aa(string, index, aa): 
    return string[0:index] + aa + string[index+1:]
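A quick sanity check that the slicing version returns exactly what the question's list-based version does. Both are redefined under the hypothetical names 'insert_aa_list' and 'insert_aa_slice' so the block runs on its own.

```python
def insert_aa_list(string, index, aa):
    # Question's approach: list, delete, insert, join.
    chars = list(string)
    del chars[index]
    chars.insert(index, aa)
    return ''.join(chars)

def insert_aa_slice(string, index, aa):
    # Answer's approach: two slices and two concatenations, no intermediate list.
    return string[0:index] + aa + string[index+1:]

s = 'STDILGNLYE'
for i in range(len(s)):
    for aa in 'ARNDCQEGHILKMFPSTWYV':
        assert insert_aa_list(s, i, aa) == insert_aa_slice(s, i, aa)

print(insert_aa_slice(s, 0, 'A'))  # ATDILGNLYE
```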

Edit: you can slice the head and tail just once per position and reuse them, like this:

def generate_all_variants(string, replacements):
    for i in range(len(string)):
        head = string[:i]
        tail = string[i+1:]
        for letter in replacements:
            yield head + letter + tail

for variant in generate_all_variants("abcd", ['1', '2', '3']):
    print(variant)
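Since the question wants the original string excluded, one option (my own variation, not part of this answer; 'generate_new_variants' is a hypothetical name) is to skip the letter already at each position instead of filtering the whole list afterwards. A string equals the original exactly when the replacement letter equals the current one, so the result matches the question's filter.

```python
def generate_new_variants(string, replacements):
    """Yield every single-letter substitution of string, skipping no-op swaps."""
    for i, current in enumerate(string):
        head = string[:i]
        tail = string[i + 1:]
        for letter in replacements:
            if letter != current:  # replacing a letter with itself reproduces the original
                yield head + letter + tail

print(list(generate_new_variants("abcd", ['1', 'a'])))
# ['1bcd', 'a1cd', 'aacd', 'ab1d', 'abad', 'abc1', 'abca']
```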


2016-12-02 17:58:21 Steve

'"".join' is always faster than concatenating with '+'. –

Your edit seems to be the solution I'm looking for. Still, why stick with '+' instead of '"".join'? –

The join function takes a single argument, usually a list, and creating that list takes time. – Steve
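To check the two claims in this exchange on your own machine, here is a small standard-library timeit comparison of three-piece '+' concatenation against '"".join' over a tuple. It is a sketch: 'concat' and 'join_tuple' are hypothetical names, and results depend on the Python version and string lengths, so no numbers are claimed here.

```python
import timeit

head, letter, tail = 'STD', 'A', 'LGNLYE'

def concat():
    # Two + operations on short strings.
    return head + letter + tail

def join_tuple():
    # join needs an iterable, so a tuple is built on every call.
    return ''.join((head, letter, tail))

assert concat() == join_tuple() == 'STDALGNLYE'

for func in (concat, join_tuple):
    best = min(timeit.repeat(func, repeat=3, number=200_000))
    print(f'{func.__name__}: {best:.4f} s per 200,000 calls')
```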
