Concatenate selected strings in a list of strings

The problem is as follows. I have a list of strings

lst1=['puffing','his','first','cigarette','in', 'weeks', 'in', 'weeks']

and I would like to obtain the list of strings

lst2=['puffing','his','first','cigarette','in weeks', 'in weeks']

that is, to concatenate every occurrence of the sublist ['in', 'weeks'], for reasons that are irrelevant here. find_sub_list1 is taken from here (and included in the code below):

npis = [['in', 'weeks'], ['in', 'ages']]

# given a list and a candidate sublist, return the index of the first and last
# element of the sublist within the list
def find_sub_list1(sl, l):
    results = []
    sll = len(sl)
    for ind in (i for i, e in enumerate(l) if e == sl[0]):
        if l[ind:ind+sll] == sl:
            results.append((ind, ind+sll-1))

    return results

def concatenator(sent, npis):
    indices = []
    for npi in npis:
        indices_temp = find_sub_list1(npi, sent)
        if indices_temp != []:
            indices.extend(indices_temp)
    sorted(indices, key=lambda x: x[0])

    for (a, b) in indices:
        diff = b - a
        sent[a:b+1] = [" ".join(sent[a:b+1])]
        del indices[0]
        indices = [(a - diff, b - diff) for (a, b) in indices]

    return sent

Instead of the desired lst2, this code returns:

concatenator(lst1,['in', 'weeks']) 
>>['puffing','his','first','cigarette','in weeks', 'in', 'weeks']

So it only concatenates the first occurrence. Any ideas on where the code fails?

Source

2017-05-02 Orest Xherija

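A better way to concatenate the two words would be to work backwards. That way you won't need 'diff' to adjust the remaining indices. – aydow

How did I miss that! Great suggestion! Thanks a lot! –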
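For what it's worth, here is a minimal sketch of the backwards idea from the comment above. The loop in concatenator stops after one pass because `del indices[0]` shrinks the very list the `for` loop is iterating over; replacing slices from right to left avoids any index bookkeeping. The names find_sub_list and concatenate_backwards are made up for this illustration, and the sketch assumes the sublists are non-empty and that their matches do not overlap.

```python
def find_sub_list(sub, lst):
    """Return (start, end) index pairs for every occurrence of sub in lst."""
    n = len(sub)
    return [(i, i + n - 1) for i in range(len(lst) - n + 1) if lst[i:i + n] == sub]


def concatenate_backwards(sent, npis):
    """Merge every occurrence of each sublist in npis into one space-joined string.

    Assumes non-empty sublists whose matches do not overlap (hypothetical helper).
    """
    indices = []
    for npi in npis:
        indices.extend(find_sub_list(npi, sent))

    result = list(sent)  # work on a copy so the caller's list is left untouched
    # Handle the rightmost match first: replacing a slice only shifts the
    # positions to its right, and those have already been processed.
    for a, b in sorted(indices, reverse=True):
        result[a:b + 1] = [" ".join(result[a:b + 1])]
    return result


lst1 = ['puffing', 'his', 'first', 'cigarette', 'in', 'weeks', 'in', 'weeks']
npis = [['in', 'weeks'], ['in', 'ages']]
print(concatenate_backwards(lst1, npis))
```

Because the matches are handled from the end of the list, the earlier index pairs stay valid and no 'diff' correction is needed.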

This isn't a fix for your code, but an alternative solution (I always end up using regex for everything):

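import re

# Join the words with a comma, which is assumed not to occur in any of them
list1_str = ','.join(lst1)
npis_concat = [','.join(x) for x in npis]
for item in npis_concat:
    # re.escape keeps any regex metacharacters in the words from being interpreted
    list1_str = re.sub(r'\b' + re.escape(item) + r'\b', item.replace(',', ' '), list1_str)
lst1 = list1_str.split(',')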

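I'm using a comma here, but you can replace it with any character, preferably one you know won't appear in the text.

The r'\b' is there to make sure we don't accidentally chop bits off words that happen to end or start with the words in npis.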
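To make the effect of the word boundary concrete, here is a small sketch that runs the same comma-join idea with and without r'\b'. The function merge_with_commas and its use_boundaries flag are hypothetical names introduced only for this comparison, and the sketch assumes that no word contains a comma.

```python
import re


def merge_with_commas(words, phrases, use_boundaries=True):
    """Join words with commas, then turn the comma inside each phrase into a space.

    Hypothetical helper; assumes no word contains a comma.
    """
    text = ','.join(words)
    for phrase in phrases:
        target = re.escape(','.join(phrase))
        pattern = r'\b' + target + r'\b' if use_boundaries else target
        replacement = ' '.join(phrase)
        # A function replacement avoids backslash processing in the new text.
        text = re.sub(pattern, lambda m: replacement, text)
    return text.split(',')


words = ['cabin', 'weeks', 'in', 'weeks']
phrases = [['in', 'weeks']]
print(merge_with_commas(words, phrases))                        # ['cabin', 'weeks', 'in weeks']
print(merge_with_commas(words, phrases, use_boundaries=False))  # ['cabin weeks', 'in weeks']
```

Without the boundary, the tail of 'cabin' is treated as the word 'in' and gets glued to the following 'weeks'.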

Source

2017-05-02 03:22:37 Nullman

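Since the desired subsequences are 'in' 'weeks' and possibly 'in' 'ages', one possible solution (the loop is not very elegant, though):

First find all positions where 'in' occurs.
Then iterate through the source list, appending elements to the target list and treating the 'in' positions specially: if the following word is in the special set, join the two, append the result to the target, and advance the iterator one extra time.
Once the source list is exhausted, an IndexError is raised, signalling that we should break out of the loop.

Code: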

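index_in = [i for i, word in enumerate(lst1) if word == 'in']

lst2 = []
n = 0

while True:
    try:
        # Merge 'in' with the next word only if that word exists and is in the special set
        if n in index_in and n + 1 < len(lst1) and lst1[n + 1] in ['weeks', 'ages']:
            lst2.append(lst1[n] + ' ' + lst1[n + 1])
            n += 1  # skip the word that was just merged
        else:
            lst2.append(lst1[n])  # raises IndexError once n runs past the end
        n += 1
    except IndexError:
        break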

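A better way to do this is with regular expressions.

Join the list into a single string, using a space as the separator.
Split that string on every space except the one enclosed by in<space>weeks. Here we can use negative lookahead & lookbehind.

Code: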

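import re

# Split on a space unless it sits between the whole words 'in' and 'weeks':
# the first branch accepts spaces not preceded by 'in', the second accepts
# spaces not followed by 'weeks'.
c = re.compile(r'(?<!\bin) | (?!weeks\b)')

lst2 = c.split(' '.join(lst1))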
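If several phrases need to be kept together (for example both 'in weeks' and 'in ages'), a split pattern gets unwieldy. One possible variation, sketched below, matches the pieces instead of splitting on the gaps. This is a different technique from the split above, not a drop-in equivalent; merge_phrases is a hypothetical name, and the sketch assumes a non-empty list of phrases and words that contain no whitespace.

```python
import re


def merge_phrases(words, phrases):
    """Return words with each multi-word phrase merged into one space-joined item.

    Hypothetical helper; assumes phrases is non-empty and words contain no whitespace.
    """
    # Longest phrases first, so a longer phrase wins over its own prefix.
    ordered = sorted((' '.join(p) for p in phrases), key=len, reverse=True)
    alternatives = '|'.join(re.escape(p) for p in ordered)
    # (?<!\S) and (?!\S) demand whitespace or a string edge on both sides,
    # so a phrase only matches whole words; \S+ picks up every other word.
    pattern = re.compile(r'(?<!\S)(?:' + alternatives + r')(?!\S)|\S+')
    return pattern.findall(' '.join(words))


lst1 = ['puffing', 'his', 'first', 'cigarette', 'in', 'weeks', 'in', 'weeks']
npis = [['in', 'weeks'], ['in', 'ages']]
print(merge_phrases(lst1, npis))
```

Using whitespace lookarounds rather than r'\b' means a word such as 'weeks.' is left alone instead of being split at the punctuation.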

Source

2017-05-02 03:27:40
