2016-12-07 99 views
-1

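Split a dynamic list into sublists based on a special character in Python

I have a basic Python problem that I have been trying to solve for a long time, but I cannot get the correct output.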

textvalues=[['1 of 2 DOCUMENTS', 'The New York Times', 'March 17, 2016 Thursday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section A; Column 0; Classified; Pg. 19', 'LENGTH: 176 words', 'LOAD-DATE: March 17, 2016', 'Copyright 2016 The New York Times Company', '', '2 of 2 DOCUMENTS', 'The New York Times', 'March 16, 2016 Wednesday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section B; Column 0; Classified; Pg. 16', 'LENGTH: 176 words', 'LOAD-DATE: March 16, 2016', 'Copyright 2016 The New York Times Company']] 

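Here I need to split the list above into sublists based on a "special character". The list above is only a sample; the main list is dynamic and its length may vary. In every case, the list needs to be split at the '' (empty string) element.

The solution I have tried: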

MainText = str(textvalues) 
split_index = MainText.index('',) 
l2 = MainText[:split_index] 
print(l2) 
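A short sketch of why this attempt prints an empty line, using a small made-up list rather than the real data: str() turns the whole nested list into one string, and an empty substring is found at position 0 of any string, so the slice before it is empty. Calling index on the inner list instead finds the '' element itself. The variable names below are illustrative only.

```python
# Small stand-in for the real data (hypothetical values).
textvalues = [['a', 'b', '', 'c']]

# str.index('') always returns 0: the empty substring matches at the very start.
main_text = str(textvalues)
split_index = main_text.index('')
prefix = main_text[:split_index]          # empty string

# list.index('') looks for an element equal to '' in the inner list.
sep_pos = textvalues[0].index('')
first = textvalues[0][:sep_pos]
rest = textvalues[0][sep_pos + 1:]

print(repr(prefix))
print(first, rest)
```

This only handles a single separator; the answers below cover the general case.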

Expected output:

[['1 of 2 DOCUMENTS', 'The New York Times', 'March 17, 2016 Thursday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section A; Column 0; Classified; Pg. 19', 'LENGTH: 176 words', 'LOAD-DATE: March 17, 2016', 'Copyright 2016 The New York Times Company'] ,['2 of 2 DOCUMENTS', 'The New York Times', 'March 16, 2016 Wednesday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section B; Column 0; Classified; Pg. 16', 'LENGTH: 176 words', 'LOAD-DATE: March 16, 2016', 'Copyright 2016 The New York Times Company']] 

Please help me solve this problem. Thanks.

+0

Check Right leg's solution. It works with some modification. See my code in the comments on his answer. – MYGz

+0

Check my solution and see if it works for you. – MYGz

Answers

1
import itertools 

textvalues=[['1 of 2 DOCUMENTS', 'The New York Times', 'March 17, 2016 Thursday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section A; Column 0; Classified; Pg. 19', 'LENGTH: 176 words', 'LOAD-DATE: March 17, 2016', 'Copyright 2016 The New York Times Company', '', '2 of 2 DOCUMENTS', 'The New York Times', 'March 16, 2016 Wednesday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section B; Column 0; Classified; Pg. 16', 'LENGTH: 176 words', 'LOAD-DATE: March 16, 2016', 'Copyright 2016 The New York Times Company']] 
groups = [] 
# compare by value with !=; `is not ''` tests object identity and is unreliable for strings
for a,b in itertools.groupby(textvalues[0], lambda x: x != ''): 
    if a: 
        groups.append(list(b)) 
print(groups) 

Output:

[['1 of 2 DOCUMENTS', 'The New York Times', 'March 17, 2016 Thursday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section A; Column 0; Classified; Pg. 19', 'LENGTH: 176 words', 'LOAD-DATE: March 17, 2016', 'Copyright 2016 The New York Times Company'], ['2 of 2 DOCUMENTS', 'The New York Times', 'March 16, 2016 Wednesday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section B; Column 0; Classified; Pg. 16', 'LENGTH: 176 words', 'LOAD-DATE: March 16, 2016', 'Copyright 2016 The New York Times Company']] 
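
The same groupby idea can be wrapped in a small helper so it works on a list of any length. This is a minimal sketch; the function name split_on_separator and the shortened sample list are made up for illustration. Because groupby merges consecutive separators into one run, repeated or trailing '' elements do not produce empty sublists.

```python
from itertools import groupby


def split_on_separator(items, sep=''):
    """Split a flat list into sublists at every element equal to sep."""
    return [
        list(group)
        for is_content, group in groupby(items, key=lambda x: x != sep)
        if is_content  # skip the runs that consist of separators
    ]


# Shortened, hypothetical sample with a doubled and a trailing separator.
sample = ['1 of 2 DOCUMENTS', 'LENGTH: 176 words', '', '',
          '2 of 2 DOCUMENTS', 'LENGTH: 176 words', '']
groups = split_on_separator(sample)
print(groups)
```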
+0

Good solution. Very tricky. Thanks for sharing it. –

0

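Basically, you can iterate over the contents, store the elements in a buffer, and dump the buffer into the main list whenever the '' separator comes up: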

result = [] 
line = [] 
for element in textvalues[0]: 
    if element != '': 
        line.append(element) 
    else: 
        result.append(line) 
        line = [] 
# flush the last buffer: the final group has no trailing '' separator
if line: 
    result.append(line) 
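
The buffer approach can also be written as a generator, which yields each group as soon as its separator is seen. A minimal sketch, with the hypothetical name iter_groups; it flushes the buffer once more after the loop so the last group is not lost, and it skips empty buffers so repeated separators do not yield empty lists.

```python
def iter_groups(items, sep=''):
    """Yield sublists of items, split at every element equal to sep."""
    buffer = []
    for element in items:
        if element != sep:
            buffer.append(element)
        elif buffer:
            # separator reached: hand over the collected group
            yield buffer
            buffer = []
    if buffer:
        # the last group has no separator after it
        yield buffer


values = ['asd', '', 'asd d', '', 'c as d', '', 'asd f', '', 'lskd']
result = list(iter_groups(values))
print(result)
```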
+0

Fix your solution. Check and edit your answer. 'textvalues = [['asd','','asd d','','c as d','','asd f','','lskd']] result = [] line = [] for element in textvalues[0]: if element != '': line.append(element) else: result.append(line) line = [] else: result.append(line) print result' – MYGz

+0

Output of the above code: '[['asd'],['asd d'],['c as d'],['asd f'],['lskd']]' – MYGz

+0

It raises an error, because there are multiple else clauses. – Mho

0
textvalues=[['1 of 2 DOCUMENTS', 'The New York Times', 'March 17, 2016 Thursday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section A; Column 0; Classified; Pg. 19', 'LENGTH: 176 words', 'LOAD-DATE: March 17, 2016', 'Copyright 2016 The New York Times Company', '', '2 of 2 DOCUMENTS', 'The New York Times', 'March 16, 2016 Wednesday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section B; Column 0; Classified; Pg. 16', 'LENGTH: 176 words', 'LOAD-DATE: March 16, 2016', 'Copyright 2016 The New York Times Company']] 

textvalues2 = [] 

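# join on a character assumed never to occur in the data; several elements contain ',' themselves
SEP = '\x00' 
for chunk in SEP.join(textvalues[0]).split(SEP + SEP): 
    textvalues2.append(chunk.split(SEP)) 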
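
Any join-and-split approach relies on the delimiter never appearing inside an element, and several elements in the sample data contain commas of their own. A sketch that avoids building a string at all is to record the positions of the '' elements and slice between them. The name split_by_index and the short sample row are assumptions for illustration.

```python
def split_by_index(items, sep=''):
    """Slice items between the positions of elements equal to sep."""
    cuts = [i for i, item in enumerate(items) if item == sep]
    groups = []
    start = 0
    for cut in cuts + [len(items)]:
        if cut > start:  # ignore empty slices between adjacent separators
            groups.append(items[start:cut])
        start = cut + 1
    return groups


# Hypothetical row whose elements contain commas.
row = ['March 17, 2016', 'THORNTON, ROBERT', '', 'March 16, 2016']
groups = split_by_index(row)
print(groups)

# For comparison: splitting a single element on ',' breaks it apart.
pieces = 'March 17, 2016'.split(',')
print(pieces)
```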