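String split problem

Problem: split a string into a list of words, using separators that are passed in as a list.

String: "After the flood ... all the colors came out."

Desired output: ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']

I wrote the function below. Note that I know there are better ways to split the string using Python's built-in functions, but for the sake of learning I thought I would proceed this way: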

def split_string(source,splitlist):
    result = []
    for e in source:
        if e in splitlist:
            end = source.find(e)
            result.append(source[0:end])
            tmp = source[end+1:]
            for f in tmp:
                if f not in splitlist:
                    start = tmp.find(f)
                    break
            source = tmp[start:]
    return result

out = split_string("After the flood ... all the colors came out.", " .") 

print out 

['After', 'the', 'flood', 'all', 'the', 'colors', 'came out', '', '', '', '', '', '', '', '', '']

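I can't figure out why "came out" is not split into "came" and "out" as two separate words. It's as if the whitespace character between the two words is being ignored. I think the rest of the output is junk that stems from the same "came out" problem.

EDIT:

I followed @lvc's suggestion and came up with the code below: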

def split_string(source,splitlist):
    result = []
    lasti = -1
    for i, e in enumerate(source):
        if e in splitlist:
            tmp = source[lasti+1:i]
            if tmp not in splitlist:
                result.append(tmp)
            lasti = i
        if e not in splitlist and i == len(source) - 1:
            tmp = source[lasti+1:i+1]
            result.append(tmp)
    return result

out = split_string("This is a test-of the,string separation-code!"," ,!-") 
print out 
#>>> ['This', 'is', 'a', 'test', 'of', 'the', 'string', 'separation', 'code'] 

out = split_string("After the flood ... all the colors came out.", " .") 
print out 
#>>> ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out'] 

out = split_string("First Name,Last Name,Street Address,City,State,Zip Code",",") 
print out 
#>>>['First Name', 'Last Name', 'Street Address', 'City', 'State', 'Zip Code'] 

out = split_string(" After the flood ... all the colors came out...............", " .")
print out 
#>>>['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']

2012-05-30 codingknob

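You seem to be expecting:

source = tmp[start:]

to modify the source that the outer for loop is iterating over. It won't: that loop keeps going over the string you originally gave it, not whatever object now uses that name. This can mean that the character you are up to might not be in what's left of source.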
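To see this in isolation, here is a minimal sketch (Python 3; the function name chars_seen is made up for illustration) that rebinds the name inside the loop and records what the loop actually visits:

```python
def chars_seen(source):
    """Return what the for loop visits while the name `source` is rebound."""
    seen = []
    for ch in source:        # the iterator is created once, from the original string
        seen.append(ch)
        source = source[1:]  # rebinds the name only; the loop is unaffected
    return "".join(seen), source

visited, leftover = chars_seen("a b.c")
print(repr(visited))   # 'a b.c'
print(repr(leftover))  # ''
```

The loop still visits every character of the original string, even though the name ends up bound to an empty string.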

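Instead of trying to do that, keep track of the current index in the string, like this:

for i, e in enumerate(source): 
    ...

and what you want to append is always source[lasti+1:i], so you only need to keep track of lasti.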
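One way this could look in full is sketched below (Python 3). The check that skips empty pieces and the handling of a trailing word after the loop are assumptions added for illustration; they are not spelled out in the answer itself.

```python
def split_string(source, splitlist):
    """Split source on any character in splitlist, skipping empty pieces."""
    result = []
    lasti = -1                     # index of the most recent separator seen
    for i, e in enumerate(source):
        if e in splitlist:
            piece = source[lasti + 1:i]
            if piece:              # assumption: drop pieces between adjacent separators
                result.append(piece)
            lasti = i
    tail = source[lasti + 1:]      # assumption: keep whatever follows the last separator
    if tail:
        result.append(tail)
    return result

print(split_string("After the flood ... all the colors came out.", " ."))
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
```

Because the slice is always taken from the original string by index, nothing needs to be rebound while the loop runs.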

2012-05-30 02:59:40 lvc

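Thanks, everyone, for the great solutions. I went with this one because it forced me to learn the logic rather than use a prebuilt function. Obviously, if I were writing commercial code I wouldn't reinvent the wheel, but for learning purposes I'll go with this answer. Thanks to everyone for the help. – codingknob

You don't need the inner loop. Just this is enough: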

def split_string(source,splitlist):
    result = []
    for e in source:
        if e in splitlist:
            end = source.find(e)
            result.append(source[0:end])
            source = source[end+1:]
    return result

You can eliminate the "garbage" (that is, the empty strings) by checking whether source[:end] is an empty string before you add it to the list.
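A sketch of that check, written for Python 3. The final step that keeps a trailing word when the string does not end in a separator is an assumption added here, not part of the answer.

```python
def split_string(source, splitlist):
    result = []
    for e in source:                # iterates over the original string
        if e in splitlist:
            end = source.find(e)    # e is the first separator left in source
            piece = source[:end]
            if piece:               # skip the empty strings
                result.append(piece)
            source = source[end + 1:]
    if source:                      # assumption: keep a trailing word as well
        result.append(source)
    return result

print(split_string("After the flood ... all the colors came out.", " ."))
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
```

This works because exactly one separator is consumed per separator visited, so the character the loop is looking at is always the first separator remaining in source.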

2012-05-30 02:49:38

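Why do so much work for something so simple? Try:
str.split(strSplitter, intMaxSplitCount), where intMaxSplitCount is optional.
In your case you also have to do some housekeeping if you want to get rid of the "...". One option is to replace it, for example str.replace(".", "", 3), where the 3 is optional and means that only the first 3 dots are replaced.

So, in short, you have to do the following:
print ((str.replace(".", "",3)).split(" ")) will print roughly what you want (on the example string it still leaves an empty string where the dots were and keeps the final "." on "out.", so a little more cleanup is needed).

I did run it, Just Check Here,...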
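A quick check of that one-liner on the example string, next to a variant that removes every dot and calls split() with no argument (Python 3; the variable names are made up for illustration):

```python
text = "After the flood ... all the colors came out."

# A count of 3 removes only the first three dots, and split(" ") keeps
# the empty piece that sits between two adjacent spaces.
limited = text.replace(".", "", 3).split(" ")
print(limited)
# ['After', 'the', 'flood', '', 'all', 'the', 'colors', 'came', 'out.']

# Removing every dot and splitting on runs of whitespace gives the word list.
words = text.replace(".", "").split()
print(words)
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
```

Note that this only handles "." and whitespace; it does not take an arbitrary list of separators.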

2012-05-30 03:29:10

[x for x in a.replace('.', '').split(' ') if len(x)>0]

Here "a" is your input string.

2012-05-30 03:45:06 thavan

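A simpler way, or at least one that looks simpler:

import string

def split_string(source, splitlist):
    # Python 2 only: map every separator character to a space...
    table = string.maketrans(splitlist, ' ' * len(splitlist))
    # ...then split on runs of whitespace
    return string.translate(source, table).split()

You can look up string.maketrans and string.translate.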
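Those two module-level functions exist only in Python 2. In Python 3 the same idea can be sketched with the str.maketrans and str.translate methods; joining splitlist first, so that a list of single characters is accepted as well as a string, is an assumption added here.

```python
def split_string(source, splitlist):
    separators = "".join(splitlist)  # accept a string or a list of single characters
    # Map every separator character to a space...
    table = str.maketrans(separators, " " * len(separators))
    # ...then split on runs of whitespace
    return source.translate(table).split()

print(split_string("This is a test-of the,string separation-code!", " ,!-"))
# ['This', 'is', 'a', 'test', 'of', 'the', 'string', 'separation', 'code']
```

Because split() with no argument splits on any whitespace, this version also splits on spaces, tabs and newlines even when they are not in splitlist.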

2012-05-30 04:49:39 xvatar

I think you can get it easily with a regular expression, if all you want is the words in the string above.

>>> import re 
>>> string="After the flood ... all the colors came out." 
>>> re.findall(r'\w+',string) 
['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
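The \w+ pattern picks out runs of word characters rather than splitting on a given set of separators. If the separators need to be passed in, as in the question, one possible sketch (Python 3) builds a character class from them with re.escape and uses re.split. It assumes splitlist is a non-empty string or list of single characters.

```python
import re

def split_string(source, splitlist):
    # Build a character class such as [\ \.]+ from the separator characters;
    # re.escape protects characters like '-' or ']' inside the class.
    pattern = "[" + re.escape("".join(splitlist)) + "]+"
    # re.split can yield empty strings at the ends, so filter them out
    return [piece for piece in re.split(pattern, source) if piece]

print(split_string("After the flood ... all the colors came out.", " ."))
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
```

Unlike the \w+ version, this keeps pieces such as 'First Name' together when only "," is given as a separator.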

2012-06-01 11:53:40