Longest common sequence - syntax error

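I am trying to find the LCS of two DNA sequences. I output the result in matrix form, along with a string containing the longest common sequence. However, when I return both the matrix and the list in my code, I get the following error: IndexError: string index out of range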

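If I remove the code involving the variables temp and highestcount, my code outputs the matrix just fine. I am trying to generate my list with code similar to the matrix code. Is there a way to avoid this error? Based on the sequences AGCTGGTCAG and TACGCTGGTGGCAT, the longest common sequence should be GCTGGT.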

def lcs(x,y): 
    c = len(x) 
    d = len(y) 
    plot = [] 
    temp = '' 
    highestcount = '' 

    for i in range(c): 
     plot.append([]) 
     temp.join('') 
     for j in range(d): 
      if x[i] == y[j]: 
       plot[i].append(plot[i-1][j-1] + 1) 
       temp.join(temp[i-1][j-1]) 
      else: 
       plot[i].append(0) 
       temp = '' 
       if temp > highestcount: 
        highestcount = temp 

    return plot, temp 

x = "AGCTGGTCAG" 
y = "TACGCTGGTGGCAT" 
test = lcs(x,y)

print(test)

2014-09-29 Roy

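In temp.join(temp[i-1][j-1]), the variable temp is an empty string, '', on the first iteration.

An empty string has no characters that can be accessed by index, so temp[any_number] raises an "index out of range" exception.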
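
The failure can be reproduced in isolation. The sketch below is only an illustration and is not part of the original code; the helper name safe_char is hypothetical and shows one possible way to guard an index lookup.

```python
temp = ''

# Indexing an empty string raises IndexError, whatever the index is.
try:
    temp[-1]
except IndexError as err:
    message = str(err)

print(message)


def safe_char(s, index):
    """Hypothetical helper: return s[index], or '' when the index is out of range."""
    try:
        return s[index]
    except IndexError:
        return ''


print(repr(safe_char(temp, -1)))
print(repr(safe_char('ACGT', 1)))
```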

2014-09-29 01:19:16

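But I started with nothing in the string because I want the program to fill in the longest sequence as it goes through the for loop (replacing the previously saved sequence whenever a longer one is found, of course) – Roy 2014-09-29 01:25:17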

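Python evaluates the innermost operation first, temp[i-1][j-1]. At that point temp == '', so there is no character to look up by index (calling any_string[any_integer] treats any_string as a sequence of characters, i.e. ['H', 'E', 'L', 'L', 'O']). I don't have a genetics background, so I'm not sure what you want to join onto the existing temp string, but looking up a character by index in an empty string will never work. Can you provide an example of the output you would like to see? – 2014-09-29 01:33:20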

See this page: http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring#Python – han058 2014-09-29 01:35:16

As far as I know, join() concatenates a list of strings, using another string as the separator. For example, "-".join(["a", "b", "c"]) returns a-b-c.

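Also, you first define temp as a string, but later reference it with a double index as if it were a two-dimensional array. As far as I know, a single index is what you use to reference one character in a string. For example, with a = "foobar", a[3] returns b.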
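
Both points can be checked in a few lines. This is a minimal sketch; the names unchanged, joined, grid and cell are made up for the illustration and do not appear in the question.

```python
temp = ''

# str.join returns a new string; calling it without assigning the result
# leaves temp unchanged, because Python strings are immutable.
temp.join('')
unchanged = temp

joined = "-".join(["a", "b", "c"])

# To grow a string, rebind the name to the new value.
temp = temp + 'G'
temp += 'C'

a = "foobar"
fourth = a[3]            # one index on a string gives a one-character string

grid = [['', 'G'], ['', 'GC']]
cell = grid[1][1]        # two indices need a nested list (a list of lists)

print(unchanged == '', joined, temp, fourth, cell)
```

So a bare temp.join(...) statement, as in the question, discards its result even when it does not raise.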

I changed your code to the following. The arrays are initialized up front to avoid indexing problems.

def lcs(x,y):
    c = len(x)
    d = len(y)
    # Pad both tables with an extra row and column so [i][j] is always valid.
    plot = [[0 for j in range(d+1)] for i in range(c+1)]
    temp = [['' for j in range(d+1)] for i in range(c+1)]
    highestcount = 0
    longestWord = ''

    for i in range(c):
        for j in range(d):
            if x[i] == y[j]:
                # Extend the run that ended on the previous diagonal cell.
                plot[i+1][j+1] = plot[i][j] + 1
                temp[i+1][j+1] = ''.join([temp[i][j],x[i]])
                # Compare here so a run ending in the last row or column counts.
                if plot[i+1][j+1] > highestcount:
                    highestcount = plot[i+1][j+1]
                    longestWord = temp[i+1][j+1]
            else:
                plot[i+1][j+1] = 0
                temp[i+1][j+1] = ''

    return plot, temp, highestcount, longestWord

x = "AGCTGGTCAG"
y = "TACGCTGGTGGCAT"
test = lcs(x,y)
print(test)
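
Because the function returns a tuple, printing test directly puts both tables on a single line. One possible way to display the count table row by row is sketched below. format_matrix is a hypothetical helper, the sample table is what the padded approach above would produce for the short inputs "AB" and "AB", and the columns only line up while every count is a single digit.

```python
def format_matrix(x, y, plot):
    """Hypothetical helper: render a (len(x)+1) x (len(y)+1) table with labels."""
    # Header row: blanks over the row labels and the padding column, then y.
    lines = ['    ' + ' '.join(y)]
    # Row labels: a blank for the padding row, then the characters of x.
    for label, row in zip(' ' + x, plot):
        lines.append(label + ' ' + ' '.join(str(value) for value in row))
    return '\n'.join(lines)


sample = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 2]]
print(format_matrix("AB", "AB", sample))
```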

2014-09-29 01:46:04 timctran

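It looks to me like you are going through an unnecessarily complicated procedure, and that is what leads to the confusion, including the empty string that others have mentioned.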

For example, this is still fairly verbose, but I think it is easier to follow (and it returns the expected answer):

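def lcs(seq1, seq2):
    matches = []
    for i in range(len(seq1)):
        # Grow the slice seq1[i:j] while it still occurs somewhere in seq2.
        j = i + 1
        while j <= len(seq1) and seq1[i:j] in seq2:
            j += 1
        # seq1[i:j-1] is the longest slice starting at i that occurs in seq2.
        matches.append((len(seq1[i:j-1]), seq1[i:j-1]))
    return max(matches)

seq1 = 'AGCTGGTCAG'
seq2 = 'TACGCTGGTGGCAT'
lcs(seq1, seq2)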

Returns

(6, 'GCTGGT')

2014-09-29 02:08:12 iayork

This code is nice, but I can't output my matrix with it. Is there another way that uses i only for one sequence and j for the other? – Roy 2014-09-29 02:34:34

Can you give an example of what you want the output to look like? – iayork 2014-09-29 11:08:25
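
For the matrix question raised in the comments, one possible approach keeps i for the first sequence and j for the second, as in the question, and recovers the substring from the position where the longest run ends instead of storing strings in a second table. This is a sketch under that assumption, not the asker's or any answerer's code; the names lcs_with_matrix, best_len and best_end are hypothetical.

```python
def lcs_with_matrix(x, y):
    """Return (matrix, longest common substring) for strings x and y."""
    # matrix[i+1][j+1] holds the length of the common run ending at x[i], y[j].
    matrix = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    best_len = 0
    best_end = 0  # index in x just past the end of the best run

    for i in range(len(x)):
        for j in range(len(y)):
            if x[i] == y[j]:
                matrix[i + 1][j + 1] = matrix[i][j] + 1
                if matrix[i + 1][j + 1] > best_len:
                    best_len = matrix[i + 1][j + 1]
                    best_end = i + 1

    return matrix, x[best_end - best_len:best_end]


matrix, longest = lcs_with_matrix("AGCTGGTCAG", "TACGCTGGTGGCAT")
print(longest)
for row in matrix:
    print(row)
```

With the sequences from the question, longest is GCTGGT and the matrix has one padding row and one padding column of zeros.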
