Parsing lines with a delimiter in Python

I have lines of data that I want to parse. The data looks like this:

a score=216 expect=1.05e-06 
a score=180 expect=0.0394

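What I want is a subroutine that parses them and returns two values (the score and the expect value) for each line.

But this function of mine doesn't seem to work: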

def scoreEvalFromMaf(mafLines):
    for word in mafLines[0]:
        if word.startswith("score="):
            theScore = word.split('=')[1]
            theEval = word.split('=')[2]
            return [theScore, theEval]
    raise Exception("encountered an alignment without a score")

Could you please point out the correct way to do this?

Source

2010-06-02 neversaint

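By the way, never raise 'Exception', since it is impossible to catch it sensibly; always raise something narrower, like 'ValueError' or an exception class you create yourself. – 2010-06-02 04:29:33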
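To illustrate the comment above, here is a minimal sketch of raising a narrower exception. The names 'MafParseError' and 'parse_score' are hypothetical and do not appear in the question; subclassing 'ValueError' is just one reasonable choice.

```python
class MafParseError(ValueError):
    """Hypothetical exception for lines that lack the expected fields."""


def parse_score(line):
    # Return the text after "score=" from the first matching token.
    for word in line.split():
        if word.startswith("score="):
            return word.split("=", 1)[1]
    raise MafParseError("no score in line: %r" % line)


try:
    parse_score("a expect=0.0394")
except MafParseError as err:
    # Only the parsing failure is caught; unrelated errors still propagate.
    message = str(err)
```

A caller can then catch 'MafParseError' (or 'ValueError') without also swallowing unrelated exceptions.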

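It looks like you want to split each line on whitespace and parse each chunk separately. If each item in mafLines is a string (i.e. one line from .readlines()):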

def scoreEvalFromMafLine(mafLine):
    theScore, theEval = None, None
    # Split the line on whitespace and inspect each token.
    for word in mafLine.split():
        if word.startswith("score="):
            theScore = word.split('=')[1]
        if word.startswith("expect="):
            theEval = word.split('=')[1]

    if theScore is None or theEval is None:
        raise Exception("Invalid line: '%s'" % mafLine)

    return (theScore, theEval)

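The way you were doing it iterates over each character of the first line (since mafLines is a list of strings), rather than over each whitespace-separated word.

Source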
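A small sketch of the difference described above, assuming 'mafLines' is a list of strings like the sample data in the question:

```python
# Assumed input: a list of lines, as .readlines() would return.
mafLines = ["a score=216 expect=1.05e-06\n", "a score=180 expect=0.0394\n"]

# Iterating over a string yields single characters, so
# startswith("score=") can never be true for any of them.
chars = list(mafLines[0])[:3]

# Splitting on whitespace yields the tokens the parser needs.
words = mafLines[0].split()
```

Here 'chars' holds the first three characters of the line, while 'words' holds the three tokens.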

2010-06-02 01:39:45

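@AB: Hi Tony, thanks. But I also get the same message '"error: 'list' object has no attribute 'split'"' using the snippet. – neversaint 2010-06-02 01:47:33

Then 'mafLines' is a list of lists, not a list of strings. I assumed 'mafLines' was the output of '.readlines()' or similar, but if it isn't, you need to figure out what exactly it is, or how you generated it. – 2010-06-02 02:07:32

I fixed it: '"for word in mafLine[0]:"' – neversaint 2010-06-02 02:22:32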
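Since the thread never establishes what 'mafLines' actually contains, the following is only a sketch under an assumption: each item is either a raw line (a string) or an already-split list of tokens. The function name 'score_eval_from_tokens' is hypothetical.

```python
def score_eval_from_tokens(line_or_tokens):
    # Assumption: the input is one line (str) or a list of tokens.
    if isinstance(line_or_tokens, str):
        tokens = line_or_tokens.split()
    else:
        tokens = list(line_or_tokens)

    # Collect every key=value token into a dictionary.
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value

    if "score" not in fields or "expect" not in fields:
        raise ValueError("invalid line: %r" % (line_or_tokens,))
    return fields["score"], fields["expect"]
```

Accepting both shapes avoids the "'list' object has no attribute 'split'" error, at the cost of hiding how the data was built; checking how 'mafLines' is generated is still the better fix.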

If mafLines is a list of lines and you only want to look at the first one, you can .split() that line to get the words. For example:

def scoreEvalFromMaf(mafLines):
    theScore = None
    theEval = None
    # split() must be called; partition() returns (head, separator, tail).
    for word in mafLines[0].split():
        if word.startswith('score='):
            _, _, theScore = word.partition('=')
        elif word.startswith('expect='):
            _, _, theEval = word.partition('=')
    if theScore is None:
        raise Exception("encountered an alignment without a score")
    if theEval is None:
        raise Exception("encountered an alignment without an eval")
    return theScore, theEval

Note that this returns a tuple with two string items; if you want an integer and a float, for example, you need to change the last line to

return int(theScore), float(theEval)

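and then you'll get a ValueError exception if either string is invalid for the type it is supposed to represent, or a tuple with two numbers if both strings are valid.

Source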
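A minimal sketch of that conversion behavior; the helper name 'to_numbers' is hypothetical and only wraps the return line shown above:

```python
def to_numbers(theScore, theEval):
    # int() and float() raise ValueError on strings they cannot parse.
    return int(theScore), float(theEval)


numbers = to_numbers("216", "1.05e-06")

try:
    to_numbers("n/a", "0.0394")
    failed = False
except ValueError:
    # "n/a" is not a valid integer literal.
    failed = True
```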

2010-06-02 01:31:59

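@AM: Hi Alex, thanks, but I get the message '"error: 'list' object has no attribute 'split'"'. By the way, is this the correct way to store the function's output: '[score, exp] = scoreEvalFromMaf(maf)' – neversaint 2010-06-02 01:41:08

Sounds like mafLines is a list of lists rather than a list of strings. How did you generate it? You need to use '.split()' (i.e. it is a function call), and you could also use 'word.split('=')' instead of 'word.partition('=')' – 2010-06-02 01:43:42

@neversaint, you definitely need to clarify what that mysterious 'mafLines' **is**, possibly a list of lists, as Anthony says (given the error message), but without knowing how you build it, it is simply impossible to "read your mind" and divine how those pieces are put together. And yes, once you've clarified that, you can (if you like) drop those useless brackets around 'score, exp' on the left-hand side of the assignment. – 2010-06-02 01:59:44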
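For reference, a short sketch of how '.split('=')' and '.partition('=')', both mentioned in the comments above, differ in what they return:

```python
word = "score=216"

# partition always returns a 3-tuple: (before, separator, after).
before, sep, after = word.partition("=")

# split returns a list whose length depends on how many separators occur.
parts = word.split("=")

# Without a separator, partition still returns three items, while split
# returns a one-element list, so indexing [1] would raise IndexError.
no_sep_partition = "score".partition("=")
no_sep_split = "score".split("=")
```

This is why unpacking the result of 'partition' into two names fails, while three names work.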

The obligatory, and possibly inappropriate, regex solution:

import re 
def scoreEvalFromMaf(mafLines): 
    return [re.search(r'score=(.+) expect=(.+)', line).groups() 
      for line in mafLines]

Source

2010-06-02 01:51:28 harto

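This will blow up on invalid input (although that may be the behavior you want). Changing your '(.+)' to '(.*)' helps capture empty values, but it will still die on truly unfriendly input. – 2010-06-02 02:14:30

Fair enough. This was just a quick and dirty demonstration of an alternative strategy. – harto 2010-06-02 03:58:06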
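One possible way to make the regex approach tolerate lines that do not match, sketched under assumptions: the pattern below (non-whitespace values, any whitespace between the fields) and the name 'score_eval_pairs' are illustrative choices, not part of the answer above.

```python
import re

# Assumed pattern: non-whitespace values after "score=" and "expect=".
PATTERN = re.compile(r"score=(\S+)\s+expect=(\S+)")


def score_eval_pairs(mafLines):
    pairs = []
    for line in mafLines:
        match = PATTERN.search(line)
        if match is None:
            # Skip non-matching lines; raising would be equally reasonable.
            continue
        pairs.append(match.groups())
    return pairs
```

Using '\S+' instead of '.+' also keeps trailing whitespace out of the captured expect value.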
