Retrieving the top value from a dictionary with multiple values under a single key

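I'm fairly new to Python and I have a question. I have a file with 5 results for each unique identifier. Each result has a percent match along with various other data. My goal is to find the result with the highest percent match and then retrieve more information from the original line. For example: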

Name Organism Percent Match  Misc info 
1  Human  100    xxx  
1  Goat   95    yyy 
1  Pig   90    zzz

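I tried to solve this by putting each percent match into a dictionary keyed by the unique name, so that each key holds multiple values. The only way forward I could think of was to turn the values for a key into a list and sort it. I then want to take the largest value in the list (list[0] or list[-1]) and retrieve more information from the original line. Here is my code so far: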

list = [] 
if "1" in line: 
    id = line 
    bsp = id.split("\t") 
    uid = bsp[0] 
    per = bsp[2] 

    if not dict.has_key(uid): 
     dict[uid] = [] 
    dict[uid].append(per) 
    list = dict[uid] 
    list.sort() 
if list[0] in dict: 
    print key

This ends up printing every key instead of only the one with the highest percentage. Any ideas? Thanks!

Source

2012-02-15 Vince

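Note: in the example file above, each "1" is the name of the item, and 100, 95 and 90 are the percent matches. – Vince 2012-02-15 23:01:03

>.< Don't use list or dict as variable names. – 2012-02-15 23:04:38

Are you running the given code inside a loop? If so, it will print the key before the whole file has been read. Also, after sorting the list, 'my_list[0]' will be the smallest item, not the largest. – 2012-02-15 23:07:38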
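A related pitfall, sketched here with the sample percentages hard-coded as an assumption: the values split out of a line are strings, so sorting compares them character by character rather than numerically.

```python
# Percent values as they come out of line.split(): strings, not numbers.
percents = ["100", "95", "90"]

# String sorting is lexicographic, so "100" sorts before "90" and "95".
as_strings = sorted(percents)

# Converting to int gives the numeric order; the largest value is last.
as_numbers = sorted(int(p) for p in percents)

# max() with key=int finds the largest entry without sorting at all.
largest = max(percents, key=int)

print(as_strings)   # ['100', '90', '95']
print(as_numbers)   # [90, 95, 100]
print(largest)      # 100
```

So after a numeric sort the largest value is at index -1, and max() with key=int avoids the sort entirely.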

You should be able to do something like this:

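lines = []
with open('data.txt') as file:
    for line in file:
        # keep only the rows whose line starts with the ID '1'
        if line.startswith('1'):
            lines.append(line.split())

# row with the highest percent match (third column, compared as an int)
best_match = max(lines, key=lambda k: int(k[2]))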

After reading the file, lines will look like this:

>>> pprint.pprint(lines) 
[['1', 'Human', '100', 'xxx'], 
['1', 'Goat', '95', 'yyy'], 
['1', 'Pig', '90', 'zzz']]

Then you want the entry from lines whose third item, converted to an int, is the highest, which can be expressed like this:

>>> max(lines, key=lambda k: int(k[2])) 
['1', 'Human', '100', 'xxx']

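So at the end of this, best_match will be a list with the data from the line you are interested in.

Or, if you want to get really tricky, you can get the line in one (complicated) step: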

with open('data.txt') as file: 
    best_match = max((s.split() for s in file if s.startswith('1')), 
        key=lambda k: int(k[2]))
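One caveat with startswith('1') is that it would also match IDs such as 10 or 12. A minimal sketch that compares the first field exactly instead; the sample lines and the helper name best_match_for are made up for illustration:

```python
# Sample lines standing in for the file contents (hypothetical data).
sample_lines = [
    "1 Human 100 xxx",
    "1 Goat 95 yyy",
    "12 Pig 99 zzz",   # starts with '1' but belongs to a different ID
]

def best_match_for(lines, wanted_id):
    """Return the split row with the highest percent for exactly wanted_id."""
    rows = (line.split() for line in lines)
    # 'row and ...' skips blank lines, which split into an empty list.
    matching = [row for row in rows if row and row[0] == wanted_id]
    return max(matching, key=lambda row: int(row[2]))

print(best_match_for(sample_lines, "1"))   # ['1', 'Human', '100', 'xxx']
print(best_match_for(sample_lines, "12"))  # ['12', 'Pig', '99', 'zzz']
```

Note that max() raises ValueError when no row matches the requested ID, so a real script would probably want to handle that case.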

Source

2012-02-15 23:06:49

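You could use csv to parse a tab-delimited data file (although the data you posted looks like it is column-aligned!?).

Since the first line of the data file gives the field names, a DictReader is convenient, because you can then refer to the columns by human-readable names.

csv.DictReader returns an iterable of rows (dicts). If you take the max of that iterable using the Percent Match column as the key, you can find the row with the highest percent match.

Using this (tab-delimited) data as test.dat: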

Name Organism Percent Match Misc info 
1 Human 100 xxx 
1 Goat 95 yyy 
1 Pig 90 zzz 
2 Mouse 95 yyy 
2 Moose 90 zzz 
2 Manatee 100 xxx

the program

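import csv

maxrows = {}
# Python 3: open in text mode with newline='' (the original 'rb' mode is Python 2 only)
with open('test.dat', newline='') as f:
    for row in csv.DictReader(f, delimiter='\t'):
        name = row['Name']
        percent = int(row['Percent Match'])
        # keep this row if it ties or beats the best one seen so far for this name
        if int(maxrows.get(name, row)['Percent Match']) <= percent:
            maxrows[name] = row

print(maxrows)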

yields (the key order within each row may vary with the Python version):

{'1': {'info': None, 'Percent Match': '100', 'Misc': 'xxx', 'Organism': 'Human', 'Name': '1'}, '2': {'info': None, 'Percent Match': '100', 'Misc': 'xxx', 'Organism': 'Manatee', 'Name': '2'}}
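The 'info': None entries in that output suggest the header has one more tab-separated field than the data rows; DictReader fills missing fields with its restval, which defaults to None. Here is a self-contained sketch of the same per-name maximum that reads from an in-memory string instead of test.dat. The exact tab layout of the data is an assumption.

```python
import csv
import io

# In-memory stand-in for test.dat. Assumed layout: the header has five
# tab-separated fields, while each data row has only four.
data = (
    "Name\tOrganism\tPercent Match\tMisc\tinfo\n"
    "1\tHuman\t100\txxx\n"
    "1\tGoat\t95\tyyy\n"
    "2\tMoose\t90\tzzz\n"
    "2\tManatee\t100\txxx\n"
)

maxrows = {}
for row in csv.DictReader(io.StringIO(data), delimiter="\t"):
    name = row["Name"]
    best = maxrows.get(name)
    # Replace the stored row only when the new percentage is strictly higher.
    if best is None or int(row["Percent Match"]) > int(best["Percent Match"]):
        maxrows[name] = row

for name, row in maxrows.items():
    print(name, row["Organism"], row["Percent Match"], row["info"])
```

Because this version uses a strict > comparison, the first row wins a tie; the <= comparison in the answer keeps the last one.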

Source

2012-02-15 23:12:57 unutbu

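Very interesting, thanks! However, I think, mainly because of a mistake in my example input file, this script prints only a single max row rather than the max row for each name when there are multiple entries. That was my error for not specifying it. Thanks for showing me import csv! – Vince 2012-02-17 01:36:32

OK, I've changed it a bit to collect one max row for each name. – unutbu 2012-02-17 02:15:00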

with open('datafile.txt', 'r') as f:
    lines = f.read().split('\n')

matchDict = {}

for line in lines:
    # startswith() is safe on empty lines, unlike line[0]
    if line.startswith('1'):
        uid, organism, percent, misc = line.split('\t')
        matchDict[int(percent)] = (organism, uid, misc)

highestMatch = max(matchDict.keys())

print('{0} is the highest match at {1} percent'.format(matchDict[highestMatch][0], highestMatch))
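One thing to keep in mind with a dictionary keyed by percentage is that two rows with the same percentage overwrite each other. A small sketch with made-up rows showing the effect, plus one possible way to keep every row that shares the top percentage:

```python
# Hypothetical rows: (uid, organism, percent, misc).
rows = [
    ("1", "Human", 100, "xxx"),
    ("1", "Goat", 100, "yyy"),   # same percentage as the row above
    ("1", "Pig", 90, "zzz"),
]

# Keyed by percent: the second 100 silently replaces the first.
by_percent = {percent: (organism, uid, misc) for uid, organism, percent, misc in rows}

# Collect every row that shares the top percentage instead.
top = max(percent for _, _, percent, _ in rows)
top_rows = [row for row in rows if row[2] == top]

print(by_percent[100])  # ('Goat', '1', 'yyy')
print(top_rows)         # [('1', 'Human', 100, 'xxx'), ('1', 'Goat', 100, 'yyy')]
```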

Source

2012-02-15 23:14:35

I think you may be looking for something like:

from collections import defaultdict

results = defaultdict(list)
with open('data.txt') as f:
    next(f)  # skip the header line; remove this if the file has no header
    for line in f:
        splitted = line.split()
        # group the remaining columns under the ID in the first column
        results[splitted[0]].append(splitted[1:])

maxs = {}
for uid, data in results.items():
    # after dropping the ID, the percent is at index 1
    maxs[uid] = max(data, key=lambda k: int(k[1]))

I've tested it with a file like:

Name Organism Percent Match  Misc info 
1  Human  100    xxx  
1  Goat   95    yyy 
1  Pig   90    zzz 
2  Pig   85    zzz 
2  Goat   70    yyy

and the result is:

{'1': ['Human', '100', 'xxx'], '2': ['Pig', '85', 'zzz']}
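If the rows for each ID are always contiguous in the file, as they are in the sample (that is an assumption, not something the question guarantees), itertools.groupby can produce the same result without building the intermediate lists. A sketch with the data lines inlined:

```python
from itertools import groupby

# Data lines from the sample file, header already removed.
lines = [
    "1 Human 100 xxx",
    "1 Goat 95 yyy",
    "1 Pig 90 zzz",
    "2 Pig 85 zzz",
    "2 Goat 70 yyy",
]

rows = (line.split() for line in lines)

# groupby yields one group per run of equal IDs; max() consumes each
# group before the next one starts. [1:] drops the ID from the winner.
maxs = {
    uid: max(group, key=lambda row: int(row[2]))[1:]
    for uid, group in groupby(rows, key=lambda row: row[0])
}

print(maxs)  # {'1': ['Human', '100', 'xxx'], '2': ['Pig', '85', 'zzz']}
```

If the same ID can reappear later in the file, a later run would replace the earlier entry, so the defaultdict approach is the safer choice for unsorted input.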

Source

2012-02-15 23:16:44

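+1, you beat me to it. – retracile 2012-02-15 23:17:49

Thanks! This site rocks. – Vince 2012-02-17 01:28:13

@Vince: You're welcome! Don't forget to give back to the community by upvoting the best answers and [accepting](http://meta.stackexchange.com/q/5234/177799) the most useful one :) – 2012-02-17 09:56:42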
