2017-01-25
0

My input file, in which I want to find bidirectionally matching IDs:

ID1 ID2 value 
ID3 ID6 value 
ID2 ID1 value 
ID4 ID5 value 
ID6 ID5 value 
ID5 ID4 value 
ID7 ID2 value 

Desired output, FILE1.TXT

ID1 ID2 value ID2 ID1 value 
ID4 ID5 value ID5 ID4 value 

FILE2.TXT

ID3 ID6 value 
ID6 ID5 value 
ID7 ID2 value 

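I am trying to get bidirectional best matches. If ID1 has a hit to ID2 and ID2 also has a hit to ID1, the pair should be printed to file1; otherwise it should be printed to file2. What I tried was to make a copy of the input file and build a dictionary from each. But this gives output without the values (10 columns). How can I modify it?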

fileA = open("input.txt",'r') 
fileB = open("input_copy.txt",'r') 
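output = open("out.txt",'w') 
# second output file, used below but never opened in the original; the name is assumed
output2 = open("out2.txt",'w') 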

dictA = dict() 
for line1 in fileA: 
    new_list=line1.rstrip('\n').split('\t') 
    query=new_list[0] 
    subject=new_list[1] 
    dictA[query] = subject 
dictB = dict() 
for line1 in fileB: 
    new_list=line1.rstrip('\n').split('\t') 
    query=new_list[0] 
    subject=new_list[1] 
    dictB[query] = subject 
SharedPairs ={} 
NotSharedPairs ={} 
for id1 in dictA.keys(): 
    value1=dictA[id1] 
    if value1 in dictB.keys(): 
        if id1 == dictB[value1]: 
            SharedPairs[value1] = id1 
        else: 
            NotSharedPairs[value1] = id1 
for key in SharedPairs.keys(): 
    line = key +'\t' + SharedPairs[key]+'\n' 
    output.write(line) 
for key in NotSharedPairs.keys(): 
    line = key +'\t' + NotSharedPairs[key]+'\n' 
    output2.write(line) 

Answers

1

You can solve this easily using sets:

#!/usr/bin/env python 

# ordered pairs (ID1, ID2) 
oset = set() 
# reversed pairs (ID2, ID1) 
rset = set() 

with open('input.txt') as f: 
    for line in f: 
        first, second, val = line.strip().split() 
        if first < second: 
            oset.add((first, second, val,)) 
        else: 
            # note that this reverses second and first for matching purposes 
            rset.add((second, first, val,)) 

print "common: %s" % str(oset & rset) 
print "diff: %s" % str(oset^rset) 

Output:

common: set([('ID4', 'ID5', 'value'), ('ID1', 'ID2', 'value')]) 
diff: set([('ID3', 'ID6', 'value'), ('ID5', 'ID6', 'value'), ('ID2', 'ID7', 'value')]) 

It does not handle pairs of the form (ID1, ID1), but you can add those to a third set and deal with them however you decide.
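Note that the set intersection above compares the value as well, so a reciprocal pair is only reported when both directions carry the same value. If the values can differ and the six-column FILE1.TXT layout from the question is wanted, a variant keyed on the ID pair alone might look like the sketch below. It is written for Python 3; the function name `split_reciprocal` and the choice to treat self-pairs as one-way hits are assumptions, not part of the original answer.

```python
def split_reciprocal(rows):
    """Split (query, subject, value) rows into reciprocal and one-way hits.

    Hypothetical helper: returns (paired, unpaired), where each paired entry
    is a 6-tuple (q, s, value, s, q, reverse_value).
    """
    # Map each directed pair to its value; a repeated pair keeps the last value.
    hits = {(q, s): v for q, s, v in rows}
    paired, unpaired = [], []
    seen = set()
    for q, s, v in rows:
        if (q, s) in seen:
            # Already emitted as the reverse half of a pair, or a duplicate row.
            continue
        if q != s and (s, q) in hits:
            paired.append((q, s, v, s, q, hits[(s, q)]))
            seen.add((s, q))
        else:
            # One-way hits and self-pairs (assumption) end up here.
            unpaired.append((q, s, v))
        seen.add((q, s))
    return paired, unpaired
```

Each reciprocal pair is emitted once, in the order its first direction appears in the input, so the sample input from the question yields the two FILE1.TXT rows and the three FILE2.TXT rows.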

+0

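I have already filtered out those IDs. Could you modify the script to save the results to a txt file in tab-delimited format, because I have thousands of IDs? Thank you very much – user3224522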

+0

Use the 'csv' module. –
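As a rough illustration of that suggestion, tab-delimited reading and writing with the standard `csv` module could be sketched as follows (Python 3). The helper names `read_rows` and `write_rows` are hypothetical and only stand in for whatever the script ends up using.

```python
import csv


def read_rows(path):
    """Read a tab-delimited file into a list of tuples, skipping blank lines."""
    with open(path, newline="") as handle:
        return [tuple(row) for row in csv.reader(handle, delimiter="\t") if row]


def write_rows(path, rows):
    """Write an iterable of tuples as tab-delimited lines."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
```

With the sets from the answer above, something like `write_rows("FILE1.TXT", sorted(oset & rset))` and `write_rows("FILE2.TXT", sorted(oset ^ rset))` would then save both results; sorting is optional and only makes the output order stable.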

1

import csv 
data = csv.reader(open('data.tsv'), delimiter='\t') 
id_list = [] 
for item in data: 
    (x, y, val) = item 
    id_list.append((x, y, val)) 

file1 = [item for item in id_list if (item[1], item[0], item[2]) in id_list] 
file2 = [item for item in id_list if (item[1], item[0], item[2]) not in id_list] 
print file1 
print file2 

+0

If there are many items, a set would do better than a list... –
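To make that concrete: every `in id_list` test in the answer above scans the whole list, so the two comprehensions take time roughly quadratic in the number of rows. A minimal sketch of the same logic with a set for the membership test is shown below (Python 3). The function name `split_by_reverse` is an assumption, and like the original it only matches a reversed row when the value is identical.

```python
def split_by_reverse(id_list):
    """Partition (x, y, val) tuples by whether the reversed row is also present."""
    # Build the set once; each membership test is then constant time on average.
    lookup = set(id_list)
    file1 = [item for item in id_list if (item[1], item[0], item[2]) in lookup]
    file2 = [item for item in id_list if (item[1], item[0], item[2]) not in lookup]
    return file1, file2
```

The rows must be tuples, as they are in the original `id_list`, because lists are not hashable and cannot be stored in a set.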