2016-05-10 67 views
1

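Output matching lines, and the line below each match, from 3 text files

I have a question about common data. I have three text files containing data in the following format: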

cli= 111
mon= 45

cli= 584
mon= 21

cli= 23
mon= 417

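I have the following program which, when I run it, gives me all the matching cli lines. In other words, it gives me the cli lines that appear in all 3 text files.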

with open('/home/user/Desktop/text1.txt', 'r') as file1:
    with open('/home/user/Desktop/text2.txt', 'r') as file2:
        with open('/home/user/Desktop/text3.txt', 'r') as file3:
            same = set(file1).intersection(file2).intersection(file3)
same.discard('\n')

with open('/home/user/Desktop/common.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)

My question is: can I also output the value (mon= 45) along with cli= 111? Assume cli= 111 is present in all 3 text files. I would like a result like this:

cli= 111
mon= 45
mon= 98
mon= 32

Thanks in advance. PS: the sample data above comes from just 1 text file. Assume there are 3 text files. Thanks!

+0

So you want the corresponding mon for each cli that appears in all the files? –

+0

@Padraic Cunningham Exactly! – starshine

+0

OK, well, that's easy with a dict, I'll throw something together –

Answers

0

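It looks like you are throwing away the data that you want to access later. Rather than parsing the files a second time, you need to capture that data somehow so you don't have to look at the files again. One way to do this (assuming each 'cli' has only one corresponding 'mon' per file) is with a dictionary.

I've provided a function that builds a dictionary whose keys are the 'cli' lines and whose values are the 'mon' lines. From there, you can create a set() from the dictionary keys and find the intersection that way. Everything in the intersection must be a key in each dictionary, so you can simply concatenate the values into an 'out' string and write that to your output file: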

def buildDict(lines):
    # Map each "cli" line to the line that follows it (its "mon" line).
    dic = {}
    # Stop one short of the end so lines[i + 1] always exists.
    for i in range(len(lines) - 1):
        if "cli" in lines[i]:
            dic[lines[i]] = lines[i + 1]
    return dic

with open('1.txt', 'r') as file1:
    f1_dic = buildDict(file1.readlines())
with open('2.txt', 'r') as file2:
    f2_dic = buildDict(file2.readlines())
with open('3.txt', 'r') as file3:
    f3_dic = buildDict(file3.readlines())

# cli lines that are present in all three files
same = set(f1_dic).intersection(f2_dic, f3_dic)

out = ''
for i in same:
    out += i
    out += f1_dic[i]
    out += f2_dic[i]
    out += f3_dic[i]

with open('common.txt', 'w') as file_out:
    file_out.write(out)
0

After pulling out the lines that appear in all the files, you can use a dict to group the line that follows each cli:

with open('text1.txt', 'r') as file1, open('text2.txt', 'r') as file2, open('text3.txt', 'r') as file3:
    inter = set(file1).intersection(file2).intersection(file3)

    # create a dict using lists as values to group the mons and remove empty lines
    d = {k: [] for k in inter if k.strip()}
    # don't need the set anymore, dict lookups are also O(1)
    del inter
    # reset the file pointers
    for f in [file1, file2, file3]:
        f.seek(0)

    # iterate over the files again
    for f in [file1, file2, file3]:
        for line in f:
            if line in d:
                # pull the next line if we get a match
                d[line].append(next(f))

Then just write the dict contents:

with open('/home/user/Desktop/common.txt', 'w') as file_out:
    for k, v in d.items():
        file_out.write(k)
        for line in v:
            file_out.write(line)

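If you are looking for specific lines, i.e. those starting with cli=, another approach is to first build the dict from the file1 data, then iterate over the remaining files, and when you write the output, only write the entries whose value list has a length == 3: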

with open('text1.txt', 'r') as file1, open('text2.txt', 'r') as file2, open(
        'text3.txt', 'r') as file3:
    # create dict from the initial file, storing the line that follows each cli=.. line inside a list as the value
    d = {k: [next(file1)] for k in file1 if k.startswith("cli=")}

    for f in [file2, file3]:
        for line in f:
            if line in d:
                d[line].append(next(f))

with open('/home/user/Desktop/common.txt', 'w') as file_out:
    for k, v in d.items():
        # if len is three we have one from each file
        if len(v) == 3:
            file_out.write(k)
            for line in v:
                file_out.write(line)

The only way this would fail is if one or more of the files contains a repeated cli=... line.
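
If repeated cli lines are a real possibility, one option is to collect the mon lines per file first and only then intersect the keys. The following is a minimal sketch of that idea, not a drop-in replacement: the helper names `collect_mons` and `common_records` are made up for illustration, and it assumes every mon= line directly follows its cli= line once blank lines are ignored.

```python
from collections import defaultdict


def collect_mons(lines):
    """Map each stripped 'cli=' line to a list of the 'mon=' lines that follow it."""
    mons = defaultdict(list)
    # Drop blank lines and surrounding whitespace so records sit back to back.
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    for cur, nxt in zip(cleaned, cleaned[1:]):
        if cur.startswith("cli=") and nxt.startswith("mon="):
            mons[cur].append(nxt)
    return mons


def common_records(*files):
    """Return {cli line: all mon lines} for cli lines found in every input."""
    per_file = [collect_mons(f) for f in files]
    common = set(per_file[0]).intersection(*per_file[1:])
    # Mon lines are gathered in file order; duplicates within a file are kept.
    return {cli: [m for d in per_file for m in d[cli]] for cli in sorted(common)}
```

Each argument can be an open file object or any iterable of lines. The length check from the answer above would then become "present in every per-file dict" rather than "exactly three values".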

0

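Interesting hack, building the sets of lines on the fly; but as you have seen, it's a bit too clever, because the mon lines get separated from the cli lines. So let's read the files more carefully so that this doesn't happen: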

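import re

def getfile(fname):
    with open(fname) as file1:
        text = file1.read()
    # records are separated by blank lines
    records = text.split("\n\n")
    # \s* tolerates indentation and trailing spaces around each line
    matches = (re.search(r"cli=\s*(\d+)\s*\n\s*mon=\s*(\d+)", rec) for rec in records)
    # skip records that don't hold a cli/mon pair (e.g. a trailing empty one)
    return dict(m.groups() for m in matches if m)

d1 = getfile('/home/user/Desktop/text1.txt')
d2 = getfile('/home/user/Desktop/text2.txt')
d3 = getfile('/home/user/Desktop/text3.txt')
same = set(d1).intersection(d2).intersection(d3)

# same is a set of cli values, so print one block per value
for cli in same:
    print("cli= " + cli)
    print("mon= " + d1[cli])
    print("mon= " + d2[cli])
    print("mon= " + d3[cli])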

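I turn each file into a dict that maps cli values to mon values, since they come in pairs. Then we intersect the cli values and use them to look up the mon values.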
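
To get from those dicts to text that can be written to common.txt, something along these lines should work. It is only a sketch under a few assumptions: `parse_records` and `format_common` are hypothetical names, cli and mon values are plain integers, and each mon line directly follows its cli line.

```python
import re

# One record: a cli line immediately followed by a mon line.
RECORD = re.compile(r"cli=\s*(\d+)\s*\n\s*mon=\s*(\d+)")


def parse_records(text):
    """Return a dict mapping each cli value to its mon value (both as strings)."""
    return dict(RECORD.findall(text))


def format_common(*mappings):
    """Build one block per cli shared by all mappings: the cli line, then one mon line per mapping."""
    common = set(mappings[0]).intersection(*mappings[1:])
    blocks = []
    for cli in sorted(common, key=int):
        lines = ["cli= " + cli] + ["mon= " + m[cli] for m in mappings]
        blocks.append("\n".join(lines))
    # Blocks are separated by a blank line; no output at all if nothing is shared.
    return "\n\n".join(blocks) + ("\n" if blocks else "")
```

The string returned by `format_common(parse_records(text1), parse_records(text2), parse_records(text3))` can be passed straight to `file_out.write(...)`. If a cli value repeats inside one file, `dict()` keeps only the last mon for it.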