2017-02-03

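Replace part of file names in a directory using Python

I want to use Python to rename a set of files in a directory. The files are currently labelled with a pool number, an AR number and an S number (e.g. Pool1_AR001_S13__fw_paired.fastq.gz). Each file corresponds to a specific plant sequence name. I want to rename these files by removing the "Pool_AR_S" part and replacing it with the sequence name, e.g. 'Lbienne_dor5_GS1', while keeping the suffix (e.g. fw_paired.fastq.gz, rv_unpaired.fastq.gz). I have tried reading these into a dictionary, but I am stuck on what to do next. I have a .txt file that contains the necessary information in the following format: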

Pool1_AR010_S17 - Lbienne_lla10_GS2 
Pool1_AR011_S18 - Lbienne_lla10_GS3 
Pool1_AR020_S19 - Lcampanulatum_borau4_T_GS1 

The code I have so far is:

from optparse import OptionParser 
import csv 
import os 

parser = OptionParser() 
parser.add_option("-w", "--wanted", dest="w") 
parser.add_option("-t","--trimmed", dest="t") 
parser.add_option("-d", "--directory", dest="working_dir", default="./") 
(options, args) = parser.parse_args() 

wanted_file = options.w 
trimmomatic_output = options.t 

#Read the wanted file and create a dictionary of index vs species identity 

with open(wanted_file, 'rb') as species_sequence: 
    species_list = list(csv.DictReader(species_sequence, delimiter='-')) 
    print species_list 


#Rename the Trimmomatic Output files according to the dictionary 


for trimmed_sequence in os.listdir(trimmomatic_output): 
    os.rename(os.path.join(trimmomatic_output, trimmed_sequence), 
              os.path.join(trimmomatic_output, trimmed_sequence.replace(species_list[0], species_list[1]))) 

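Could you please help me with the replacing part? I am quite new to Python and to Stack Overflow, so I apologise if this has been asked before or if I am asking in the wrong place.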

Answers


The first job is to get rid of all those modules. They may well be fine, but for a job like yours they are unlikely to make things any easier.

Create a .py file in the directory where the .gz files are located.

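import os 

files = os.listdir('.')  # names of the files in the current directory (a list) 
# 'txt_file' is the path of your .txt file containing those conversions 
dic = parse_txt(txt_file)  # body of parse_txt() omitted; it should return a dictionary built by parsing that .txt file 
for f in files: 
    if '__' not in f: 
        continue  # skip files such as this .py script and the .txt file 
    # "Pool1_AR001_S13__fw_paired.fastq.gz" -> pre="Pool1_AR001_S13", suf="fw_paired.fastq.gz" 
    # (assuming prefix and suffix are separated by a double underscore) 
    pre, suf = f.split('__', 1) 
    if pre in dic:  # leave files with no entry in the .txt file untouched 
        os.rename(f, dic[pre] + '__' + suf) 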

If you need help with the parse_txt() function, let me know.
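A minimal sketch of such a parse_txt(), assuming every non-blank line has the form `prefix - name` as in the question. Only the first `-` is treated as the separator, so a hyphen inside a sequence name would survive. This body is an illustration, not the answerer's own code, and the demo file name `wanted_file.txt` is hypothetical:

```python
import os
import tempfile


def parse_txt(txt_file):
    """Return {pool_prefix: sequence_name} from a file of lines such as
    'Pool1_AR010_S17 - Lbienne_lla10_GS2'."""
    mapping = {}
    with open(txt_file) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # split on the first '-' only, then trim the spaces around both parts
            prefix, name = line.split('-', 1)
            mapping[prefix.strip()] = name.strip()
    return mapping


# Demo: write two lines in the question's format to a temporary file and parse it
demo_path = os.path.join(tempfile.mkdtemp(), 'wanted_file.txt')
with open(demo_path, 'w') as demo:
    demo.write('Pool1_AR010_S17 - Lbienne_lla10_GS2 \n'
               'Pool1_AR011_S18 - Lbienne_lla10_GS3 \n')
print(parse_txt(demo_path))
```

With this in place, `dic = parse_txt(txt_file)` yields the dictionary the loop above expects. A line without any `-` would raise a ValueError, which is arguably preferable to silently skipping a malformed entry.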


Here is a solution I tested with Python 2. Feel free to use your own logic in place of the get_mappings function. See the comments in the code for an explanation.



from __future__ import print_function  # lets the print() calls below also run under Python 2
import os

def get_mappings():
    mappings_dict = {}
    with open('wanted_file.txt', 'r') as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            # if you have Pool1_AR010_S17 - Lbienne_lla10_GS2
            # it becomes a list, i.e. ['Pool1_AR010_S17 ', ' Lbienne_lla10_GS2']
            # note that there may be spaces before/after the names, as shown above
            text = line.split('-', 1)
            # strip() is used to remove the spaces around the names
            mappings_dict[text[0].strip()] = text[1].strip()
    return mappings_dict

# PROGRAM EXECUTION STARTS HERE
# assuming all files are in the current directory;
# if not, replace the dot (.) with the path of the directory containing the files
directory = '.'
wanted_names_dict = get_mappings()
for filename in os.listdir(directory):
    try:
        # prefix='Pool1_AR010_S17', suffix='fw_paired.fastq.gz'
        prefix, suffix = filename.split('__', 1)
        new_filename = wanted_names_dict[prefix] + '__' + suffix
    except (ValueError, KeyError):
        # no '__' in the name, or the prefix is not listed in the wanted file
        print('No new name defined for file: ' + filename)
        continue
    # join with the directory so the rename also works when it is not '.'
    os.rename(os.path.join(directory, filename), os.path.join(directory, new_filename))
    print('renamed', filename, 'to', new_filename)
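Both answers assume the files sit in the current directory, whereas the question's script takes the Trimmomatic output directory as its `-t` option. A sketch of a directory-aware variant, assuming the mapping dict has already been built (for example by a parser like get_mappings() above). The function name `rename_in_dir` and its `dry_run` flag are hypothetical, introduced here for illustration:

```python
import os
import tempfile


def rename_in_dir(directory, mapping, dry_run=True):
    """Rename '<prefix>__<suffix>' files in `directory` using `mapping`.

    With dry_run=True nothing is renamed; the planned (old, new) pairs
    are only returned, so they can be checked first.
    """
    planned = []
    for filename in sorted(os.listdir(directory)):
        if '__' not in filename:
            continue  # not a Trimmomatic output file
        prefix, suffix = filename.split('__', 1)
        if prefix not in mapping:
            print('No new name defined for file: ' + filename)
            continue
        new_filename = mapping[prefix] + '__' + suffix
        target = os.path.join(directory, new_filename)
        if os.path.exists(target):
            print('Skipping, target already exists: ' + new_filename)
            continue
        planned.append((filename, new_filename))
        if not dry_run:
            os.rename(os.path.join(directory, filename), target)
    return planned


# Demo on a scratch directory containing empty placeholder files
demo_dir = tempfile.mkdtemp()
for name in ('Pool1_AR010_S17__fw_paired.fastq.gz',
             'Pool1_AR999_S99__fw_paired.fastq.gz'):
    open(os.path.join(demo_dir, name), 'w').close()
print(rename_in_dir(demo_dir, {'Pool1_AR010_S17': 'Lbienne_lla10_GS2'}))
```

Running it first with the default `dry_run=True` and inspecting the returned pairs is a cheap safeguard: on POSIX systems `os.rename` silently replaces an existing destination file, which is also why the sketch checks `os.path.exists` before renaming.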
