批量重命名文件和文件夹是一个经常被问到的问题,但经过一番搜索之后,我认为没有一个类似于我的。使用“索引”重命名批量(基本名称)文件/文件夹
背景:我们派一些生物样品返回具有独特名称的文件和文本格式包含表服务供应商,其中包括信息,文件名和源自它的样本:
head samples.txt
fq_file Sample_ID Sample_name Library_ID FC_Number Track_Lanes_Pos
L2369_Track-3885_R1.fastq.gz S1746_B_7_t B 7 t L2369_B_7_t 163 6
L2349_Track-3865_R1.fastq.gz S1726_A_3_t A 3 t L2349_A_3_t 163 5
L2354_Track-3870_R1.fastq.gz S1731_A_GFP_c A GFP c L2354_A_GFP_c 163 5
L2377_Track-3893_R1.fastq.gz S1754_B_7_c B 7 c L2377_B_7_c 163 7
L2362_Track-3878_R1.fastq.gz S1739_B_GFP_t B GFP t L2362_B_GFP_t 163 6
目录结构(34个目录):
L2369_Track-3885_
accepted_hits.bam
deletions.bed
junctions.bed
logs
accepted_hits.bam.bai
insertions.bed
left_kept_reads.info
L2349_Track-3865_
accepted_hits.bam
deletions.bed
junctions.bed
logs
accepted_hits.bam.bai
insertions.bed
left_kept_reads.info
目标:因为文件名是毫无意义的,很难解释,我要重命名.bam结束(保持后缀)的文件和文件夹与通信样品名称,以更合适的方式重新排序。结果应该是这样的:
7_t_B
7_t_B..bam
deletions.bed
junctions.bed
logs
7_t_B.bam.bai
insertions.bed
left_kept_reads.info
3_t_A
3_t_A.bam
deletions.bed
junctions.bed
logs
accepted_hits.bam.bai
insertions.bed
left_kept_reads.info
我砍死在一起使用bash和python(新手)的解决方案,但感觉过度设计。问题是,是否有更简单/更优雅的方式来实现这一点,我错过了?解决方案可以使用python,bash和R.也可以awk,因为我正在尝试学习它。作为一个相对的初学者确实会让事情变得复杂。
这是我的解决方案:
的包装纸把它全部到位,并给出了工作流程的一个想法:
#! /bin/bash
# select columns of interest and write them to a file - basenames
tail -n +2 samples.txt | cut -d$'\t' -f1,3 >> BAMfilames.txt
# call my little python script that creates a new .sh with the renaming commmands
./renameBamFiles.py
# finally do the renaming
./renameBam.sh
# and the folders to
./renameBamFolder.sh
renameBamFiles.py:
#! /usr/bin/env python
import re
# Read in the data sample file and create a bash file that will remane the tophat output
# the reanaming will be as follows:
# mv L2377_Track-3893_R1_ L2377_Track-3893_R1_SRSF7_cyto_B
#
# Set the input file name
# (The program must be run from within the directory
# that contains this data file)
InFileName = 'BAMfilames.txt'
### Rename BAM files
# Open the input file for reading
InFile = open(InFileName, 'r')
# Open the output file for writing
OutFileName= 'renameBam.sh'
OutFile=open(OutFileName,'a') # You can append instead with 'a'
OutFile.write("#! /bin/bash"+"\n")
OutFile.write(" "+"\n")
# Loop through each line in the file
for Line in InFile:
## Remove the line ending characters
Line=Line.strip('\n')
## Separate the line into a list of its tab-delimited components
ElementList=Line.split('\t')
# separate the folder string from the experimental name
fileroot=ElementList[1]
fileroot=fileroot.split()
# create variable names using regex
folderName=re.sub(r'^(.*)(\_)(\w+).*', r'\1\2\3\2', ElementList[0])
folderName=folderName.strip('\n')
fileName = "%s_%s_%s" % (fileroot[1], fileroot[2], fileroot[0])
command= "for file in %s/accepted_hits.*; do mv $file ${file/accepted_hits/%s}; done" % (folderName, fileName)
print command
OutFile.write(command+"\n")
# After the loop is completed, close the files
InFile.close()
OutFile.close()
### Rename folders
# Open the input file for reading
InFile = open(InFileName, 'r')
# Open the output file for writing
OutFileName= 'renameBamFolder.sh'
OutFile=open(OutFileName,'w')
OutFile.write("#! /bin/bash"+"\n")
OutFile.write(" "+"\n")
# Loop through each line in the file
for Line in InFile:
## Remove the line ending characters
Line=Line.strip('\n')
## Separate the line into a list of its tab-delimited components
ElementList=Line.split('\t')
# separate the folder string from the experimental name
fileroot=ElementList[1]
fileroot=fileroot.split()
# create variable names using regex
folderName=re.sub(r'^(.*)(\_)(\w+).*', r'\1\2\3\2', ElementList[0])
folderName=folderName.strip('\n')
fileName = "%s_%s_%s" % (fileroot[1], fileroot[2], fileroot[0])
command= "mv %s %s" % (folderName, fileName)
print command
OutFile.write(command+"\n")
# After the loop is completed, close the files
InFile.close()
OutFile.close()
RenameBam.sh - 由以前的python脚本创建:
#! /bin/bash
for file in L2369_Track-3885_R1_/accepted_hits.*; do mv $file ${file/accepted_hits/7_t_B}; done
for file in L2349_Track-3865_R1_/accepted_hits.*; do mv $file ${file/accepted_hits/3_t_A}; done
for file in L2354_Track-3870_R1_/accepted_hits.*; do mv $file ${file/accepted_hits/GFP_c_A}; done
(..)
重命名renameBamFolder.sh非常相似:
mv L2369_Track-3885_R1_ 7_t_B
mv L2349_Track-3865_R1_ 3_t_A
mv L2354_Track-3870_R1_ GFP_c_A
mv L2377_Track-3893_R1_ 7_c_B
自从我学习,我觉得的这样做,并思考如何做到这一点的不同方法的一些示例,将是非常有用的。
使用Python生成bash似乎有点没有意义。我会说选择一种语言或其他语言,然后使用它。如果你不习惯,Python也许不那么神秘。 – 2013-02-20 13:11:30