-1
正则表达式我有以下文件命名seq.fasta:文本(蟒蛇)的多块
>AAM15934.1| NtrX [Gluconacetobacter diazotrophicus]| NTRX1 | Response_reg - Sigma54_activat - HTH_8
MGHEILIVDDEPDIRLLVEGILRDEGYETRLAGDSDSAISAFRARRPSLVILDVWLQGSRLDGLGILQAI
QGEEPVVPTIMISGHGTIETAVAALQHGAYDFIEKPFQSDRLLLVVRRALEASRLARENAELRLRAGPEA
MLYGDSPVIAGVRNQIERVAPSGSRVLISGAAGAGKEVAARMIHARSPGPKAFIALNCATLAPGRFEEEL
FGIEGAPDGTGRRTGVLERAHGGTLLLDEVSDMPIETQGKIVRALQDQSFERVGGASRVKVDVRVLAATN
RDLQEAIAAGRFREDLYYRLAVVPLRVPSLRERREDIPGLARLFLRRAAENAGLPLRDLSGDAVAALQSY
DWPGNARELRNLMERLLIMMPGNGSDLIRAEMLPPSVGQGAPALLKFDPAADVMGLPLREARDLFETQYL
QAQLLRFGGNISRTAGFVGMERSALHRKLKQLGVTSEERGAG
>WP_002731145.1| NtrX [Phaeospirillum molischianum]| NTRX1 | Response_reg - Sigma54_activat - HTH_8
MAHDILIVDDEADIRVLIAGILEDEGHSTREAANADEALERIRARRPSLVIQDIWLQGSRLDGLGVLDEI
KREHPDVPVVMISGHGTIETAVQAIKQGAYDFIEKPFKADRLLLVVDRAIESARLKRENQELRVRSGSTG
DLVGISPALVQIRQTIERVAPTNSRVLITGPAGSGKEVAARMIHAHSRRTEGPFVVVNCAAMHPDRMEIE
LFGTEYGADGSTSPRKIGTFEQAHSGTLLLDEVADMPLETQGKIVRVLQDQTFERVGGGKRVEVDVRVIA
TTNRDLQSEMIAGHFREDLFYRLNVVPIRMPALRDGKEDIPLLARQFMQLAAQLAGVPPRPLGEDALAAL
QAYDWPGNVRQLRNAIDWLLIMAPGDWRDPVRADMLPSEIGAITPAVLRWEKSSEIMTLPLREARELFER
EYLLAQVNRFAGNISRTAAFVGMERSALHRKLKLLGINTDEKVR
>WP_002967695.1| NtrX [Brucella abortus]| NTRX1 | Response_reg - Sigma54_activat - HTH_8
MAADILVVDDEVDIRDLVAGILSDEGHETRTAFDADSALAAINDRAPRLVFLDIWLQGSRLDGLALLDEI
KKQHPELPVVMISGHGNIETAVSAIRRGAYDFIEKPFKADRLILVAERALETSKLKREVSDLRKRTGDQL
ELVGTSLAMNQLRQTIERVAPTNSRIMITGPSGAGKELVARTIHAQSSRANGPFVTVNAATITPERMEIE
LFGTEMDGGERKVGALEEAHGGILYLDEVADMPRETQNKILRVLVDQQFERVGGTKRVKVDVRIISSTAQ
NLEGMIAEGTFREDLFHRLSVVPVQVPALAARREDIPSLVEFFMKQIAEQAGIKPRKIGPDAMAVLQAHS
WPGNLRQLRNNVERLMILTRGDDPDELVTADLLPAEIGDTLPRAPTESDQHIMALPLREARERFEKEYLI
AQINRFGGNISRTAEFVGMERSALHRKLKSLGV
我想提出的每个字母块在列表中。 例子:
列出内容:
List[0] = MGHEILIVDDEPDIRLLVEGILRDEGYETRLAGDSDSAISAFRARRPSLVILDVWLQGSRLDGLGILQAI
QGEEPVVPTIMISGHGTIETAVAALQHGAYDFIEKPFQSDRLLLVVRRALEASRLARENAELRLRAGPEA
MLYGDSPVIAGVRNQIERVAPSGSRVLISGAAGAGKEVAARMIHARSPGPKAFIALNCATLAPGRFEEEL
FGIEGAPDGTGRRTGVLERAHGGTLLLDEVSDMPIETQGKIVRALQDQSFERVGGASRVKVDVRVLAATN
RDLQEAIAAGRFREDLYYRLAVVPLRVPSLRERREDIPGLARLFLRRAAENAGLPLRDLSGDAVAALQSY
DWPGNARELRNLMERLLIMMPGNGSDLIRAEMLPPSVGQGAPALLKFDPAADVMGLPLREARDLFETQYL
QAQLLRFGGNISRTAGFVGMERSALHRKLKQLGVTSEERGAG
List[1] = MAHDILIVDDEADIRVLIAGILEDEGHSTREAANADEALERIRARRPSLVIQDIWLQGSRLDGLGVLDEI
KREHPDVPVVMISGHGTIETAVQAIKQGAYDFIEKPFKADRLLLVVDRAIESARLKRENQELRVRSGSTG
DLVGISPALVQIRQTIERVAPTNSRVLITGPAGSGKEVAARMIHAHSRRTEGPFVVVNCAAMHPDRMEIE
LFGTEYGADGSTSPRKIGTFEQAHSGTLLLDEVADMPLETQGKIVRVLQDQTFERVGGGKRVEVDVRVIA
TTNRDLQSEMIAGHFREDLFYRLNVVPIRMPALRDGKEDIPLLARQFMQLAAQLAGVPPRPLGEDALAAL
QAYDWPGNVRQLRNAIDWLLIMAPGDWRDPVRADMLPSEIGAITPAVLRWEKSSEIMTLPLREARELFER
EYLLAQVNRFAGNISRTAAFVGMERSALHRKLKLLGINTDEKVR
List[2] = MAADILVVDDEVDIRDLVAGILSDEGHETRTAFDADSALAAINDRAPRLVFLDIWLQGSRLDGLALLDEI
KKQHPELPVVMISGHGNIETAVSAIRRGAYDFIEKPFKADRLILVAERALETSKLKREVSDLRKRTGDQL
ELVGTSLAMNQLRQTIERVAPTNSRIMITGPSGAGKELVARTIHAQSSRANGPFVTVNAATITPERMEIE
LFGTEMDGGERKVGALEEAHGGILYLDEVADMPRETQNKILRVLVDQQFERVGGTKRVKVDVRIISSTAQ
NLEGMIAEGTFREDLFHRLSVVPVQVPALAARREDIPSLVEFFMKQIAEQAGIKPRKIGPDAMAVLQAHS
WPGNLRQLRNNVERLMILTRGDDPDELVTADLLPAEIGDTLPRAPTESDQHIMALPLREARERFEKEYLI
AQINRFGGNISRTAEFVGMERSALHRKLKSLGV
但我挣扎分裂,并把它们在列表中,我的代码是这样的:
import re
myfile = open('seq.fasta', 'r').read()
regex = re.compile(r'^>([^\n\r]+)[\n\r]([A-Z\n\r]+)', re.MULTILINE)
matches = [m.groups() for m in regex.finditer(myfile)]
for m in matches:
onlySequences = (m[1])
print(onlySequences)
变量onlySequences
返回刚刚过去的一个字母块,我如何保留所有的人,每个人都在一个列表中?
您每次迭代'matches'时都要重写'onlySequences'。 –
你不需要regex来做到这一点。 –