我有一个Excel文件这样的..多个单元格在Excel
Sr. No. GENE ID Gene Id (NCBI) Protein Id Protein Sequences
1 Lmo0001 984365 NP_463534.1
2 Lmo0002 984379 NP_463535.1
3 Lmo0003 984420 NP_463536.1
该列表扩展到3000个基因。我将这些序列保存在这样的文本板中,对于所有3000个基因,每个单独序列之间都有一个空格。
gi | 16802049 | ref | NP_463534.1 |染色体复制起始蛋白[李斯特菌EGD-E] MQSIEDIWQETLQIVKKNMSKPSYDTWMKSTTAHSLEGNTFIISAPNNFVRDWLEKSYTQFIANILQEIT GRLFDVRFIDGEQEENFEYTVIKPNPALDEDGIEIGKHMLNPRYVFDTFVIGSGNRFAHAASLAVAEAPA KAYNPLFIYGGVGLGKTHLMHAVGHYVQQHKDNAKVMYLSSEKFTNEFISSIRDNKTEEFRTKYRNVDVL LIDDIQFLAGKEGTQEEFFHTFNTLYDEQKQIIISSDRPPKEIPTLEDRLRSRFEWGLITDITPPDLETR IAILRKKAKADGLDIPNEVMLYIANQIDSNIRELEGALIRVVAYSSLVNKDITAGLAAEALKDIIPSSKS QVITISGIQEAVGEYFHVRLEDFKAKKRTKSIAFPRQIAMYLSRELTDASLPKIGDEFGGRDHTTVIHAH EKISQLLKTDQVLKNDLAEIEKNLRKAQNMF
GI | 16802050 | REF | NP_463535.1 | DNA聚合酶III亚基β[李斯特菌EGD-E] MKFVIERDRLVQAVNEVTRAISARTTIPILTGIKIVVNDEGVTLTGSDSDISIEAFIPLIENDEVIVEVE SFGGIVLQSKYFGDIVRRLPEENVEIEVTSNYQTNISSGQASFTLNGLDPMEYPKLPEVTDGKTIKIPIN VLKNIVRQTVFAVSAIEVRPVLTGVNWIIKENKLSAVATDSHRLALREIPLETDIDEEYNIVIPGKSLSE LNKLLDDASESIEMTLANNQILFKLKDLLFYSRLLEGSYPDTSRLIPTDTKSELVINSKAFLQAIDRASL LARENRNNVIKLMTLENGQVEVSSNSPEVGNVSENVFSQSFTGEEIKISFNGKYMMDALRAFEGDDIQIS FSGTMRPFVLRPKDAANPNEILQLITPVRTY
GI | 16802051 | REF | NP_463536.1 |假定蛋白lmo0003 [单增李斯特菌EGD-E] MMKDMTTGNPTKLIFLFAMPMLIGNLFQQFYTMIDAVIVGKFVSVDALAAVGATNSVNFFMISLIIGLMS GISVVVAQYFGFKDYDRLKDVIATATYAVVFSAIILTVAGVLLAKPLLILLRTPANILDDSTIFLTTLFI GILPMSLYNGMAAILRALGNSITPLIFLILSSLMNIALDFLFVVYMDMGVRGAAIATVLSQTAAAIAVIY YAYRHVPFMRIERAKFKLSTPLLKEMVRIGLPSGLQGSFISIGNMALQSLINGFGSSVVAAYTAASRIDS LTYQPGIAFGAASSMFAGQNIGAGKIDRVREGFWSGIKVVTAISIGITILVQLFARQFLLLFVDSSETEV INIGVSYLLIVSLFYVVVGILFVVRETLRGTGDAMVPLAMGIFELVSRLVIGFVLSLYIGYVGLWWATPV AWITATILGVWRYKSGAWQKKAVIRRK
GI | 16802052 | REF | NP_463537.1 |假定蛋白lmo0004 [单增李斯特菌EGD-E] MAETVKINSEFVTLGQLLQMIDVVSTGGMAKAYLSENTIYINGEQDNRRGKKLRNGDVILVPGVGKVKIE QGK
GI | 16802053 | REF | NP_463538.1 |重组蛋白F [单增李斯特菌EGD-E] MHLESIVLRNFRNYENLELEFSPSVNVFLGENAQGKTNLLEAVLMLALAKSHRTTNDKDFIMWEKEEAKM EGRIAKHGQSVPLELAITQKGKRAKVNHLEQKKLSQYVGNLNVVIFAPEDLSLVKGAPGIRRRFLNMEIG QMQPIYLHNLSEYQRILQQRNQYLKMLQMKRKVDPILLDILTEQFADVAINLTKRRADFIQKLEAYAAPI HHQISRGLETLKIEYKASITLNGDDPEVWKADLLQKMESIKQREIDRGVTLIGPHRDDSLFYINGQNVQD FGSQGQQRTTALSIKLAEIDLIHEETGEYPVLLLDDVLSELDDYRQSHLLGAIEGKVQTFVTTTSTSGID HETLKQATTFYVEKGTVKKS
是否有可能将每个序列中的每一行上的每一个蛋白质序列点,而无需复制和粘贴各手动?任何方法都很好。
P.S我很抱歉这个荒谬的表,但没有足够的声望点,我无法发布图片,这是我可以管理的最好的。
@swapnil但我想从记事本中的序列在第一个Excel表格的蛋白质序列列下以直线复制。
只需使用excel打开文本文件,它会问你关于分隔符指定那里|然后你会得到文件在excel – Swapnil