Searching for string matches

I have a query file, a plain text file like this:

fooLONGcite 
GetmoreDATA 
stringMATCH 
GOODthing

And a subject file, another text file like this:

sometingfooLONGcite 
anyotherfooLONGcite 
matchGetmoreDATA 
GETGOODthing 
brotherGETDATA 
CITEMORETHING 
TOOLONGSTUFFETC

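The expected result is to pull out the lines of the subject file that contain one of the query strings and print them. So the output should be: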

sometingfooLONGcite 
anyotherfooLONGcite 
matchGetmoreDATA  
GETGOODthing

Here is my Perl script, but it doesn't work. Can you help me find out where the problem is? Thanks.

#!/usr/bin/perl 
use strict; 

# to check the command line option 
if($#ARGV<0){ 
    printf("Usage: \n <tag> <seq> <outfile>\n"); 
    exit 1; 
} 

# to open the given infile file 
open(tag, $ARGV[0]) or die "Cannot open the file $ARGV[0]"; 
open(seq, $ARGV[1]) or die "Cannot open the file $ARGV[1]"; 

my %seqhash =(); 
my $tag_id; 
my $tag_seq; 
my $seq_id; 
my $seq_seq; 
my $seq; 
my $i = 0; 

print "Processing cds seq\n"; 
#check the seq file 
while(<seq>){ 
    my @line = split; 
    if($i != 0){ 
     $seqhash{$seq_seq} = $seq; 
     $seq = ""; 
     print "$seq_seq\n"; 
    } 
    $seq_seq = $line[0]; 
    $i++; 
} 

while(<tag>){ 
    my @tagline = split; 
    $tag_seq = $tagline[0]; 
    $seq = $seqhash{$seq_seq}; 
    #print "$tag_seq\n"; 
    print "$seq\n"; 
    #print output ">$id\n$seq\n"; 
} 
#print "Ending of Processing gff\n"; 

close(tag); 
close(seq);

Source

2012-02-01 Jianguo

[What have you tried?](http://mattgemmell.com/2008/12/08/what-have-you-tried/) – 2012-02-01 21:24:24

I added my script. – Jianguo 2012-02-01 21:28:26

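As I understand it, you are looking for part of a string rather than an exact match. Here is a script that does what I think you are looking for:

Content of script.pl. I am assuming the query file is small, because I put all of its content into a single regex: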

use warnings; 
use strict; 

## Check arguments. 
die qq[Usage: perl $0 <query_file> <subject_file>\n] unless @ARGV == 2; 

## Open input files. Abort if found errors. 
open my $fh_query, qq[<], shift @ARGV or die qq[Cannot open input file: $!\n]; 
open my $fh_subject, qq[<], shift @ARGV or die qq[Cannot open input file: $!\n]; 

## Variable to save a regex with alternations of the content of the 'query' file. 
my $query_regex; 

{ 
    ## Read content of the 'query' file in slurp mode. 
    local $/ = undef; 
    my $query_content = <$fh_query>; 

    ## Remove trailing spaces and generate a regex. 
    $query_content =~ s/\s+\Z//; 
    $query_content =~ s/\n/|/g; 
    $query_regex = qr/(?i:($query_content))/; 
} 

## Read 'subject' file and for each line compare if that line matches with 
## any word of the 'query' file and print in success. 
while (<$fh_subject>) { 
    ## The regex is already compiled by qr//, so the /o modifier is not needed. 
    if (m/$query_regex/) { 
        print; 
    } 
}

Run the script:

perl script.pl query.txt subject.txt

And the result:

sometingfooLONGcite 
anyotherfooLONGcite 
matchGetmoreDATA 
GETGOODthing
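
One caveat with building the regex straight from the file: any regex metacharacters in the query lines (such as `.` or `+`) would be interpreted as patterns, and stray trailing spaces on a line would become part of the alternation. The following is only a sketch of a more defensive variant, not the script above. The sub names `build_query_regex` and `matching_lines` are made up for illustration, and the sample data is held in memory instead of being read from files.

```perl
use strict;
use warnings;

# Build one case-insensitive regex from a list of query strings.
# quotemeta escapes characters such as '.' or '+' so they match literally.
sub build_query_regex {
    my @queries = @_;
    my @cleaned;
    for my $q (@queries) {
        $q =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
        next if $q eq '';              # skip blank lines
        push @cleaned, quotemeta $q;
    }
    return unless @cleaned;            # no usable queries
    my $alternation = join '|', @cleaned;
    return qr/$alternation/i;
}

# Return the subject lines that contain at least one query string.
sub matching_lines {
    my ($regex, @subjects) = @_;
    return () unless defined $regex;
    return grep { $_ =~ $regex } @subjects;
}

# Demo with the sample data from the question (hypothetical in-memory copy).
my @queries  = ("fooLONGcite ", "GetmoreDATA ", "stringMATCH ", "GOODthing");
my @subjects = qw(sometingfooLONGcite anyotherfooLONGcite matchGetmoreDATA
                  GETGOODthing brotherGETDATA CITEMORETHING TOOLONGSTUFFETC);

my $regex = build_query_regex(@queries);
print "$_\n" for matching_lines($regex, @subjects);
```

With the sample data this prints the same four lines as shown above. Whether escaping is wanted depends on the data: if the query file is meant to contain real patterns, `quotemeta` should be left out.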

Source

2012-02-01 22:07:53 Birei

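It works fine. But if I use another file, it doesn't work. Can you help me fix it? Thanks. Here is the link to the new data: http://stackoverflow.com/questions/9101082/extract-sequence-information-using-tag-sequence – Jianguo 2012-02-01 23:04:51

Your current code doesn't make much sense; you even reference variables that you never assign anything to.

You just need to read the first file into a hash, then check each line of the second file against it.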

while (my $line = <FILE>) 
{ 
    chomp($line); 
    $hash{$line} = 1; 
} 

... 

while (my $line = <FILE2>) 
{ 
    chomp($line); 
    if (defined $hash{$line}) 
    { 
        print "$line\n"; 
    } 
}
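
For readers who want something they can run as-is, here is a rough, self-contained sketch of the read-then-check idea. It rests on a few assumptions: the data is held in arrays rather than read from files, and the sub names `exact_matches` and `substring_matches` are invented for this example. It also shows why a plain hash lookup prints nothing for the sample data in the question: a hash key only matches an identical line, whereas the expected output needs a case-insensitive substring test.

```perl
use strict;
use warnings;

# Exact lookup: keep subject lines that are identical to a query line.
sub exact_matches {
    my ($queries, $subjects) = @_;
    my %seen = map { $_ => 1 } @$queries;
    return grep { exists $seen{$_} } @$subjects;
}

# Substring lookup: keep subject lines that contain any query string,
# ignoring case. index() returns -1 when the substring is absent.
sub substring_matches {
    my ($queries, $subjects) = @_;
    my @needles = grep { length $_ } map { lc $_ } @$queries;
    my @found;
    for my $line (@$subjects) {
        my $haystack = lc $line;
        push @found, $line if grep { index($haystack, $_) >= 0 } @needles;
    }
    return @found;
}

# Sample data from the question (hypothetical in-memory copy).
my @queries  = qw(fooLONGcite GetmoreDATA stringMATCH GOODthing);
my @subjects = qw(sometingfooLONGcite anyotherfooLONGcite matchGetmoreDATA
                  GETGOODthing brotherGETDATA CITEMORETHING TOOLONGSTUFFETC);

my @exact  = exact_matches(\@queries, \@subjects);
my @substr = substring_matches(\@queries, \@subjects);

print "exact matches: ", scalar(@exact), "\n";
print "$_\n" for @substr;
```

To use real files, each one could be read into an array first, for example with `open my $fh, '<', $path or die ...` followed by `chomp(my @lines = <$fh>)`, and the two arrays passed by reference as above.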

Source

2012-02-01 21:34:56

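I ran this code. Why did nothing happen? Thank you very much. – Jianguo 2012-02-01 21:45:05

::sigh:: Because it's just an example of what you need to do. – 2012-02-01 21:49:34

Can you help me complete this code? Please. I am very new to Perl. Thank you very much for your help. – Jianguo 2012-02-01 21:53:05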
