Read a file line by line


I have two files that I want to read line by line (the first contains one word per line, the second one sentence per line).

The goal is to count, for each word in file 1, how many sentences in file 2 contain that word.

Here is my code:

open(my $words, '<:utf8', 'test') or die "Unable to open for read: $!"; # 'test' is the file that contains my words
open(my $sentences, '<:utf8', 'sentences') or die "Unable to open for read: $!"; # 'sentences' is the file that contains one sentence per line
open my $fh_resultat, ">:utf8", 'result' or die "Unable to open for write: $!";
my $word; 
# I want to count the number of sentences in $sentences that contain a word from $words
while(defined($word = <$words>)) { 
    chomp $word ; 
    $word =~ s/^\s*|\s*$//g; 
    my $nb = 0; 
    my $idf; 
    my $ph; 
    while (defined ($ph = <$sentences>)){ 
     my @tab = split(/ /, $ph); 
     chomp @tab ; 
     foreach my $val(@tab) { 
      if($word eq $val){ 
       $nb = $nb + 1; 
       last; 
      } 
     } 
    } 
    print $fh_resultat "$word:$nb\n"; 
}

but only the first word of the first file gets processed!


2017-05-16 rim

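If you're asking a lot of people to read and understand your code, make it as easy to read as possible. I've done some light reformatting to add indentation and make your use of whitespace more consistent. Please do this yourself in future.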

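Once you have read a filehandle to the end of the file, the next read from that filehandle returns undef, and it will keep returning undef no matter how many times you call it.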
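A minimal sketch of that behaviour, using an in-memory filehandle over a made-up two-line string in place of the real sentences file: the second loop never runs, because the handle is already at end of file.

```perl
use strict;
use warnings;

# Made-up two-line "file", opened as an in-memory filehandle
my $data = "first sentence\nsecond sentence\n";
open(my $fh, '<', \$data) or die "Unable to open in-memory file: $!";

my $first = 0;
while (defined(my $line = <$fh>)) { $first++ }    # reads both lines, stops at EOF

my $second = 0;
while (defined(my $line = <$fh>)) { $second++ }   # <$fh> returns undef straight away

print "first pass: $first, second pass: $second\n";   # prints "first pass: 2, second pass: 0"
```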

You can't loop over the sentences file again unless you use the seek() function to reset the file pointer to the start of the file:

seek $sentences, 0, 0;
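A sketch of where the seek would go in the question's loop, assuming it is called at the top of each pass over the words, before the inner loop reads the sentences. To keep it self-contained, the 'test' and 'sentences' files are replaced by made-up in-memory strings (the real files would be opened with '<:utf8' as in the question), and the counts go to STDOUT instead of 'result'.

```perl
use strict;
use warnings;

# Made-up stand-ins for the 'test' (words) and 'sentences' files
my $words_data     = "cat\ndog\nbird\n";
my $sentences_data = "the cat sat\nthe dog ran\na cat and a dog\n";
open(my $words,     '<', \$words_data)     or die "Unable to open words: $!";
open(my $sentences, '<', \$sentences_data) or die "Unable to open sentences: $!";

my %count;
while (defined(my $word = <$words>)) {
    chomp $word;
    $word =~ s/^\s+|\s+$//g;
    seek($sentences, 0, 0) or die "Unable to rewind sentences: $!";  # rewind before each pass
    my $nb = 0;
    while (defined(my $ph = <$sentences>)) {
        chomp $ph;
        # count the sentence once if any space-separated token equals $word
        $nb++ if grep { $_ eq $word } split / /, $ph;
    }
    $count{$word} = $nb;
    print "$word:$nb\n";
}
```

With this sample data it prints cat:2, dog:2 and bird:0. Every word still re-reads the whole sentences file, which is what the in-memory approach avoids.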

Alternatively, you might consider reading one (or both) of your files into memory, so that you don't need to keep re-reading the file.
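One possible shape for the in-memory approach, sketched as an alternative technique rather than what the answer prescribes: load the word list into a hash, then make a single pass over the sentences, counting each word at most once per sentence. The sample data is made up, and note that split ' ' splits on runs of whitespace, unlike the question's split(/ /).

```perl
use strict;
use warnings;

# Made-up sample data; in the question these come from 'test' and 'sentences'
my $words_data     = "cat\ndog\nbird\n";
my $sentences_data = "the cat sat\nthe dog ran\na cat and a cat\n";
open(my $words,     '<', \$words_data)     or die "Unable to open words: $!";
open(my $sentences, '<', \$sentences_data) or die "Unable to open sentences: $!";

# Load the word list into a hash: word => number of sentences containing it
my %nb;
my @order;   # remember input order for printing
while (defined(my $w = <$words>)) {
    chomp $w;
    $w =~ s/^\s+|\s+$//g;
    next if $w eq '' or exists $nb{$w};
    $nb{$w} = 0;
    push @order, $w;
}

# Single pass over the sentences
while (defined(my $ph = <$sentences>)) {
    chomp $ph;
    my %seen;   # count each word at most once per sentence
    for my $tok (split ' ', $ph) {
        $nb{$tok}++ if exists($nb{$tok}) && !$seen{$tok}++;
    }
}

print "$_:$nb{$_}\n" for @order;
```

Each sentence is read once and each token costs one hash lookup, so the work no longer grows with (number of words) times (number of sentences).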


2017-05-16 12:53:08

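Looking at your code: only the first word in the file is processed, because while handling the first line read from the "word" file you loop through the entire "sentences" file, which leaves that filehandle at end of file for every later word.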

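Both solutions have already been mentioned above: using seek, or loading the file into memory.

I'm an advocate of loading the file into memory and processing it from there.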

use strict;
use warnings;

# 'test' is the file that contains my words
open(my $words, '<:utf8', 'test') or die "Unable to open for read: $!";

# 'sentences' is the file that contains one sentence per line
open(my $sentences, '<:utf8', 'sentences') or die "Unable to open for read: $!";
open(my $fh_resultat, '>:utf8', 'result') or die "Unable to open for write: $!";
my $word;

# I want to count the number of sentences in $sentences that contain a word from $words

# load sentences into memory
my @process;
while (defined(my $line = <$sentences>)) {
    push(@process, $line);
}
close($sentences);

while (defined($word = <$words>)) {
    chomp $word;
    $word =~ s/^\s*|\s*$//g;
    my $nb = 0;

    for my $ph (@process) {
        my @tab = split(/ /, $ph);
        chomp @tab;
        foreach my $val (@tab) {
            if ($word eq $val) {
                $nb = $nb + 1;
                last;
            }
        }
    }
    print $fh_resultat "$word:$nb\n";
}
close($fh_resultat);


2017-05-16 15:22:08 Carlos

That's a rather long-winded way of writing 'my @process = <$sentences>;' :-)
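For reference, a minimal sketch of the idiom the comment refers to: readline in list context returns every remaining line at once, and chomp on the array strips each newline. The in-memory data is made up to keep the example self-contained.

```perl
use strict;
use warnings;

# Made-up stand-in for the 'sentences' file
my $sentences_data = "the cat sat\nthe dog ran\n";
open(my $sentences, '<', \$sentences_data) or die "Unable to open sentences: $!";

# readline in list context returns all remaining lines at once
my @process = <$sentences>;
close($sentences);
chomp @process;   # strip the trailing newline from every line

print scalar(@process), " sentences loaded\n";   # prints "2 sentences loaded"
```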
