2012-06-27 164 views
2

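Perl script problem

The purpose of the script is to process all the words in a file and print every word that occurs the most. So if three words each occur 10 times, the program should print all three of them.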

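The script now runs, thanks to some tips I got here. However, it does not handle large text files (e.g. the New Testament). I'm not sure whether that is my fault or just a limitation of the code. I'm sure the program has other problems as well, so any help would be greatly appreciated.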

#!/usr/bin/perl -w 
require 5.10.0; 

print "Your file: " . $ARGV[0] . "\n"; 
#Make sure there is only one argument 
if ($#ARGV == 0){ 

    #Make sure the argument is actually a file 
    if (-f $ARGV[0]){ 

     %wordHash =();  #New hash to match words with word counts 
     $file=$ARGV[0];  #Stores value of argument 
     open(FILE, $file) or die "File not opened correctly."; 

     #Process through each line of the file 
     while (<FILE>){ 
      chomp; 
      #Delimits on any non-alphanumeric 
      @words=split(/[^a-zA-Z0-9]/,$_); 
      $wordSize = @words; 

      #Put all words to lowercase, removes case sensitivty 
      for($x=0; $x<$wordSize; $x++){ 
       $words[$x]=lc($words[$x]); 
      } 

      #Puts each occurence of word into hash 
      foreach $word(@words){ 
       $wordHash{$word}++; 
      } 
     } 
     close FILE; 

     #$wordHash{$b} <=> $wordHash{$a}; 
     $wordList=""; 
     $max=0; 

     while (($key, $value) = each(%wordHash)){ 
      if($value>$max){ 
       $max=$value; 
      } 
      } 

     while (($key, $value) = each(%wordHash)){ 
      if($value==$max && $key ne "s"){ 
       $wordList.=" " . $key; 
      } 
      }  

     #Print solution 
     print "The following words occur the most (" . $max . " times): " . $wordList . "\n"; 
    } 
    else { 
     print "Error. Your argument is not a file.\n"; 
    } 
} 
else { 
    print "Error. Use exactly one argument.\n"; 
} 
+2

Please use … in your script.

+0

Consider "use strict": http://www.66clouds.com/new_testament.html ;)

Answers

6

Your problem lies in two lines missing from the top of your script:

use strict; 
use warnings; 

If they had been there, they would have reported a lot of lines like this:

Argument "make" isn't numeric in array element at ...

It comes from this line:

$list[$_] = $wordHash{$_} for keys %wordHash; 

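Array indices can only be numbers, and since your keys are words, this does not work. What happens here is that any string is coerced to a number, and any string that does not begin with a number becomes 0, so every such word is written to $list[0].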
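To see the coercion in isolation, here is a minimal sketch. The three counts in %wordHash are made up for illustration, and the warning is silenced locally only so the output stays readable:

```perl
use strict;
use warnings;

# Hypothetical word counts, standing in for %wordHash from the question.
my %wordHash = ('make' => 3, 'the' => 7, 'of' => 5);

my @list;
{
    # Silence the "isn't numeric" warnings for this demonstration only.
    no warnings 'numeric';
    $list[$_] = $wordHash{$_} for keys %wordHash;
}

# Every key was coerced to index 0, so only one count survives.
printf "elements in \@list: %d\n", scalar @list;   # prints: elements in @list: 1
```

Which of the three counts ends up in $list[0] depends on the hash's iteration order, so the result is not even stable between runs.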

Your code reads the data fine, although I would write it differently. It is only after that point that your code becomes unwieldy.

As near as I can tell, you want to print the most frequently occurring words, in which case you should consider the following code:

use strict; 
use warnings; 

my %wordHash; 
#Make sure there is only one argument 
die "Only one argument allowed." unless @ARGV == 1; 
while (<>) { # Use the diamond operator to implicitly open ARGV files 
    chomp; 
    my @words = grep length, # disallow empty strings (grep $_ would also drop "0")
     map lc,     # make everything lower case 
      split /[^a-zA-Z0-9]/; # your original split 
    foreach my $word (@words) { 
     $wordHash{$word}++; 
    } 
} 

for my $word (sort { $wordHash{$b} <=> $wordHash{$a} } keys %wordHash) { 
    printf "%-6s %s\n", $wordHash{$word}, $word; 
} 

As you will notice, you can sort based on the hash values.
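Since the question asks only for the words tied for the highest count, the same sort can be used to pick them out. This is a rough sketch rather than part of the answer: most_frequent() is a hypothetical helper name, and the counts are made up.

```perl
use strict;
use warnings;

# Return every word whose count equals the highest count in the hash.
# most_frequent() is a hypothetical helper, not part of the code above.
sub most_frequent {
    my ($counts) = @_;

    # Highest count first; ties are broken alphabetically for stable output.
    my @sorted = sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b }
                 keys %$counts;
    return unless @sorted;               # empty hash: nothing to report

    my $max = $counts->{ $sorted[0] };   # the first entry holds the maximum
    return grep { $counts->{$_} == $max } @sorted;
}

my %wordHash = ('the' => 10, 'and' => 10, 'of' => 10, 'lord' => 4);  # made-up counts
my @top = most_frequent(\%wordHash);
print "@top\n";   # prints: and of the
```

Sorting the whole hash costs more than a single pass for the maximum, so for very large vocabularies a max-then-filter approach may be preferable.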

1

Here is a completely different way of writing it (I might as well say "Perl is not C"):

#!/usr/bin/env perl 

use 5.010; 
use strict; use warnings; 
use autodie; 

use List::Util qw(max); 

my ($input_file) = @ARGV; 
die "Need an input file\n" unless defined $input_file; 

say "Input file = '$input_file'"; 

open my $input, '<', $input_file; 

my %words; 

while (my $line = <$input>) { 
    chomp $line; 

    my @tokens = map lc, grep length, split /[^A-Za-z0-9]+/, $line; 
    $words{ $_ } += 1 for @tokens; 
} 

close $input; 

my $max = max values %words; 
my @argmax = sort grep { $words{$_} == $max } keys %words; 

for my $word (@argmax) { 
    printf "%s: %d\n", $word, $max; 
}
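Both answers filter out empty strings after splitting. A small sketch of why that filter is needed, using a made-up input line: splitting on a single non-alphanumeric character leaves an empty field between adjacent separators, and even with the + quantifier a separator at the start of the line still yields one empty leading field.

```perl
use strict;
use warnings;

my $line = '"Hello, world!"';   # made-up input line

# One separator character per match: adjacent separators leave empty fields.
my @single = split /[^A-Za-z0-9]/, $line;

# One or more separator characters per match: only the leading field is empty.
my @runs = split /[^A-Za-z0-9]+/, $line;

# Dropping empty fields and lowercasing leaves just the words.
my @tokens = map lc, grep length, @runs;

printf "single: %d fields, runs: %d fields, tokens: %s\n",
    scalar @single, scalar @runs, join ',', @tokens;
# prints: single: 4 fields, runs: 3 fields, tokens: hello,world
```

Without the filter, each of those empty fields would be counted under the empty-string key like any other word.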