2010-09-09 243 views
0

我有一点小小的应用程序,并试图添加多线程。这里是代码(MyMech是用来处理HTTP错误WWW ::机械化子类):线程应用程序意外终止

#!/usr/bin/perl 

use strict; 
use MyMech; 
use File::Basename; 
use File::Path; 
use HTML::Entities; 
use threads; 
use threads::shared; 
use Thread::Queue; 
use List::Util qw(max sum); 

my $page = 1; 
my %CONFIG = read_config(); 

my $mech = MyMech->new(autocheck => 1); 
$mech->quiet(0); 

$mech->get($CONFIG{BASE_URL} . "/site-map.php"); 

my @championship_links = 
    $mech->find_all_links(url_regex => qr/\d{4}-\d{4}\/$/); 

foreach my $championship_link (@championship_links) { 

    my @threads; 

    my $queue   = Thread::Queue->new; 
    my $queue_processed = Thread::Queue->new; 

    my $url = sprintf $championship_link->url_abs(); 

    print $url, "\n"; 

    next unless $url =~ m{soccer}i; 

    $mech->get($url); 

    my ($last_round_loaded, $current_round) = 
     find_current_round($mech->content()); 

    unless ($last_round_loaded) { 

     print "\tLoading rounds data...\n"; 

     $mech->submit_form(

      form_id => "leagueForm", 
      fields => { 

       round => $current_round, 
      }, 
     ); 
    } 

    my @match_links = 
     $mech->find_all_links(url_regex => qr/matchdetails\.php\?matchid=\d+$/); 

    foreach my $link (@match_links) { 

     $queue->enqueue($link); 
    } 

    print "Starting printing thread...\n"; 

    my $printing_thread = threads->create(
     sub { printing_thread(scalar(@match_links), $queue_processed) }) 
     ->detach; 

    push @threads, $printing_thread; 

    print "Starting threads...\n"; 

    foreach my $thread_id (1 .. $CONFIG{NUMBER_OF_THREADS}) { 

     my $thread = threads->create(
      sub { scrape_match($thread_id, $queue, $queue_processed) }) 
      ->join; 
     push @threads, $thread; 
    } 

    undef $queue; 
    undef $queue_processed; 

    foreach my $thread (threads->list()) { 

     if ($thread->is_running()) { 

      print $thread->tid(), "\n"; 
     } 
    } 

    #sleep 5; 
} 

print "Finished!\n"; 

sub printing_thread { 

    my ($number_of_matches, $queue_processed) = @_; 

    my @fields = 
     qw (
      championship 
      year 
      receiving_team 
      visiting_team 
      score 
      average_home 
      average_draw 
      average_away 
      max_home 
      max_draw 
      max_away 
      date 
      url 
     ); 

    while ($number_of_matches) { 

     if (my $match = $queue_processed->dequeue_nb) { 

      open my $fh, ">>:encoding(UTF-8)", $CONFIG{RESULT_FILE} or die $!; 

      print $fh join("\t", @{$match}{@fields}), "\n"; 
      close $fh; 

      $number_of_matches--; 
     } 
    } 

    threads->exit(); 
} 

sub scrape_match { 

    my ($thread_id, $queue, $queue_processed) = @_; 

    while (my $match_link = $queue->dequeue_nb) { 

     my $url = sprintf $match_link->url_abs(); 

     print "\t$url", "\n"; 

     my $mech = MyMech->new(autocheck => 1); 
     $mech->quiet(0); 

     $mech->get($url); 

     my $match = parse_match($mech->content()); 
     $match->{url} = $url; 

     $queue_processed->enqueue($match); 
    } 

    return 1; 
} 

,我有一些奇怪的事情,此代码。有时它运行,但有时它会退出而没有错误(在->detach点)。我知道@match_links包含数据,但线程不会被创建,它只是关闭。通常在处理第二个$championship_link条目后终止。

可能是我做错了什么?

更新 这里是find_current_round子程序代码(但我敢肯定它是不相关的问题):

sub find_current_round { 

    my ($html) = @_; 

    my ($select_html) = $html =~ m{ 

    <select\s+name="round"[^>]+>\s* 
    (.+?) 
    </select> 
    }isx; 

    my ($option_html, $current_round) = $select_html =~ m{ 

    (<option\s+value="\d+"(?:\s+ selected="selected")?>(\d+)</option>)\Z 
    }isx; 

    my ($last_round_loaded) = $option_html =~ m{selected}; 

    return ($last_round_loaded, $current_round); 
} 
+0

您的代码缺少'find_current_round'子例程。你可以发布它吗? – Zaid 2010-09-09 14:00:24

+0

@Zaid:我已经发布了find_current_round的代码。 – gangabass 2010-09-10 00:34:46

回答

0

第一关 - 不使用dequeue_nb()。这是一个坏主意,因为如果一个队列暂时为空,它会返回undef,并且你的线程将退出。

改为使用dequeue和和enddequeue会阻止,但是一旦你的队列中有你的队列,就会退出。

你也在用你的线程做一些非常奇怪的事情 - 我建议你很少想要detach一个线程。你只是假设你的线程将在你的程序之前完成,这不是一个好的计划。

同样,

my $thread = threads->create(
     sub { scrape_match($thread_id, $queue, $queue_processed) }) 
     ->join; 

你生成一个线程,然后立即加入它。因此,join调用将...阻止您的线程退出。你根本不需要线程来做到这一点......

你也可以在你的foreach循环中定义你的队列。我不认为这是一个好计划。我会建议,而不是 - 在外部范围,并产生定义数量的'工人'线程(和一个“打印”线程)。

然后通过队列机制提供它们。否则,你最终会创建多个队列实例,因为它们是词法范围的。

而一旦你已经完成排队的东西,发出$queue -> end这将终止while循环。

您也不需要给线程$thread_id,因为他们已经有一个。请尝试:threads -> self -> tid();