Celluloid & high-performance HTTP requests

I'm trying to switch an existing crawler from EventMachine to Celluloid. To get familiar with Celluloid, I generated a bunch of 150 kB static files on a Linux box, all served by Nginx.

The code at the bottom should do its job, but there is a problem I don't understand: with a thread pool size of 50, the code should spawn at most 50 threads, yet it spawns 180. If I increase the pool size to 100, 330 threads are spawned. What is going wrong there?

A simple copy & paste of this code should work on every box, so any hints are welcome :)
#!/usr/bin/env jruby
require 'celluloid'
require 'open-uri'

URLS = *(1..1000)

@@requests = 0
@@responses = 0
@@total_size = 0

class Crawler
  include Celluloid

  def fetch(id)
    uri = URI("http://data.asconix.com/#{id}")
    puts "Request ##{@@requests += 1} -> #{uri}"
    begin
      req = open(uri).read
    rescue Exception => e
      puts e
    end
  end
end

URLS.each_slice(50).map do |idset|
  pool = Crawler.pool(size: 50)
  crawlers = idset.to_a.map do |id|
    begin
      pool.future(:fetch, id)
    rescue Celluloid::DeadActorError, Celluloid::MailboxError
    end
  end
  crawlers.compact.each do |resp|
    $stdout.print "Response ##{@@responses += 1} -> "
    if resp.value.size == 150000
      $stdout.print "OK\n"
      @@total_size += resp.value.size
    else
      $stdout.print "ERROR\n"
    end
  end
  pool.terminate
  puts "Actors left: #{Celluloid::Actor.all.to_set.length} -- Alive: #{Celluloid::Actor.all.to_set.select(&:alive?).length}"
end

$stdout.print "Requests total: #{@@requests}\n"
$stdout.print "Responses total: #{@@responses}\n"
$stdout.print "Size total: #{@@total_size} bytes\n"
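As a side note on how I measure this: one way to check the live thread count from inside the process, using only stdlib Ruby (no Celluloid involved), is `Thread.list`. A minimal sketch, where the sleeping workers are just placeholders for real actors:

```ruby
# Observe how many threads are actually alive, using nothing but stdlib
# Ruby -- Thread.list returns every live thread in the current process.
baseline = Thread.list.size                          # usually 1: the main thread

# Spawn ten sleeping workers, then sample the thread count at its peak.
workers = 10.times.map { Thread.new { sleep 0.2 } }
peak = Thread.list.size

workers.each(&:join)
puts "baseline=#{baseline} peak=#{peak}"             # peak exceeds baseline by 10
```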
By the way, the same problem occurs when I define the pool outside the each_slice loop:

....
@pool = Crawler.pool(size: 50)

URLS.each_slice(50).map do |idset|
  crawlers = idset.to_a.map do |id|
    begin
      @pool.future(:fetch, id)
    rescue Celluloid::DeadActorError, Celluloid::MailboxError
    end
  end
  crawlers.compact.each do |resp|
    $stdout.print "Response ##{@@responses += 1} -> "
    if resp.value.size == 150000
      $stdout.print "OK\n"
      @@total_size += resp.value.size
    else
      $stdout.print "ERROR\n"
    end
  end
  puts "Actors left: #{Celluloid::Actor.all.to_set.length} -- Alive: #{Celluloid::Actor.all.to_set.select(&:alive?).length}"
end
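For comparison, this is the behaviour I expected from a pool: a fixed-size worker pool built on plain stdlib `Thread` and `Queue` (no Celluloid) creates exactly `POOL_SIZE` threads, no matter how many jobs are queued. A sketch only, with `id * 2` as a hypothetical stand-in for the HTTP fetch:

```ruby
# Fixed-size worker pool on stdlib Thread + Queue: concurrency is bounded
# strictly by POOL_SIZE because only that many threads are ever created.
POOL_SIZE = 5

jobs    = Queue.new
results = Queue.new

(1..20).each { |id| jobs << id }
POOL_SIZE.times { jobs << :done }    # one poison pill per worker

workers = POOL_SIZE.times.map do
  Thread.new do
    # Each worker pulls jobs until it sees its poison pill.
    while (id = jobs.pop) != :done
      results << id * 2              # pretend work instead of an HTTP request
    end
  end
end
workers.each(&:join)

puts "processed #{results.size} jobs with #{workers.size} threads"
```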