2015-08-20 56 views
1

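How can I optimize this Ruby script? I have this code and it takes a very long time to run.

When I run it with -r profile, most of the time seems to go to MySQL. How can I speed it up? With a MySQL bulk insert?

The profiler output is here: http://pastebin.com/fH51ZeEB

Code: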

#!/usr/bin/env ruby 

require 'mysql' 
require 'open-uri' 
require 'nokogiri' 
begin 
i=0 
src = Mysql.new 'localhost', 'me', 'pass', 'db' 
rs = src.query("SELECT * FROM npanxx") 
rs.each_hash do |row| 
    doc = Nokogiri::XML(open("http://localcallingguide.com/xmllocalprefix.php?npa="<< row["npa"].to_s << "&nxx=" << row["nxx"].to_s << "&dir=1")) 
    lca = Hash.new 
    doc.xpath("//prefix/npa | //prefix/nxx | //prefix/exch").each do |prefix| 
        if !lca.has_key? "npa" 
            lca["npa"] = prefix.content 
            next 
        end 
        if !lca.has_key? "nxx" 
            lca["nxx"] = prefix.content 
            next 
        end 
        if !lca.has_key? "exch" 
            lca["exch"] = prefix.content 
            src.query("INSERT INTO npanxxlca (npa,nxx,tnpa,tnxx,texch) VALUES (#{row['npa']}, #{row['nxx']}, #{lca['npa']}, #{lca['nxx']}, #{lca['exch']})") 
            lca = Hash.new 
        end 
    end 
    puts (i+=1).to_s << "- #{row['npa']}, #{row['nxx']}\n" 
end 
rescue Mysql::Error => e 
    puts e.errno 
    puts e.error 
ensure 
    src.close if src 
end 
+0

This seems better suited to http://codereview.stackexchange.com/, since the code actually works, doesn't it? – Oka

+0

Yes, I didn't know that site existed... – zevlag

Answers

1

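With Typhoeus Hydra you can make the requests in parallel. It lets you set a custom max concurrency (200 by default).
Also, instead of parsing the XML with Nokogiri, searching for the values with XPath several times and storing them in a new hash each time, you can use crack to parse the XML directly into a Hash object: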

require 'benchmark' 
require 'typhoeus' 
require 'mysql' 
require 'crack' 
require 'json' 

BASE_URL ||= 'http://localcallingguide.com/xmllocalprefix.php'.freeze 

HOST  ||= 'localhost'.freeze 
USER  ||= 'me'.freeze 
PASSWORD ||= 'pass'.freeze 
DATABASE ||= 'db'.freeze 

# 
# Build lca request based on provided npa and nxx 
# @param [Integer, String] npa - NPA 
# @param [Integer, String] nxx - NXX 
# @return [Typhoeus::Request] - request object 
def lca_request(npa, nxx) 
    Typhoeus::Request.new(BASE_URL, params: { dir: 1, npa: npa, nxx: nxx }) 
end 

# 
# Convert XML string into Hash object 
# @param [String] xml - XML string to convert 
# @return [Hash] Ruby Hash object converted from XML string 
def xml_to_hash(xml) 
    Crack::XML.parse(xml) 
end 

# 
# Fetch lca_data from Hash response 
# Response with error will be converted to empty array 
# @param [Hash] hash - response 
# @return [Array] lca data from response. Empty array if invalid data provided 
def lca_data(hash) 
    data = hash['root']['lca_data']['prefix'] 
    data.is_a?(Hash) ? [data] : Array(data) 
rescue NoMethodError 
    [] 
end 

# 
# Fetch lca_data from XML string (see #lca_data) 
# @param [String] xml - string from where to fetch lca_data 
# @return [Array] lca data from response. Empty array if invalid data provided 
def lca_data_from_xml(xml) 
    lca_data(xml_to_hash(xml)) 
end 

# Main function 
def main 
    src = Mysql.new(HOST, USER, PASSWORD, DATABASE) 
    rs = src.query('SELECT * FROM npanxx') 
    hydra = Typhoeus::Hydra.new 
    rs.each_hash do |row| 
        npa, nxx = row['npa'], row['nxx'] 
        request = lca_request(npa, nxx) 
        request.on_complete do |response| 
            lca_data = lca_data_from_xml(response.body) 
            lca_data.each do |lca| 
                src.query("INSERT INTO npanxxlca (npa,nxx,tnpa,tnxx,texch) VALUES (#{npa}, #{nxx}, #{lca['npa']}, #{lca['nxx']}, #{lca['exch']})") 
            end 
        end 
        hydra.queue(request) 
    end 
    hydra.run 
end 

puts Benchmark.measure { main }.real 

I don't have much experience working with MySQL, so I can't recommend how to optimize that part.
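
To illustrate what a concurrency cap does, here is a minimal sketch that uses only Ruby's standard library. It is not Typhoeus and not how Hydra is implemented: it swaps in a plain thread pool, `fetch_all` and its `concurrency` parameter are hypothetical names, and the block stands in for whatever performs one HTTP request.

```ruby
# Hypothetical helper (not part of Typhoeus): run the block once per job
# on at most `concurrency` threads and return results in input order.
def fetch_all(jobs, concurrency: 10, &block)
  queue = Queue.new
  jobs.each_with_index { |job, index| queue << [job, index] }
  results = Array.new(jobs.size)

  # Never start more workers than there are jobs.
  workers = Array.new([concurrency, jobs.size].min) do
    Thread.new do
      loop do
        begin
          job, index = queue.pop(true) # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break
        end
        results[index] = block.call(job)
      end
    end
  end

  workers.each(&:join)
  results
end
```

In the script above, the jobs would be the npa/nxx pairs and the block would fetch one URL (for example with `Net::HTTP.get(URI(url))`), so at most `concurrency` requests are in flight at any moment.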

+0

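I haven't tested the final code, because I don't have a MySQL server and database set up. So if you have any questions or problems, please let me know. If it works, I'm curious how much faster it is :) –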

+0

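I like this approach, but I ran into a problem: when only one record is returned, I get an array of arrays instead of an array of hashes: [["npa", "907"], ["nxx", "221"], ["exch", "003650"], ["ocn", "3023"], ["company_name", "UNITED UTILITIES, INC."], ["rc", "Birch Creek"], ["region", "AK"]] npa.rb:65:in `[]': no implicit conversion of String into Integer (TypeError) from npa.rb:65:in `block (3 levels) in main' – zevlag

+0

@zevlag I updated the `lca_data` method to make sure it returns an array of hashes. –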
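
The single-record case in the comments above comes from `Array(hash)`, which turns a Hash into an array of key/value pairs. A minimal sketch of the normalization, assuming the parsed `prefix` value is a Hash for one record, an Array of Hashes for several, and nil when missing (`normalize_prefixes` is a hypothetical name, not part of crack):

```ruby
# Hypothetical helper: always return an Array of Hashes.
def normalize_prefixes(data)
  case data
  when nil  then []          # no records in the response
  when Hash then [data]      # a single record: wrap it, do not convert it
  else Array(data)           # already a list of records
  end
end

single = { 'npa' => '907', 'nxx' => '221' }
Array(single)               # => [["npa", "907"], ["nxx", "221"]]
normalize_prefixes(single)  # => [{"npa"=>"907", "nxx"=>"221"}]
```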

2

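You can try inserting multiple rows per statement; I think the single-row inserts are the bottleneck. First, collect the values in an array, and once the array is large enough, insert all of its rows at once, like this.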

INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9); 

how-to-insert-multiple-records-into-database
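
In Ruby, the multi-row statement could be assembled roughly like this. It is only a sketch: `multi_insert_sql` is a hypothetical helper, the slice size of 2 is just for the demo, and the values are assumed to be trusted numeric data that needs no quoting. Real input should be escaped or sent through prepared statements.

```ruby
# Hypothetical helper: build one multi-row INSERT for a group of rows.
# Assumes every value is already safe to interpolate (numeric data only).
def multi_insert_sql(table, columns, rows)
  tuples = rows.map { |row| "(#{row.join(', ')})" }
  "INSERT INTO #{table} (#{columns.join(',')}) VALUES #{tuples.join(', ')}"
end

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Group the collected rows and build one statement per group.
statements = rows.each_slice(2).map do |slice|
  multi_insert_sql('tbl_name', %w[a b c], slice)
end
# statements[0] => "INSERT INTO tbl_name (a,b,c) VALUES (1, 2, 3), (4, 5, 6)"
# statements[1] => "INSERT INTO tbl_name (a,b,c) VALUES (7, 8, 9)"
```

Each resulting string would then go to the database in a single `query` call instead of one call per row.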

+0

I like to stop at 100 rows or 1 MB, whichever comes first. –
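
A "whichever comes first" limit might look like the sketch below. `batch_tuples` is a hypothetical helper, 1 MB is assumed to mean 1,048,576 bytes, and the size check counts only the value tuples (plus a separator), not the `INSERT ... VALUES` prefix.

```ruby
# Hypothetical helper: split already-formatted value tuples such as
# "(1, 2, 3)" into batches capped by row count and by approximate bytes.
def batch_tuples(tuples, max_rows: 100, max_bytes: 1_048_576)
  batches = []
  current = []
  current_bytes = 0

  tuples.each do |tuple|
    size = tuple.bytesize + 2 # allow for the ", " separator
    full = current.size >= max_rows || current_bytes + size > max_bytes
    # Flush before adding, but never emit an empty batch.
    if full && !current.empty?
      batches << current
      current = []
      current_bytes = 0
    end
    current << tuple
    current_bytes += size
  end

  batches << current unless current.empty?
  batches
end
```

A tuple larger than `max_bytes` on its own still gets a batch to itself, so nothing is dropped.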