2012-10-30 90 views
-1

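Ruby: writing to a nested array

I want to scrape information about new album releases from a website, and I'm using Nokogiri for this. The idea is to build a nice array that would contain something like this: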

[ 
    0 => ['The Wall', 'Pink Floyd', '1979'], 
    1 => ['Led Zeppelin I', 'Led Zeppelin', '1969'] 
] 

This is my current code. I'm a total Ruby newbie, so any advice would be appreciated.

@events = Array.new() 
# for every date we encounter 
doc.css("#main .head_type_1").each do |item| 

    date = item.text 

    # get every albumtitle 
    doc.css(".albumTitle").each_with_index do |album, index| 
        album = album.text 
        @events[index]['album'] = album 
        @events[index]['release_date'] = date 
    end 

    #get every artistname 
    doc.css(".artistName").each do |artist| 
        artist = artist.text 
        @events[index]['artist'] = artist 
    end 

end 

puts @events 

P.S. The page I want to scrape has a somewhat odd format:

<tr><th class="head_type_1">20 October 1989</th></tr> 
<tr><td class="artistName">Jean Luc-Ponty</td><td class="albumTitle">Some example album</td></tr> 
<tr><td class="artistName">Some Other Artist</td><td class="albumTitle">Some example album</td></tr> 
<tr><td class="artistName">Some Other Artist</td><td class="albumTitle">Some example album</td></tr> 
<tr><th class="head_type_1">29 October 1989</th></tr> 
<tr><td class="artistName">Some Other Artist</td><td class="albumTitle">Some example album</td></tr> 

When I run this with the Ruby interpreter, I get the following error:

get_events.rb:25:in `block (2 levels) in <main>': undefined method `[]=' for nil:NilClass (NoMethodError) 
from /Users/adrian/.rvm/gems/ruby-1.9.3-p286/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:239:in `block in each' 
from /Users/adrian/.rvm/gems/ruby-1.9.3-p286/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `upto' 
from /Users/adrian/.rvm/gems/ruby-1.9.3-p286/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `each' 
from get_events.rb:23:in `each_with_index' 
from get_events.rb:23:in `block in <main>' 
from /Users/adrian/.rvm/gems/ruby-1.9.3-p286/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:239:in `block in each' 
from /Users/adrian/.rvm/gems/ruby-1.9.3-p286/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `upto' 
from /Users/adrian/.rvm/gems/ruby-1.9.3-p286/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `each' 
from get_events.rb:18:in `<main>' 

How can I fix this?

+3

What is your question? – waldrumpus

+0

Added the error output and the question :) – Carvefx

+1

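When you add an error message for code of this complexity, you should add line numbers to the code. Do you think someone is going to go through the code and do all the work for you? – sawa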

Answers

1

I can't quite wrap my head around your solution, but after playing around a bit, I came up with this.

require 'pp' 
require 'nokogiri' 

str = %Q{ 
<tr><th class="head_type_1">20 October 1989</th></tr> 
<tr><td class="artistName">Jean Luc-Ponty</td><td class="albumTitle">Some album</td></tr> 
<tr><td class="artistName">Some Other Artist</td><td class="albumTitle">Some album</td></tr> 
<tr><td class="artistName">Some Other Artist</td><td class="albumTitle">Some album</td></tr> 
<tr><th class="head_type_1">29 October 1989</th></tr> 
<tr><td class="artistName">Some Other Artist</td><td class="albumTitle">Some album</td></tr> 
} 

doc = Nokogiri::HTML(str) 
date = "" 
result = [] 

doc.xpath("//tr").each do |tr| 
    children = tr.children 
    if children.first["class"] == "head_type_1" 
        date = children.first.content 
    else 
        artist, album = children.map {|c| c.content} 
        result << {album: album, artist: artist, date: date} 
    end 
end 

pp result 

Output:

[{:album=>"Some album", :artist=>"Jean Luc-Ponty", :date=>"20 October 1989"},
{:album=>"Some album", :artist=>"Some Other Artist", :date=>"20 October 1989"},
{:album=>"Some album", :artist=>"Some Other Artist", :date=>"20 October 1989"},
{:album=>"Some album", :artist=>"Some Other Artist", :date=>"29 October 1989"}]

Not exactly what you asked for, but perhaps a bit more idiomatic Ruby, and I'm sure you can modify it if needed.
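If the nested-array shape from the question is still what you want, here is a minimal sketch of one way to get there from the hashes above. The sample data and the name `nested` are only for illustration, and taking the year as the last word of the date string is an assumption about the date format.

```ruby
# Sample data in the same shape as the `result` array printed above.
result = [
  { album: "Some album", artist: "Jean Luc-Ponty", date: "20 October 1989" },
  { album: "Some album", artist: "Some Other Artist", date: "29 October 1989" }
]

# Turn each hash into [album, artist, year]. The year is assumed to be
# the last whitespace-separated token of the date string.
nested = result.map do |row|
  [row[:album], row[:artist], row[:date].split.last]
end

p nested
```

Each inner array then holds the album, the artist, and the year, in that order.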

+0

This is exactly what I wanted to achieve; I was almost there but for a small error. Thank you very much for your attention! – Carvefx

+0

And the self-evident message of this code is that there are no artists other than Jean Luc-Ponty. :-) –

-1

The index variable is undefined in your second each

+0

That's not it. I tried doc.css(".artistName").each_with_index do |artist, index| and got the same output – Carvefx
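For what it's worth, the traceback in the question points at the album assignment (line 25, inside the each_with_index that starts on line 23), which suggests the failure happens before the artist loop is ever reached. @events starts out as an empty array, so @events[index] is nil, and nil has no `[]=` method. A minimal sketch of the problem and one possible fix, without Nokogiri; the `titles` array and the local variable names are made up for illustration:

```ruby
events = []

# Reproduce the error: events[0] is nil, and nil has no []= method.
begin
  events[0]['album'] = 'The Wall'
rescue NoMethodError => e
  error_message = e.message
end

# One possible fix: create the hash for a slot before writing to it.
titles = ['The Wall', 'Led Zeppelin I']
titles.each_with_index do |title, index|
  events[index] ||= {}
  events[index]['album'] = title
end

p events
```

The `||=` only creates a new hash when the slot is still nil, so a later loop over the artists could fill in the same hashes instead of replacing them.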