2013-07-13 127 views
1

Parsing with Nokogiri

I am parsing HTML with Nokogiri and getting elements of this kind:

<li data-item="{"title":"where is title","slug":"about some", 
    "has_many_images":false,"show_image":"abbxb","created_at":1373737401, 
    "show_attr":{"value":"150"}, 
    "location":"Alabama", 
    "category":"Table", 
    "is_business":false}"> 

    //here other many more 
</li> 

Now I want to get the value of this data-item attribute, so I use:

page.css("li[data-item]")[0] 

This is what I get:

#<Nokogiri::XML::Element:0x14fc250 name="li" attributes=[#<Nokogiri::XML::Attr:0x14fc178 name="class" value="item">, etc...

But this is what I want:

"{"title":"where is title","slug":"about some", 
     "has_many_images":false,"show_image":"abbxb","created_at":1373737401, 
     "show_attr":{"value":"150"}, 
     "location":"Alabama", 
     "category":"Table", 
     "is_business":false}" 

Any suggestions?

+2

@nano.galvao Nice edit.. I learned something from you today.. :) –

Answers

2

You can select that attribute as follows:

page.at_xpath("//li[1]/@data-item").content 

Edit

A more complete demonstration, at @Priti's request:

require 'nokogiri'

body = %Q{  
    <body> 
    <li data-item='{"title":"where is title","slug":"about some", 
     "has_many_images":false,"show_image":"abbxb","created_at":1373737401, 
     "show_attr":{"value":"150"}, 
     "location":"Alabama", 
     "category":"Table", 
     "is_business":false}'> 
    </li> 
    </body> 
} 
# Parse the snippet and read the data-item attribute of the first <li>
page = Nokogiri::XML(body) 
result = page.at_xpath("//li[1]/@data-item").content 
# "{\"title\":\"where is title\",\"slug\":\"about some\",   \"has_many_images\":false,\"show_image\":\"abbxb\",\"created_at\":1373737401,   \"show_attr\":{\"value\":\"150\"},   \"location\":\"Alabama\",   \"category\":\"Table\",   \"is_business\":false}" 
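Since the attribute value is a JSON document, a natural next step is to turn it into a Ruby hash. The sketch below assumes a shortened stand-in for the `result` string and uses only the standard `json` library; reading `created_at` as a Unix timestamp in seconds is a guess based on its value, not something the question states.

```ruby
require 'json'

# Stand-in for the string returned by the XPath lookup (shortened for illustration)
result = '{"title":"where is title","created_at":1373737401,' \
         '"show_attr":{"value":"150"},"is_business":false}'

item = JSON.parse(result)  # keys come back as Strings

puts item["title"]
puts item["show_attr"]["value"]
puts item["is_business"]

# Assumption: created_at is a Unix timestamp in seconds
puts Time.at(item["created_at"]).utc
```

The extra spaces that replace the line breaks inside the attribute are harmless here, because JSON ignores whitespace between tokens.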
+0

Awesome! Thanks. :) – rony36

+0

@rony36 Did you try this code? I only get '{'. –

+1

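The 'data-item' attribute value as the asker shows it is invalid. I assume he or she actually escaped the quotes properly. For example, if you wrap the attribute value in single quotes, my selector works. –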