2016-03-26

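I have the following HTML snippet. I want to scrape the page to get each topic and its subtopics and store them in an object. How can I use jQuery selectors to build a hierarchical object from sibling tags?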

The desired result:

{ 
'topic': 'Java Basics', 
'subtopics':['Define the scope of variables', 'Define the structure of a Java class', ...] 
} 

I tried to get this working with jsdom on Node.js and jQuery:

var jsdom = require('jsdom');
var fs = require("fs");

var topicos = fs.readFileSync("topic.html", "utf-8");

jsdom.env(topicos, ["http://code.jquery.com/jquery.js"], function (error, window) {
    var $ = window.$;
    var length = $('div ~ ').each(function() {
        //???
        var topic = $(this);
        var text = topic.text();
        console.log(text.trim())
    });
})

But because of my lack of experience with jQuery, I can't get the hierarchy organized correctly.

HTML snippet:

<div> 
    <strong>Java Basics&nbsp;</strong></div> 
<ul> 
    <li> 
     Define the scope of variables&nbsp;</li> 
    <li> 
     Define the structure of a Java class 
    </li> 
    <li> 
     Create executable Java applications with a main method; run a Java program from the command line; including 
     console output. 
    </li> 
    <li> 
     Import other Java packages to make them accessible in your code 
    </li> 
    <li> 
     Compare and contrast the features and components of Java such as: 
     platform independence, object orientation, encapsulation, etc. 
    </li> 
</ul> 
<div> 
    <strong>Working With Java Data Types&nbsp;</strong></div> 
<ul> 
    <li> 
     Declare and initialize variables (including casting of primitive data types) 
    </li> 
    <li> 
     Differentiate between object reference variables and primitive variables 
    </li> 
    <li> 
     Know how to read or write to object fields 
    </li> 
    <li> 
     Explain an Object's Lifecycle (creation, "dereference by reassignment" and garbage collection) 
    </li> 
    <li> 
     Develop code that uses wrapper classes such as Boolean, Double, and Integer. &nbsp;</li> 
</ul> 
... 

Answer

Here is a working snippet (fiddle):

var topicos = [];

// Each topic title sits in a <div>; its subtopics are in the <ul> right after it.
jQuery('div').each(function () {
    var jTopic = jQuery(this);
    var data = {
        topic: jTopic.find('strong').text().trim(),
        subtopics: []
    };
    // .next('ul') matches the immediately following sibling only if it is a <ul>.
    jTopic.next('ul').find('li').each(function () {
        data.subtopics.push(jQuery(this).text().trim());
    });
    topicos.push(data);
});

console.log(topicos);
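
One detail worth noting about the scraped text: `.text()` returns it as laid out in the markup, so `&nbsp;` comes through as U+00A0 and list items that span several lines keep their inner line breaks and indentation. A minimal sketch of a helper that normalizes this follows; the name `cleanText` is hypothetical and not part of jQuery.

```javascript
// Hypothetical helper: collapse every run of whitespace into one space,
// then trim both ends. In JavaScript regular expressions, \s also matches
// the non-breaking space (U+00A0) that &nbsp; produces.
function cleanText(raw) {
  return raw.replace(/\s+/g, ' ').trim();
}

console.log(cleanText('Java Basics\u00a0')); // → 'Java Basics'
console.log(cleanText('\n     Compare features such as: \n     platform independence\n'));
// → 'Compare features such as: platform independence'
```

Passing each `.text()` result through a helper like this would yield strings shaped like those in the desired result.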

However, I would strongly recommend adding classes to your markup and using those as selectors instead of tag names:

<div class="js-topic-data">
    <div>
        <strong class="js-topic">Java Basics&nbsp;</strong>
    </div>
    <ul>
        <li class="js-sub-topic">
            Define the scope of variables&nbsp;</li>
        <li class="js-sub-topic">
            Define the structure of a Java class
        </li>
    </ul>
</div>

Then you can do something like:

jQuery('.js-topic-data').each(function () {
    var jBlock = jQuery(this);
    var data = {
        topic: jBlock.find('.js-topic').text().trim(),
        subtopics: []
    };
    // The items are descendants of the wrapper, so use .find() rather than .next().
    jBlock.find('.js-sub-topic').each(function () {
        data.subtopics.push(jQuery(this).text().trim());
    });
    topicos.push(data);
});

This is more robust against markup changes and the like.
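
For completeness, the same class-based extraction can be sketched without jQuery, using only standard DOM methods (`querySelectorAll`, `querySelector`, `textContent`). This is a sketch under assumptions: `extractTopics` is a hypothetical name, every wrapper is assumed to carry the `js-topic-data` class, and every subtopic item is assumed to carry `js-sub-topic`. The `document` argument is whatever standard DOM document you have, for example one from a browser or from a DOM library such as jsdom.

```javascript
// Hypothetical helper: build [{ topic, subtopics }] from class-based markup.
// Relies only on standard DOM methods, so any spec-compliant document works.
function extractTopics(document) {
  return Array.from(document.querySelectorAll('.js-topic-data'), function (block) {
    var title = block.querySelector('.js-topic');
    return {
      // trim() also strips the U+00A0 left behind by &nbsp;
      topic: title ? title.textContent.trim() : '',
      subtopics: Array.from(block.querySelectorAll('.js-sub-topic'), function (li) {
        return li.textContent.trim();
      })
    };
  });
}

// Hypothetical usage, assuming `document` is already available:
// console.log(extractTopics(document));
```

Because the lookup is scoped to each wrapper, items can move around inside it without breaking the extraction.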