2014-02-25

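Unable to extract content from HTML using jsoup in Java?

I want to use jsoup to extract the <td> tags with the classes css-sched-table-title and css-sched-waypoints from the HTML below, but I can't figure out where I'm going wrong. Can someone help?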

Document doc = Jsoup.parse("somelink.html");
Elements row = doc.select(".css-sched-table-title td");
Iterator<Element> iterator = row.listIterator();
while (iterator.hasNext()) {
    Element element = iterator.next();
    String value = element.text();
    System.out.println("value : " + value);
}

<tr> 
     <td ALIGN="CENTER" COLSPAN="16" CLASS="css-sched-table-title"><b>Saturday - </b><b>Afternoon</b></td> 
    </tr> 
    <tr VALIGN="BOTTOM"> 
     <TD>&nbsp;</TD> 
     <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Townline and Southern</TD> 
     <TD>&nbsp;</TD> 
     <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Clearbrook and Blueridge</TD> 
     <TD>&nbsp;</TD> 
     <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Clearbrook and South Fraser</TD> 
     <TD>&nbsp;</TD> 
     <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Ar. Bourquin Exchange</TD> 
     <TD>&nbsp;</TD> 
     <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Lv. Bourquin Exchange</TD> 
     <TD>&nbsp;</TD> 
     <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Downtown Abbotsford</TD> 
     <TD>&nbsp;</TD> 
     <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">McMillan and Old Yale</TD> 
     <TD>&nbsp;</TD> 
     <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Sandy Hill and Old Clayburn</TD> 
    </tr> 

Did you try "td.css-sched-table-title"? – Nishant

Hi Nishant, that didn't work. –

Answer

There is one td tag with the class css-sched-table-title and a list of td tags with the class css-sched-waypoints.

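Also, the correct selector syntax would be Elements row = doc.select("td.css-sched-waypoints"); (see here). The selector in the question, .css-sched-table-title td, matches td elements nested inside an element that has that class, but in this HTML the class is on the td itself.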

Note: with the HTML file used as-is, jsoup does not interpret it as valid table HTML content. I had to enclose the content above in <table></table> tags.

When I try the code below with the HTML file:

// The class is on the td itself, so select by tag plus class.
Elements row = doc.select("td.css-sched-waypoints");
// first() returns null when nothing matches.
Element title = doc.select("td.css-sched-table-title").first();

System.out.println(title.text());
Iterator<Element> iterator = row.listIterator();
while (iterator.hasNext()) {
    Element element = iterator.next();
    String id = element.attr("id");          // empty string when the attribute is absent
    String classes = element.attr("class");
    String value = element.text();
    System.out.println("Id : " + id + ", classes : " + classes
            + ", value : " + value);
}

I get:

Saturday - Afternoon 
Id : , classes : css-sched-waypoints, value : Townline and Southern 
Id : , classes : css-sched-waypoints, value : Clearbrook and Blueridge 
Id : , classes : css-sched-waypoints, value : Clearbrook and South Fraser 
Id : , classes : css-sched-waypoints, value : Ar. Bourquin Exchange 
Id : , classes : css-sched-waypoints, value : Lv. Bourquin Exchange 
Id : , classes : css-sched-waypoints, value : Downtown Abbotsford 
Id : , classes : css-sched-waypoints, value : McMillan and Old Yale 
Id : , classes : css-sched-waypoints, value : Sandy Hill and Old Clayburn 
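
The note about the <table></table> wrapper and the selector fix can be checked with a small sketch. It assumes jsoup is on the classpath; the class name TableWrapDemo, the helper count, and the shortened ROWS fragment are made up for illustration and only mimic the rows from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TableWrapDemo {
    // Hypothetical fragment modeled on the rows in the question (shortened).
    static final String ROWS =
        "<tr><td CLASS=\"css-sched-table-title\"><b>Saturday - </b><b>Afternoon</b></td></tr>"
      + "<tr><TD CLASS=\"css-sched-waypoints\">Townline and Southern</TD>"
      + "<TD CLASS=\"css-sched-waypoints\">Clearbrook and Blueridge</TD></tr>";

    // Parses the HTML string and counts the elements matched by the CSS query.
    static int count(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        String wrapped = "<table>" + ROWS + "</table>";

        // Bare <tr>/<td> tags outside a table are dropped by the HTML parser.
        System.out.println(count(ROWS, "td.css-sched-waypoints"));       // 0
        // Inside <table>, the cells survive parsing.
        System.out.println(count(wrapped, "td.css-sched-waypoints"));    // 2
        // Descendant selector: looks for a td *inside* the titled element.
        System.out.println(count(wrapped, ".css-sched-table-title td")); // 0
        // Tag plus class: matches the td that carries the class.
        System.out.println(count(wrapped, "td.css-sched-table-title"));  // 1
    }
}
```

The uppercase TD and CLASS in the fragment are deliberate: jsoup's HTML parser lowercases tag and attribute names, so the lowercase selectors still match.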

Hi PopoFibo, thanks for the explanation. I got it corrected and it works fine now. –

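Hi PopoFibo, a quick question: I currently have Elements row = doc.select("td.css-sched-waypoints"); and Elements time = doc.select("td.css-sched-times");. Instead of two separate Elements, is it possible to get them all in a single Elements instance? –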

@dev_marshell08 Yes, you sure can. To start, refer to this question: http://stackoverflow.com/questions/21694216/selecting-elements-that-have-multiple-class-whilst-using-jsoup/21694612#21694612 – PopoFibo
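
For the follow-up about getting both sets of cells in one Elements instance, one possible approach (not necessarily the one in the linked answer) is a comma-separated selector group, which jsoup supports. The sketch below assumes jsoup is on the classpath; the class name CombinedSelectDemo, the helper selectBoth, and the sample HTML are hypothetical, and the time values are invented placeholders since the real css-sched-times cells are not shown in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class CombinedSelectDemo {
    // Hypothetical table; the css-sched-times cells hold placeholder values.
    static final String HTML =
        "<table>"
      + "<tr><td class=\"css-sched-waypoints\">Townline and Southern</td>"
      + "<td class=\"css-sched-waypoints\">Downtown Abbotsford</td></tr>"
      + "<tr><td class=\"css-sched-times\">12:05</td>"
      + "<td class=\"css-sched-times\">12:31</td></tr>"
      + "</table>";

    // A comma joins two selectors; an element matching either one is returned.
    static Elements selectBoth(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("td.css-sched-waypoints, td.css-sched-times");
    }

    public static void main(String[] args) {
        for (Element cell : selectBoth(HTML)) {
            // className() tells the two kinds of cell apart after merging.
            System.out.println(cell.className() + " : " + cell.text());
        }
    }
}
```

Because both kinds of cell end up in one collection, checking className() or hasClass(...) on each element is one way to tell waypoints from times afterwards.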