2014-01-25 120 views

Jsoup table parsing

I'm new to jsoup and to this whole parsing thing, so if you need more information to be able to answer my question, please let me know!

I have this table that I want to parse with Jsoup in Java. I just want to get the text:

"B S Computer Science, CS (2012-2014)"

from this part of the table:

<h3>Fahran S Kamili (fsk226)</h3> 
     <div> 
      10 Degree Audit Requests Returned. 
     </div> 
     <table> 
      <thead> 
       <tr> 
<!-- *nrfkh - 9/2012: [degaudt-634]* --> 
         <th colspan="8">Degree Audits Requested</th> 

<!-- *end nrfkh - 9/2012: [degaudt-634]* --> 

       </tr> 
       <tr> 
        <th>Rerun</th> 

<!-- *nrfkh - 9/2012: [degaudt-634]* --> 

<!-- *end nrfkh - 9/2012: [degaudt-634]* --> 
        <th>Request Created</th> 
<!-- *nrfkh - 9/2012: [degaudt-634]* --> 

<!-- *end nrfkh - 9/2012: [degaudt-634]* --> 
        <th>Audit Type</th> 
        <th>Program</th> 
        <th>Courses Requested</th> 
        <th>Request Status</th> 
        <th>Audit ID</th> 
        <th>Delete Option</th> 
       </tr> 
      </thead> 
        <tbody><tr> 
         <td> 
            <a href="https://utdirect.utexas.edu/apps/degree/audits/requests/student_individual/?form-0-eid=fsk226&form-0-name=Fahran%20S%20Kamili&form-0-begin_ccyy=2012&form-0-degree_plan=ESC%20SS%20CS&form-0-minor=&current=X&future=&planned=&form-TOTAL_FORMS=20&form-INITIAL_FORMS=0&form-MAX_NUM_FORMS=&rerun=" target="_blank">Rerun</a> 
         </td> 
<!-- *nrfkh - 9/2012: [degaudt-634]* --> 
<!-- *end nrfkh - 9/2012: [degaudt-634]* --> 
         <td> 
          12/20/2013 
          05:06 PM 
         </td> 
<!-- *nrfkh - 9/2012: [degaudt-634]* --> 
<!-- *end nrfkh - 9/2012: [degaudt-634]* --> 
         <td> 
           Normal 

         </td> 
         <td> 
          B S Computer Science, CS 
          (2012-2014) 
         </td> 
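The table is actually much longer, but the remaining cells are just siblings of the ones shown (so I assume that if I can get this text, I can easily get the other text as well).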


"so if you need more information..." - Yes, such as what you have tried so far and how it failed to work. And what in particular is confusing you?

Answer


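If I were you, I would save the HTML fragment to a file and parse it with jsoup. I would start by printing every td element encountered, since I think that is what you are after: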

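import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
public class PrintCells {
    public static void main(String... args) throws IOException {
        // Load the saved HTML fragment (adjust the path to your own file).
        File input = new File("C:/users/XYZ/desktop/input.html");
        Document doc = Jsoup.parse(input, "UTF-8", "");
        // Print the whitespace-normalized text of every <td> cell.
        for (Element td : doc.getElementsByTag("td")) {
            System.out.println(td.text());
        }
    }
}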

Output:

Rerun 
12/20/2013 05:06 PM 
Normal 
B S Computer Science, CS (2012-2014)
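
To get only the program text instead of every cell, one option is a CSS selector. The following is a minimal sketch, assuming the program is always the fourth td of a body row, as in the excerpt above. The class name ProgramCell and the method programOf are made up for illustration, and the HTML is inlined (in shortened form) so the example stands on its own.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ProgramCell {
    // Returns the text of the fourth <td> in the first matching body row,
    // or null if no row has a fourth cell. (Hypothetical helper.)
    static String programOf(String html) {
        Document doc = Jsoup.parse(html);
        // td:eq(3) matches a <td> whose zero-based element sibling index is 3;
        // HTML comments between the cells are not counted.
        Element cell = doc.select("tbody > tr > td:eq(3)").first();
        return cell == null ? null : cell.text();
    }

    public static void main(String[] args) {
        // Shortened copy of the row from the question.
        String html = "<table><tbody><tr>"
                + "<td><a href=\"#\">Rerun</a></td>"
                + "<!-- comment -->"
                + "<td> 12/20/2013 05:06 PM </td>"
                + "<td> Normal </td>"
                + "<td> B S Computer Science, CS \n (2012-2014) </td>"
                + "</tr></tbody></table>";
        System.out.println(programOf(html));
    }
}
```

Because text() normalizes whitespace, the line break inside the cell collapses to a single space. If the table has several body rows, doc.select("tbody > tr > td:eq(3)") returns one element per row, so iterating over the result instead of calling first() would yield the program of every audit.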