Jsoup table parsing

I'm new to jsoup and to this whole parsing thing, so if you need more information to be able to answer my question, please let me know!

I have this table that I want to parse with Jsoup in Java. I just want to get the text:

"B S Computer Science, CS (2012-2014)"

from this table:

<h3>Fahran S Kamili (fsk226)</h3> 
     <div> 
      10 Degree Audit Requests Returned. 
     </div> 
     <table> 
      <thead> 
       <tr> 
<!-- *nrfkh - 9/2012: [degaudt-634]* --> 
         <th colspan="8">Degree Audits Requested</th> 

<!-- *end nrfkh - 9/2012: [degaudt-634]* --> 

       </tr> 
       <tr> 
        <th>Rerun</th> 

<!-- *nrfkh - 9/2012: [degaudt-634]* --> 

<!-- *end nrfkh - 9/2012: [degaudt-634]* --> 
        <th>Request Created</th> 
<!-- *nrfkh - 9/2012: [degaudt-634]* --> 

<!-- *end nrfkh - 9/2012: [degaudt-634]* --> 
        <th>Audit Type</th> 
        <th>Program</th> 
        <th>Courses Requested</th> 
        <th>Request Status</th> 
        <th>Audit ID</th> 
        <th>Delete Option</th> 
       </tr> 
      </thead> 
        <tbody><tr> 
         <td> 
            <a href="https://utdirect.utexas.edu/apps/degree/audits/requests/student_individual/?form-0-eid=fsk226&form-0-name=Fahran%20S%20Kamili&form-0-begin_ccyy=2012&form-0-degree_plan=ESC%20SS%20CS&form-0-minor=&current=X&future=&planned=&form-TOTAL_FORMS=20&form-INITIAL_FORMS=0&form-MAX_NUM_FORMS=&rerun=" target="_blank">Rerun</a> 
         </td> 
<!-- *nrfkh - 9/2012: [degaudt-634]* --> 
<!-- *end nrfkh - 9/2012: [degaudt-634]* --> 
         <td> 
          12/20/2013 
          05:06 PM 
         </td> 
<!-- *nrfkh - 9/2012: [degaudt-634]* --> 
<!-- *end nrfkh - 9/2012: [degaudt-634]* --> 
         <td> 
           Normal 

         </td> 
         <td> 
          B S Computer Science, CS 
          (2012-2014) 
         </td>

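This is only part of the table. It is actually longer, but the remaining cells are just siblings of the ones shown (so I assume that if I can get this text, I can easily get the other text too).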

2014-01-25 user3134067

"so if you need more information..." - Yes, such as what you have tried so far and how it is not working. And what specifically is confusing you?

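If I saved your HTML fragment to a file and parsed it with jsoup, I would start by printing every td element it finds, assuming those are what you are after: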

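import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PrintCells {
    public static void main(String... args) throws IOException {
        File input = new File("C:/users/XYZ/desktop/input.html");
        // Parse the saved fragment; an empty base URI is fine for a local file.
        Document doc = Jsoup.parse(input, "UTF-8", "");
        for (Element td : doc.getElementsByTag("td")) {
            System.out.println(td.text()); // text() collapses whitespace and line breaks
        }
    }
}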

Output:

Rerun 
12/20/2013 05:06 PM 
Normal 
B S Computer Science, CS (2012-2014)
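
If you only want the Program cell rather than every td, one option is to look the column up by its header text and then read that position from each body row. The following is a minimal sketch, assuming the last row of thead carries one th per column, as in your snippet. The class name ProgramColumn, the helper column, and the trimmed inline HTML are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class ProgramColumn {

    // Hypothetical helper: returns the text of the cell that sits under
    // the given header, once per body row.
    static List<String> column(Document doc, String header) {
        List<String> values = new ArrayList<>();
        // Assumption: the last header row holds the per-column titles.
        Element headerRow = doc.select("thead tr").last();
        if (headerRow == null) {
            return values;
        }
        Elements headers = headerRow.select("th");
        int index = -1;
        for (int i = 0; i < headers.size(); i++) {
            if (headers.get(i).text().equals(header)) {
                index = i;
                break;
            }
        }
        if (index < 0) {
            return values; // no such header
        }
        for (Element row : doc.select("tbody tr")) {
            Elements cells = row.select("td");
            if (index < cells.size()) {
                values.add(cells.get(index).text());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Trimmed-down copy of the table from the question.
        String html = "<table><thead>"
            + "<tr><th colspan=\"8\">Degree Audits Requested</th></tr>"
            + "<tr><th>Rerun</th><th>Request Created</th>"
            + "<th>Audit Type</th><th>Program</th></tr>"
            + "</thead><tbody><tr>"
            + "<td><a href=\"#\">Rerun</a></td>"
            + "<td>12/20/2013 05:06 PM</td>"
            + "<td>Normal</td>"
            + "<td>B S Computer Science, CS\n (2012-2014)</td>"
            + "</tr></tbody></table>";
        Document doc = Jsoup.parse(html);
        System.out.println(column(doc, "Program"));
    }
}
```

Looking the column up by header keeps the code working if columns are reordered. If the layout is fixed, reading the fourth td of each row directly would be shorter.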

2014-01-25 19:53:01 PopoFibo
