2016-08-30 34 views
1
        for (int x = 0; x < 8000; x += 50) {
            Document doc = Jsoup.connect("localhost.com/" + x).get();
            Elements links = doc.select("a[href]");
            for (Element link : links) {
                String text = link.text();
                System.out.println(text);
            }
        }
    } // closes the enclosing method (not shown)
} // closes the enclosing class (not shown)

This produces output like the following:

Adrian Riven 

HalfSugar No Ice 

Yassuo 

Amandadog 

P1 Sloosh 

Is there any way to remove the blank lines, so that the output looks like this:

Adrian Riven 
HalfSugar No Ice 
Yassuo 
Amandadog 
P1 Sloosh 

I tried text.replace("\n", ""); and text.replaceAll("\r\n?", "");

Edit: I tried it like this, and it did not work for me. I have not tried the other suggestion yet.

Elements links = doc.select("a[href]");
for (Element link : links) {
    Document docs = Jsoup.parse(String.valueOf(links));
    docs.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
    String text = link.text() + link.text();
    System.out.println(text.replace("Show More", ""));
}

HTML example:

</td> 
    <td class="SummonerName Cell"> 
     <a href="/summoner/userName=Cris" class="Link">Cris</a> 
    </td> 
       <td class="TierRank Cell">Challenger</td> 
     <td class="LP Cell">1,137 LP</td> 
      <td class="TeamName Cell"> 
         Apex Gaming 
       </td> 
    <td class="RatioGraph Cell"> 
         <div class="WinRatioGraph"> 
       <div class="Graph"> 
+1

Try using System.out.print(text); the text already contains a line break. – ravthiru

+0

Can you post your sample HTML? – soorapadman

+0

Yes, I can. Added. – nooby

Answers

0

This works for me:

Document doc = Jsoup.connect("localhost.com").get();

Elements links = doc.select("a[href]");
for (Element link : links) {
    // Skip links that have no text, so no blank line is printed for them
    if (!link.text().isEmpty())
        System.out.println(link.text());
}
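
To see why the check helps, here is a minimal self-contained sketch that parses an inline HTML string instead of fetching a page. It assumes the blank lines come from links whose text is empty (for example, a link that only wraps an image), which the question does not confirm. The class name NonEmptyLinkTexts, the method linkTexts, and the sample markup are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of jsoup or of the original post.
class NonEmptyLinkTexts {

    // Collects the text of every link, skipping links with no visible text.
    static List<String> linkTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            // hasText() is false when the text is empty or whitespace only
            if (link.hasText()) {
                texts.add(link.text());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Assumed markup: one text link and one image-only link
        String html = "<div>"
                + "<a href=\"/summoner/userName=Cris\" class=\"Link\">Cris</a>"
                + "<a href=\"/summoner/userName=Cris\"><img src=\"icon.png\"></a>"
                + "</div>";
        for (String text : linkTexts(html)) {
            System.out.println(text);
        }
    }
}
```

Since text() on the image-only link returns an empty string, printing it with println would emit a blank line; filtering before printing avoids that without any string replacement.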
0

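Removing empty elements can be tricky, because some HTML tags are always empty, as is the case with <br/>, <img>, and so on.

If you can decide which elements you are willing to remove, try the following: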

// Names of the elements to remove if empty
Set<String> removable = ....

// Parse the html into a jsoup document
Document source = Jsoup.parse(myHtml);

// Clean the html according to a whitelist
// (whitelist is a Whitelist instance defined elsewhere)
Document cleaned = new Cleaner(whitelist).clean(source);

// For each element in the cleaned document
for (Element el : cleaned.getAllElements()) {
    if (el.children().isEmpty() && !el.hasText()) {
        // Element is empty, check if it should be removed
        if (removable.contains(el.tagName())) el.remove();
    }
}
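
A runnable variant of the loop above might look like the sketch below. It leaves out the Cleaner step and works directly on the parsed document, so it is a simplification and not the exact approach shown. The class name EmptyElementRemover, the method removeEmpty, and the sample markup are assumptions for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.Set;

// Hypothetical helper class, not part of jsoup or of the original answer.
class EmptyElementRemover {

    // Parses the html and removes empty elements whose tag name is in removable.
    static Document removeEmpty(String html, Set<String> removable) {
        Document doc = Jsoup.parse(html);
        // getAllElements() returns a separate list, so removing nodes from
        // the document while looping over that list is safe.
        for (Element el : doc.getAllElements()) {
            boolean empty = el.children().isEmpty() && !el.hasText();
            if (empty && removable.contains(el.tagName())) {
                el.remove();
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        // Assumed markup: <br> is always empty but is not listed as removable
        String html = "<div><p>Hello</p><p></p><span> </span><br></div>";
        Document doc = removeEmpty(html, Set.of("p", "span"));
        System.out.println(doc.body().html());
    }
}
```

Note that this is a single pass: a parent that only becomes empty after its children are removed is not removed itself, because it has already been visited.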

Or change the OutputSettings:

final String html = ...; 
OutputSettings settings = new OutputSettings(); 
settings.escapeMode(Entities.EscapeMode.xhtml); 
String cleanHtml = Jsoup.clean(html, "", Whitelist.relaxed(), settings); 

This can also be done on a document parsed by Jsoup:

Document doc = Jsoup.parse(...); 
doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml); 
// ... 
+0

I updated my post. – nooby