Java jsoup: removing blank lines

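import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) throws IOException {
        // Fetch the pages in steps of 50
        for (int x = 0; x < 8000; x += 50) {
            // jsoup needs an absolute URL including the protocol
            Document doc = Jsoup.connect("http://localhost.com/" + x).get();
            Elements links = doc.select("a[href]");
            for (Element link : links) {
                String text = link.text();
                System.out.println(text);
            }
        }
    }
}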

This produces output like this:

Adrian Riven 

HalfSugar No Ice 

Yassuo 

Amandadog 

P1 Sloosh

Is there any way to remove the blank lines, so that the output looks like this?

Adrian Riven 
HalfSugar No Ice 
Yassuo 
Amandadog 
P1 Sloosh

I tried text.replace("\n", "") and text.replaceAll("\r\n?", "").

Edit: that did not work for me either, so I tried another approach:

Elements links = doc.select("a[href]");
for (Element link : links) {
    Document docs = Jsoup.parse(String.valueOf(links));
    docs.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
    String text = link.text() + link.text();
    System.out.println(text.replace("Show More", ""));
}

Sample HTML:

</td> 
    <td class="SummonerName Cell"> 
     <a href="/summoner/userName=Cris" class="Link">Cris</a> 
    </td> 
       <td class="TierRank Cell">Challenger</td> 
     <td class="LP Cell">1,137 LP</td> 
      <td class="TeamName Cell"> 
         Apex Gaming 
       </td> 
    <td class="RatioGraph Cell"> 
         <div class="WinRatioGraph"> 
       <div class="Graph">


2016-08-30 nooby

Try using System.out.print(text); the text has line breaks in it. – ravthiru

Can you post your sample HTML? – soorapadman

Yes, I can. Added. – nooby

This works for me:

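// jsoup needs an absolute URL including the protocol
Document doc = Jsoup.connect("http://localhost.com").get();

Elements links = doc.select("a[href]");
for (Element link : links) {
    // Skip links without visible text; printing them would emit an empty line
    if (!link.text().isEmpty()) {
        System.out.println(link.text());
    }
}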
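The blank lines most likely come from links that have no visible text: System.out.println on an empty string still prints a line break, and replacing "\n" inside an empty string changes nothing. Here is a minimal sketch of the same check in plain Java, without jsoup. The class name NonEmptyLinkText, the helper keepNonEmpty, and the sample list are made up for illustration; the list stands in for the values that link.text() would return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NonEmptyLinkText {

    // Keeps only the entries that contain something other than whitespace.
    static List<String> keepNonEmpty(List<String> texts) {
        List<String> kept = new ArrayList<>();
        for (String text : texts) {
            if (text != null && !text.trim().isEmpty()) {
                kept.add(text.trim());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the link.text() values;
        // the empty strings model links with no visible text.
        List<String> linkTexts =
                Arrays.asList("Adrian Riven", "", "HalfSugar No Ice", "", "Yassuo");
        for (String text : keepNonEmpty(linkTexts)) {
            System.out.println(text);
        }
    }
}
```

Collecting the names into a list first also makes it easy to reuse them later instead of only printing them.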


2016-08-31 07:35:47 soorapadman

Removing them can be tricky, because some HTML tags are always empty, such as <br/> and <img>.

If you can decide which elements you want removed, try the following:

// Names of the elements to remove if empty
Set<String> removable = ....

// Parse the html into a jsoup document
Document source = Jsoup.parse(myHtml);

// Clean the html according to a whitelist (whitelist is a Whitelist you define)
Document cleaned = new Cleaner(whitelist).clean(source);

// For each element in the cleaned document
for (Element el : cleaned.getAllElements()) {
    if (el.children().isEmpty() && !el.hasText()) {
        // Element is empty, check if it should be removed
        if (removable.contains(el.tagName())) {
            el.remove();
        }
    }
}

Or change the OutputSettings:

final String html = ...; 
OutputSettings settings = new OutputSettings(); 
settings.escapeMode(Entities.EscapeMode.xhtml); 
String cleanHtml = Jsoup.clean(html, "", Whitelist.relaxed(), settings);

The same can be done on a document parsed by Jsoup:

Document doc = Jsoup.parse(...); 
doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml); 
// ...
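If only the printed text matters, a different and simpler technique is to strip blank lines from the collected text with a regular expression. This is not one of the jsoup approaches above, just a plain-Java sketch. The class name BlankLineStripper, the method stripBlankLines, and the sample string are assumptions for illustration.

```java
import java.util.regex.Pattern;

public class BlankLineStripper {

    // Matches a line that holds only spaces or tabs, together with its line break.
    // (?m) makes ^ match at the start of every line, not just the start of the input.
    private static final Pattern BLANK_LINE = Pattern.compile("(?m)^[ \\t]*\\r?\\n");

    static String stripBlankLines(String text) {
        return BLANK_LINE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical text collected from the scraper, with blank lines in between
        String collected = "Adrian Riven\n\nHalfSugar No Ice\n\nYassuo\n";
        System.out.print(stripBlankLines(collected));
    }
}
```

This only works on text that has already been gathered into one string, for example with a StringBuilder; it does not change what jsoup extracts.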


2016-08-31 00:14:29

I updated my post. – nooby
