2014-07-18 100 views
0

Parse a string to get its content

I have the following HTML string:

<h3>I only want this content</h3> I don't want this content <b>random content</b> 

I want to get only the content of the h3 tag and discard everything else. I have the following:

String getArticleBody = listArt.getChildText("body"); 
StringBuilder mainArticle = new StringBuilder(); 
String getSubHeadlineFromArticle; 

if(getArticleBody.startsWith("<h3>") && getArticleBody.endsWith("</h3>")){ 
    mainArticle.append(getSubHeadlineFromArticle); 
} 

However, this returns the entire content, which is not what I am after. If anyone could help me, it would be much appreciated.

+0

You need to store the content.

+0

See: http://stackoverflow.com/questions/16597303/extract-string-between-two-strings-in-java

Answers

0

You can use the substring method like this:

String a="<h3>I only want this content</h3> I don't want this content <b>random content</b>"; 
System.out.println(a.substring(a.indexOf("<h3>")+4,a.indexOf("</h3>"))); 

Output:

I only want this content 
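
One caveat with the one-liner above: if either tag is missing, indexOf returns -1 and substring throws a StringIndexOutOfBoundsException. A minimal sketch of a guarded version follows; the class name TagSubstring and the helper between are hypothetical names introduced for illustration, not part of the original answer.

```java
import java.util.Optional;

final class TagSubstring {
    // Hypothetical helper: returns the text between the first openTag and the
    // next closeTag that follows it, or an empty Optional if either is missing.
    static Optional<String> between(String html, String openTag, String closeTag) {
        int start = html.indexOf(openTag);
        if (start < 0) {
            return Optional.empty();
        }
        start += openTag.length(); // skip past the opening tag itself
        int end = html.indexOf(closeTag, start);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(html.substring(start, end));
    }

    public static void main(String[] args) {
        String a = "<h3>I only want this content</h3> I don't want this content <b>random content</b>";
        // Fall back to an empty string when there is no h3 element.
        System.out.println(between(a, "<h3>", "</h3>").orElse(""));
    }
}
```

Passing the tag strings as parameters avoids the hard-coded +4 offset, so the same helper would work for other simple tag pairs.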
0

Try this:

String result = getArticleBody.substring(getArticleBody.indexOf("<h3>"), getArticleBody.indexOf("</h3>")) 
       .replaceFirst("<h3>", ""); 
System.out.println(result); 
0

You need to use a regular expression. Try this:

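import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "<h3>asdfsdafsdaf</h3>dsdafsdfsafsadfa<h3>second</h3>";
        // the ? makes the quantifier reluctant, so each match stops at the
        // nearest closing tag instead of running on to the last one
        Pattern pattern = Pattern.compile("<h3>(.+?)</h3>");
        Matcher matcher = pattern.matcher(str);
        // print the captured text of every h3 element in the string
        while (matcher.find()) System.out.println(matcher.group(1));
    }
}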

matcher.group(1) returns exactly the text between the h3 tags.
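
If the matches need to be kept rather than printed, the same loop can fill a list. The sketch below is one way to do that; the class name H3Regex and the method allH3 are hypothetical, and the DOTALL and CASE_INSENSITIVE flags, the trim call, and the switch from .+? to .*? (so that an empty h3 also matches) are additions for this example rather than part of the answer. It still assumes plain tags without attributes; for real-world HTML a parser is usually the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class H3Regex {
    // DOTALL lets '.' match line breaks; CASE_INSENSITIVE also accepts <H3>.
    // Both flags are assumptions added for this sketch.
    private static final Pattern H3 =
            Pattern.compile("<h3>(.*?)</h3>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: collects the text of every h3 element, in order.
    static List<String> allH3(String html) {
        List<String> result = new ArrayList<>();
        Matcher matcher = H3.matcher(html);
        while (matcher.find()) {
            result.add(matcher.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "<h3>asdfsdafsdaf</h3>dsdafsdfsafsadfa<h3>second</h3>";
        for (String heading : allH3(str)) {
            System.out.println(heading);
        }
    }
}
```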

0

Use a regular expression.
This may help you:

String str = "<h3>I only want this content</h3> I don't want this content <b>random content</b>";
final Pattern pattern = Pattern.compile("<h3>(.+?)</h3>");
final Matcher matcher = pattern.matcher(str);
// group(1) throws IllegalStateException unless find() has succeeded first
if (matcher.find()) {
    System.out.println(matcher.group(1)); // prints the text between the h3 tags
}

Output:

I only want this content 
1

Thanks, guys. All of your answers work, but I ended up using Jsoup.

String getArticleBody = listArt.getChildText("body"); 
org.jsoup.nodes.Document docc = Jsoup.parse(getArticleBody); 
org.jsoup.nodes.Element h3Tag = docc.getElementsByTag("h3").first(); 
String getSubHeadlineFromArticle = h3Tag.text(); 
0

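The other answers already cover how to get the result you want. I am going to comment on your code to explain why it does not do that. (Note that I have changed your variable names, because strings don't get anything; they are things.)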

// declare a bunch of variables 
String articleBody = listArt.getChildText("body"); 
StringBuilder mainArticle = new StringBuilder(); 
String subHeadlineFromArticle; 

// check to see if the article body consists entirely of a subheadline 
if(articleBody.startsWith("<h3>") && articleBody.endsWith("</h3>")){ 
    // if it does, append subHeadlineFromArticle, which has not been assigned a value at this point
    mainArticle.append(subHeadlineFromArticle); 
} 
// if it doesn't, don't do anything 

// final result: 
// articleBody = the entire article body 
// mainArticle = empty StringBuilder (regardless of whether you appended anything) 
// subHeadlineFromArticle = never assigned a value
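
For comparison, here is a minimal sketch of what the original code was presumably trying to do: locate the h3 element anywhere in the body, assign the subheadline first, and only then append it. The class name SubHeadline and the method buildMainArticle are hypothetical, and the literal string in main stands in for listArt.getChildText("body"), which is not reproduced here.

```java
final class SubHeadline {
    // Hypothetical helper: returns the text of the first <h3>...</h3> pair in
    // articleBody, or an empty string if there is no complete pair.
    static String buildMainArticle(String articleBody) {
        StringBuilder mainArticle = new StringBuilder();

        // Look for the tags anywhere in the body instead of requiring the
        // whole body to start with <h3> and end with </h3>.
        int start = articleBody.indexOf("<h3>");
        int end = start < 0 ? -1 : articleBody.indexOf("</h3>", start);

        if (start >= 0 && end >= 0) {
            // Assign the subheadline before using it.
            String subHeadlineFromArticle =
                    articleBody.substring(start + "<h3>".length(), end);
            mainArticle.append(subHeadlineFromArticle);
        }
        return mainArticle.toString();
    }

    public static void main(String[] args) {
        // Stand-in for listArt.getChildText("body") from the question.
        String articleBody =
                "<h3>I only want this content</h3> I don't want this content <b>random content</b>";
        System.out.println(buildMainArticle(articleBody));
    }
}
```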