Using a regular expression to get the tag code in HTML


I want to get every tag like the one below (every piece of tag code enclosed in <>) from an HTML document. I have tried /<.+>/, but it doesn't seem to work.

<table class="body wrap" cellpadding="0" cellspacing="0" align="center" style="width: 100%;max-width: 600px;background-color: #f4f4f4;">

How can I do this?


2016-07-12 zonyang

What do you mean by "every tag like the one below"? Which part of the tag should be included in the match? – 10100111001

The whole tag (in this case), plus every other tag in a large HTML document. – zonyang

Try '/<[^<>]+>/' or, better, '/<.+?>/'. – horcrux
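To see why the original /<.+>/ fails and how the two suggested patterns differ, here is a small sketch that runs all three over the same input. The class name RegexComparison and the helper findAll are illustrative, not from the thread. Java patterns are written without the surrounding slashes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexComparison {
    // Collect every match of regex in input, in the order find() reports them.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<p class=\"a\">text</p><br/>";
        // Greedy: .+ runs to the LAST '>' on the line, so the whole string is one match.
        System.out.println(findAll("<.+>", html));
        // Negated class: [^<>] cannot cross an angle bracket, so each tag is separate.
        System.out.println(findAll("<[^<>]+>", html));
        // Lazy (reluctant): .+? stops at the FIRST '>' after the '<'.
        System.out.println(findAll("<.+?>", html));

        // A tag whose attributes continue on the next line.
        String multiLine = "<table\n    class=\"x\">";
        // [^<>] also matches a line break, so this finds the tag.
        System.out.println(findAll("<[^<>]+>", multiLine).size());
        // '.' does not match a line break by default, so this finds nothing.
        System.out.println(findAll("<.+?>", multiLine).size());
    }
}
```

The negated-class form is usually the safer choice when tags can span lines; the lazy form can be made to span lines by compiling it with Pattern.DOTALL. Neither handles a literal '>' inside a quoted attribute value.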

Answer

This should work. It matches opening and self-closing tags and skips closing tags such as </test>.

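import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HTMLTagMatcher {
    // '<', then a first character that is not '/', '<' or '>' (so closing tags
    // such as </test> are skipped), then any further non-angle-bracket
    // characters up to the first '>'. Using '*' rather than '+' for the second
    // part lets one-letter tags such as <b> or <p> match as well.
    private static final String REGEX = "<[^/<>][^<>]*>";
    private static final String INPUT = "<test><blah /><test2></test><best><blargh></best><outside>";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        Matcher match = p.matcher(INPUT);
        // Print every opening or self-closing tag, in document order.
        while (match.find()) {
            System.out.println(match.group());
        }
    }
}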
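A sketch applying the same idea to the question's own markup and collecting the tags into a list instead of printing them. It uses the pattern <[^/<>][^<>]*>, a variant of the answer's REGEX that also accepts one-letter tags such as <b>. The class name TableTagDemo and the method openingTags are hypothetical, and the sketch assumes no attribute value contains a literal '>'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TableTagDemo {
    // Opening or self-closing tags only; closing tags (</...>) are skipped.
    // Assumption: no attribute value contains a literal '>' character.
    private static final Pattern OPEN_TAG = Pattern.compile("<[^/<>][^<>]*>");

    // Return every opening or self-closing tag in html, in document order.
    static List<String> openingTags(String html) {
        List<String> tags = new ArrayList<>();
        Matcher m = OPEN_TAG.matcher(html);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // The tag from the question, followed by some nested content and closing tags.
        String html = "<table class=\"body wrap\" cellpadding=\"0\" cellspacing=\"0\" align=\"center\""
                + " style=\"width: 100%;max-width: 600px;background-color: #f4f4f4;\">"
                + "<tr><td><b>Hi</b></td></tr></table>";
        for (String tag : openingTags(html)) {
            System.out.println(tag);
        }
    }
}
```

Compiling the Pattern once as a static field avoids recompiling it for every document when the method is called repeatedly on a large HTML file.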


2016-07-12 18:08:30 10100111001
