Find a line using DOM

I have some plain text/HTML content, an HTML string with lines of text like this:

Title: Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
Snippet: Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
Category: Lorem ipsum dolor sit amet, consectetur adipiscing elit.

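I want to match only the line that says "Snippet:" together with its text, but only that line and nothing else, and the search should be case-insensitive. I have tried regular expressions, but now I would like to try DOMDocument. How can I do that?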

Asked 2012-05-07 by Tower

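Possible duplicate of "Ignore html tags in preg_replace" (http://stackoverflow.com/questions/8193327/ignore-html-tags-in-preg-replace) - see the TextRange class there; it provides a string that is compatible with PCRE and the UTF-8 'u' modifier. – hakre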

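I don't know some of the details of your problem, so my answer might not be appropriate. Depending on the size of the content you need to parse, you may decide that this is not an option. Also, it is not clear from the question where the HTML content comes into play, which is why I wrote this solution without DOM parsing.

One possible solution is to get the lines you want to parse into an array. After that, you can filter the array and remove the lines that don't match your rule from the result.

A sample would be: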

// This is the content
$text = 'Title: Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Snippet: Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Category: Lorem ipsum dolor sit amet, consectetur adipiscing elit.';

// Get the lines from your input as an array. You could achieve this in a
// different way if, for example, you are reading from a file.
// \R matches any newline sequence, so this does not depend on PHP_EOL.
$lines = preg_split('/\R/', $text);

// Apply a custom function to filter the lines (remove the ones that don't match your rule)
$results = array_filter($lines, 'test_content');

// Show the results
echo '<pre>';
print_r($results);
echo '</pre>';

// Custom function here:
function test_content($line)
{
    // Case-insensitive search, notice stripos.
    // The type-strict comparison makes sure it doesn't fail when the
    // element is found right at the start (position 0).
    // Lines for which this returns false will be removed.
    return false !== stripos($line, 'Snippet');
}

This code returns a $results array with only one element, the second line.

You can see it working here: http://codepad.org/220BLjEk
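Note that stripos matches "Snippet" anywhere in a line, not only as the label at the start. If that matters, a minimal sketch of a stricter variant could use preg_grep with an anchored, case-insensitive pattern. The extra sample lines below are made up for illustration and are not part of the question:

```php
<?php
// Sample text; the "Category" and lower-case "snippet:" lines are
// hypothetical additions that show the anchoring and the case handling.
$text = "Title: Lorem ipsum dolor sit amet.\n"
      . "Snippet: Lorem ipsum dolor sit amet.\n"
      . "Category: A snippet of lorem ipsum.\n"
      . "snippet: lower-case label.";

// Split on any newline sequence (\n, \r\n or \r).
$lines = preg_split('/\R/', $text);

// Keep only the lines that start with "Snippet:" (optional leading
// whitespace, any letter case). Original array keys are preserved.
$results = preg_grep('/^\s*Snippet:/i', $lines);

print_r($results);
```

The "Category" line is dropped even though it contains the word "snippet", because the pattern is anchored to the start of the line.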

Answered 2012-05-07 15:40:42 by mishu

I'll give it a try and let you know how it goes. – Tower

Thanks, this works great! – Tower

@Tower ok, glad to help – mishu

If DOM is involved, see the duplicate I linked in a comment.
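For readers who want a rough idea of the DOM route without following the link, here is a minimal sketch. It is not the TextRange approach from the linked question. It assumes that each line of interest sits in its own text node (for example, lines separated by <br>), and the sample markup is hypothetical because the question does not show the real HTML:

```php
<?php
// Hypothetical markup; the question does not show the actual HTML.
$html = "<p>Title: Lorem ipsum dolor sit amet.<br>\n"
      . "snippet: Lorem ipsum dolor sit amet.<br>\n"
      . "Category: consectetur adipiscing elit.</p>";

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$matches = [];

// Visit every text node in the document.
foreach ($xpath->query('//text()') as $node) {
    // A text node may still hold several lines, so test each one separately.
    foreach (preg_split('/\R/', $node->nodeValue) as $line) {
        // Case-insensitive match on the "Snippet:" label at the line start.
        if (preg_match('/^\s*Snippet:/i', $line)) {
            $matches[] = trim($line);
        }
    }
}

print_r($matches);
```

If a line is broken up by inline tags such as <b>, its text is spread over several text nodes and this sketch would only see the first part, which is the kind of problem the linked question deals with.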

Otherwise, you may just be looking for a regular expression:

$line = preg_match('~(^Snippet:.*$)~m', $text, $matches) ? $matches[1] : NULL;

Demo and regex explanation:

~            -- delimiter
(            -- start match group 1
  ^          -- start of line
  Snippet:   -- exactly this text
  .*         -- match all but newline
  $          -- end of line
)            -- end match group 1
~            -- delimiter
m            -- multiline modifier (^ matches begin of line, $ end of line)
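The question also asks for a case-insensitive search, which the pattern as written does not do. A minimal sketch of one way to get there is to add the `i` modifier, here combined with preg_match_all so that every matching line is collected. The sample text is made up for illustration:

```php
<?php
// Hypothetical sample text with differently cased labels.
$text = "Title: Lorem ipsum dolor sit amet.\n"
      . "snippet: first match.\n"
      . "Category: mentions Snippet: mid-line.\n"
      . "SNIPPET: second match.";

// m = multiline (^ and $ match at line boundaries), i = case-insensitive.
preg_match_all('~^Snippet:.*$~mi', $text, $matches);

// $matches[0] holds the full text of every matching line.
$lines = $matches[0];

print_r($lines);
```

The "Category" line is not returned, because `^` requires "Snippet:" to appear at the start of a line.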


Answered 2012-05-07 16:23:59 by hakre

Thanks for the info! – Tower
