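Finding links with a regular expression

I am currently trying to learn Linux commands and regular expressions, and I am stuck on a small problem. I am trying to use sed and a regular expression to find a series of links in a file. Can anyone help me work out where I have gone wrong? The links look like this: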

<a href="../a-lot-of-different/words-that/should-link.html">Useful links</a> 
<a href="..//a-lot-of-different/words-that/should-find-lots-of-links.html">Multiple links</a> 
<a href="../another-word-and-links/multiple-words/sjshfi-dfg.html">more links</a>

This is what I have so far.

sed -n '/<a*href=”^[../"]*\([a-z]*\)^[.html](["]*\)/p' /file > newfile

Source

2014-10-29 knowlage

If it is an HTML file, I would suggest using a DOM parser. See http://unix.stackexchange.com/questions/6389/parse-html-on-linux and http://stackoverflow.com/questions/893585/how-to-parse-xml-in-bash – Phil 2014-10-29 23:31:32

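Regular expressions are not ideal for parsing HTML.

You haven't shown the output you want. I am guessing that you want to extract the links. If so, try: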

$ sed -rn 's/.*<a\s+href="([^"]*)".*/\1/p' file 
../a-lot-of-different/words-that/should-link.html 
..//a-lot-of-different/words-that/should-find-lots-of-links.html 
../another-word-and-links/multiple-words/sjshfi-dfg.html

How it works:

.*<a\s+href="

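This matches everything on the line before the link.

([^"]*)

This matches the link itself and captures it in group \1.

".*

This matches the double quote that closes the link and everything after it on the line.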
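To try the command without an existing file, here is a minimal sketch. It assumes a sed that accepts -E for extended regular expressions (current GNU and BSD sed both do) and spells \s as the POSIX class [[:space:]]. The function name extract_links and the temporary sample file are hypothetical, made up for this illustration.

```shell
# Hypothetical helper: print the href value captured from each line
# that contains an <a href="..."> tag (at most one value per line).
extract_links() {
    # -n suppresses default output; the p flag prints only lines where
    # the substitution succeeded. [[:space:]]+ stands in for \s+.
    sed -En 's/.*<a[[:space:]]+href="([^"]*)".*/\1/p' "$1"
}

# Build a sample input from the links in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a href="../a-lot-of-different/words-that/should-link.html">Useful links</a>
<a href="..//a-lot-of-different/words-that/should-find-lots-of-links.html">Multiple links</a>
<a href="../another-word-and-links/multiple-words/sjshfi-dfg.html">more links</a>
EOF

extract_links "$sample"
rm -f "$sample"
```

Because the whole line is replaced by a single captured group, this approach reports at most one link per input line.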

Source

2014-10-29 23:44:54 John1024

Thank you, this makes it much clearer, and it found one of the links I was looking for. – knowlage 2014-10-30 00:37:05

Anchor tags carry an href attribute, so searching for href will solve the problem:

sed -n '/href=".*"/p' link_file.txt
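The command above prints every whole line that contains an href. If only the link targets are wanted, including several on the same line, one possible sketch is to let grep print just the matched text and then trim it with sed. It assumes a grep that supports -o (GNU and BSD grep do; it is not required by POSIX). The function name list_hrefs and the sample file are hypothetical.

```shell
# Hypothetical helper: print every href value in a file, one per line.
list_hrefs() {
    # grep -o prints only the matched text, one match per output line,
    # so several links on the same input line are all reported.
    grep -oE 'href="[^"]*"' "$1" |
        # Strip the leading href=" and the trailing double quote.
        sed -e 's/^href="//' -e 's/"$//'
}

# Sample input with two links on a single line.
two_links=$(mktemp)
printf '%s\n' '<a href="a.html">A</a> <a href="b.html">B</a>' > "$two_links"

list_hrefs "$two_links"   # prints a.html and b.html on separate lines
rm -f "$two_links"
```

This is still pattern matching on text, so it shares the usual limits of parsing HTML with regular expressions, for example attributes written with single quotes are not matched.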

Source

2014-10-29 23:52:11 Hackaholic
