2010-07-22 130 views
count_items=`curl -u username:password -L "websitelink" | sed -e 's/<\/title>/<\/title>\n/g' | sed -n -e 's/.*<title>\(.*\)<\/title>.*/\1/p' | wc -l` 

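Above is a Bash script I have that extracts titles from an XML file, but how do I change the regular expression so that it extracts the title name from a div tag? Bash script sed -e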

Example: extract the title from: <div id="example""><a href="">title</a></div>

I know this is silly and can be done with bash, but I have no choice; any help would be greatly appreciated.
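For the single-line example above, the pipeline from the question might be adapted roughly as follows. This is only a sketch under stated assumptions: GNU sed (for `\n` in the replacement, as the original command already relies on), and each title being the text of a single link placed directly inside its div. The `html` variable and its second div are made-up sample input standing in for the `curl` output.

```shell
#!/usr/bin/env bash
# Hypothetical sample input; in the real script this would come from curl.
html='<div id="example""><a href="">title</a></div><div id="other"><a href="">second</a></div>'

# Step 1: put every closing </div> on its own line (GNU sed understands \n here).
# Step 2: print only the link text that sits directly inside a div.
titles=$(printf '%s\n' "$html" \
  | sed -e 's/<\/div>/<\/div>\n/g' \
  | sed -n -e 's/.*<div[^>]*><a[^>]*>\([^<]*\)<\/a><\/div>.*/\1/p')

# Count the extracted titles, as the original count_items did.
count_items=$(printf '%s\n' "$titles" | wc -l)

printf '%s\n' "$titles"
echo "$count_items"
```

Markup that nests other tags inside the link, or spreads a div over several lines, will not match this pattern, which is the usual limit of regex-based extraction.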

+0

Have you thought about replacing your stuff with

to see if it works? – <span class="text-secondary"> <small> <a rel="noopener" target="_blank" href="https://stackoverflow.com/users/395939/">Scharron</a></span> <span>2010-07-22 10:51:03</span> </small> </span> </p> </div> </div> </div> <div itemprop="comment" class="post-comment"> <div class="row"> <div class="col-lg-1"><span class="text-secondary">+0</span></div> <div class="col-lg-11"> <p class="commenttext">No, because there are many titles and I don't know what the title names are; it needs to collect the title names dynamically :( – <span class="text-secondary"> <small> <span>2010-07-22 10:57:00</span> </small> </span> </p> </div> </div> </div> <div itemprop="comment" class="post-comment"> <div class="row"> <div class="col-lg-1"><span class="text-secondary">+0</span></div> <div class="col-lg-11"> <p class="commenttext">You need to give us a hint: how do you tell the "title" DIVs apart from the others? Are all the titles in links, i.e. 'href's? Are your 'href's always contained on a single line? – <span class="text-secondary"> <small> <span>2010-07-22 11:40:35</span> </small> </span> </p> </div> </div> </div> </div> </div> </article> </div> <div class="answer-title"> <span class="text-logo margin-top-sm">A</span> <h2 class="title h4">Answers</h2> </div> <div class="item-description text-md markdown-body margin-bottom-40 voidso"> <article class="board-top-1 padding-top-10"> <div class="post-col vote-info"> <span class="count">3<i class="fa fa-thumbs-up"></i></span> </div> <div class="post-offset"> <div class="answer fmt"> <p>I recommend using <a href="http://xmlstar.sourceforge.net/" rel="nofollow noreferrer">xmlstarlet</a> rather than trying to parse XML with regular expressions.</p> </div> <div class="post-info"> <div class="post-meta row"> <p class="text-secondary col-lg-6"> <span class="source"> <a rel="noopener" target="_blank" href="https://stackoverflow.com/q/3308015">Source</a> </span> </p> <p class="text-secondary col-lg-6"> <span class="float-right date"> <span>2010-07-22 10:50:52</span> </p> <p class="col-12"></p> <p class="col-12"></p></div> </div> <!-- comments --> <div class="comments"> <div itemprop="comment" class="post-comment"> <div class="row"> <div class="col-lg-1"><span 
class="text-secondary">+1</span></div> <div class="col-lg-11"> <p class="commenttext">I'm not parsing XML, I'm extracting, and I have to use bash. Any help would be greatly appreciated! – <span class="text-secondary"> <small> <span>2010-07-22 10:51:48</span> </small> </span> </p> </div> </div> </div> <div itemprop="comment" class="post-comment"> <div class="row"> <div class="col-lg-1"><span class="text-secondary">+1</span></div> <div class="col-lg-11"> <p class="commenttext">Extracting requires parsing, and xmlstarlet is a command-line tool. – <span class="text-secondary"> <small> <span>2010-07-22 11:04:04</span> </small> </span> </p> </div> </div> </div> <div itemprop="comment" class="post-comment"> <div class="row"> <div class="col-lg-1"><span class="text-secondary">+0</span></div> <div class="col-lg-11"> <p class="commenttext">Yes, but it isn't installed on Linux machines by default, is it? I need to use a simple bash script that doesn't require anything to be installed. – <span class="text-secondary"> <small> <span>2010-07-22 11:07:07</span> </small> </span> </p> </div> </div> </div> </div> </div> </article> <article class="board-top-1 padding-top-10"> <div class="post-col vote-info"> <span class="count">0<i class="fa fa-thumbs-up"></i></span> </div> <div class="post-offset"> <div class="answer fmt"> <p>For the given single-line example only:</p> <pre><code class="prettyprint-override">echo '<div id="example""><a href="">title</a></div>' | sed -E -n 's/(.*<div.*<a href="">)([^<]*)(<.*<\/div>.*)/\2/p' </code></pre> </div> <div class="post-info"> <div class="post-meta row"> <p class="text-secondary col-lg-6"> <span class="source"> <a rel="noopener" target="_blank" href="https://stackoverflow.com/q/3308361">Source</a> </span> </p> <p class="text-secondary col-lg-6"> <span class="float-right date"> <span>2010-07-22 11:34:43</span> <a rel="noopener" 
target="_blank" href="https://stackoverflow.com/users/399017/">creek</a></span> </p> <p class="col-12"></p> <p class="col-12"></p></div> </div> </div> </article> <article class="board-top-1 padding-top-10"> <div class="post-col vote-info"> <span class="count">2<i class="fa fa-thumbs-up"></i></span> </div> <div class="post-offset"> <div class="answer fmt"> <p>Parsing XML without a parser is ugly; the SO crowd always argues strongly against it, and people always insist on doing it anyway. Usually the brute-force, special-case solution cobbled together with the wrong tools grows beyond a certain level of complexity, and then these people end up back where they started. You have been warned! ;)</p> <p>You mentioned elsewhere that you need to be able to do this on "a plain Linux machine with nothing installed". While dedicated XML parsing tools are not found on every Linux machine, nowadays it is hard to find one without Perl installed. Or at least awk. When you reach the limits of what regular expressions in sed can do, I suggest using awk or perl for a clean, flexible, and clear solution. Using Perl with a "real" Perl XML library would be the best option, but even with "out of the box" Perl a lot can still be done.</p> </div> <div class="post-info"> <div class="post-meta row"> <p class="text-secondary col-lg-6"> <span class="source"> <a rel="noopener" target="_blank" href="https://stackoverflow.com/q/3308454">Source</a> </span> </p> <p class="text-secondary col-lg-6"> <span class="float-right date"> <span>2010-07-22 11:46:46</span> </p> <p class="col-12"></p> <p class="col-12"></p></div> </div> </div> </article> <article class="board-top-1 padding-top-10"> <div class="post-col vote-info"> <span class="count">0<i class="fa fa-thumbs-up"></i></span> </div> <div class="post-offset"> <div class="answer fmt"> <p>Using nothing but Bash:</p> <pre><code class="prettyprint-override">$ string='<div id="example""><a href="">title</a></div>'
$ pattern='.*>([^<]+)<.*'
$ [[ $string =~ $pattern ]]
$ target=${BASH_REMATCH[1]}
$ echo $target
title
</code></pre> <p>There are many ways to make this fail. Here is one:</p> <pre><code class="prettyprint-override">$ string='<div id="example""><a href="">title</a>this text will be grabbed instead</div>' </code></pre> <p>You can keep trying to make the regular expression more robust:</p> <pre><code class="prettyprint-override">pattern='.*>([^<]+)</a.*' </code></pre> <p>But it is an uphill battle. Use a proper parser.</p> </div> <div class="post-info"> <div class="post-meta row"> <p class="text-secondary col-lg-6"> <span class="source"> <a rel="noopener" target="_blank" 
href="https://stackoverflow.com/q/3310278">Source</a> </span> </p> <p class="text-secondary col-lg-6"> <span class="float-right date"> <span>2010-07-22 14:59:08</span> </p> <p class="col-12"></p> <p class="col-12"></p></div> </div> </div> </article> </div> <div class="clearfix"> </div> <div class="padding-top-10"></div> </div> </div> <script type="text/javascript" src="http://img.uwenku.com/uwenku/plugin/highlight/highlight.pack.js"></script> <link href="http://img.uwenku.com/uwenku/plugin/highlight/styles/docco.css" media="screen" rel="stylesheet" type="text/css" /> <script type="text/javascript"> $('pre').each(function(i, e) { hljs.highlightBlock(e, "<span class='indent'> </span>", false) }); </script> </div> </div> </div><!-- wrap end--> <!-- footer --> <footer id="footer"> <div class="bg-simple lt"> <div class="container"> <div class="row padder-v m-t"> <div class="col-xs-8"> <ul class="list-inline"> <li><a href="http://www.uwenku.com/contact">Contact us</a></li> <li>© 2020 UWENKU.COM</li> <li><a target="_blank" href="https://beian.miit.gov.cn/">沪ICP备13005482号-4</a></li> </ul> </div> </div> </div> </div> </div> </footer> <!-- / footer --> </body> </html>