Remove <a> tags that sit in the middle of other tags
I have HTML files with a few lines that look like this:
<div class="thumb tright">
<div class="thumbinner" style="width:302px;">
<a href="https://example.com/en/File:Tools_my_settings.png" class="image">
<img alt="" src="images_en/thumb/0/0a/tool_settings.png/9dd94c2d99eea9.png" width="300" height="110" class="thumbimage" srcset="/my/en/images_en/thumb/0/0a/my_settings.png/450px-my_settings.png 1.5x, /31/en/images_en/thumb/0/0a/my_settings.png/600px-my_settings.png 2x"/>
</a>
<div class="thumbcaption">
<div class="magnify">
<a href="https://example.com/en/File:Tools_my_settings.png" class="internal" title="Enlarge"></a>
</div>
Tool settings
</div>
</div>
</div>Tools Features - So Far
I need to remove the following <a href=...> opening tag, together with its closing </a> tag, which comes right after the .png 2x"/> text:
<a href="https://example.com/en/File:Tools_my_settings.png" class="image">...</a>
In the end, I need the lines to look like this:
<div class="thumb tright">
<div class="thumbinner" style="width:302px;">
<img alt="" src="images_en/thumb/0/0a/tool_settings.png/9dd94c2d99eea9.png" width="300" height="110" class="thumbimage" srcset="/my/en/images_en/thumb/0/0a/my_settings.png/450px-my_settings.png 1.5x, /31/en/images_en/thumb/0/0a/my_settings.png/600px-my_settings.png 2x"/>
<div class="thumbcaption">
<div class="magnify">
<a href="https://example.com/en/File:Tools_my_settings.png" class="internal" title="Enlarge"></a>
</div>
Tool settings
</div>
</div>
</div>Tools Features - So Far
All files contain the same pattern: <a href="https://choopy.com/en/File:...
This is what I tried:
find /var/www/clients/client1/web2/web/lms_docs/ -type f -print0 | xargs -0 sed 's/<a\shref="https:\/\/choopy.com\/en\/File:([--:\[email protected]%&+~#=]*[a-z])\.png"\sclass="image">//g'
But it doesn't do anything, and I don't know how to remove the corresponding closing </a> tag.
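Two things likely explain why the attempt changed nothing: without -i, sed only writes its result to standard output and leaves the files untouched, and in a basic regular expression ( and ) are literal characters, so the pattern only matches text that actually contains parentheses (it also still targets choopy.com). Because sed works line by line while the </a> sits on a later line, a single s/// cannot pair the two tags, but a range address can. A minimal sketch, assuming the layout of the sample (opening tag, <img>, and </a> each on their own line) and the example.com prefix from it; SED_SCRIPT and strip_image_links are hypothetical names:

```shell
#!/bin/sh
# Sketch: delete the opening <a href=".../File:..." class="image"> line and the
# matching </a> line, keeping the <img> line between them. Assumes (as in the
# sample) that the opening tag, the <img>, and </a> each sit on their own line.

# Range address: from the opening-tag line to the next line containing </a>.
# Inside that range only the two tag lines are deleted; the <img> line is printed.
SED_SCRIPT='/<a href="https:\/\/example\.com\/en\/File:[^"]*" class="image">/,/<\/a>/{
/<a href=/d
/<\/a>/d
}'

# Filter stdin to stdout, useful for checking the result before touching files.
strip_image_links() {
  sed "$SED_SCRIPT"
}

# To edit the files in place (GNU sed; -i.bak keeps a backup of each file):
# find /var/www/clients/client1/web2/web/lms_docs/ -type f -print0 |
#   xargs -0 sed -i.bak "$SED_SCRIPT"
```

Running one file through strip_image_links first shows the result without modifying anything; the in-place find | xargs command is left commented out on purpose.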
You don't want to replace 'https://choopy.com ...', do you? But that is what your code is written for. You want to remove the links with 'https://example.com ...', right? –
Sorry, I fixed the original post... – James
Standard advice: don't try to process XML with a line-oriented tool like sed. Use 'xmlstarlet' or 'xsltproc' instead. –
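Following that advice, one way to do the unwrap with xsltproc is an XSLT identity transform that copies everything unchanged except the matching image links, which are dropped while their children (the <img>) are kept. A sketch, assuming xsltproc is installed and the href prefix from the sample; the unwrap_image_links function and the temporary stylesheet file are hypothetical:

```shell
#!/bin/sh
# Sketch: unwrap <a class="image" href="https://example.com/en/File:..."> links
# with xsltproc. Assumes xsltproc (libxslt) is installed; --html makes it parse
# the input with libxml2's HTML parser, so the file need not be well-formed XML.

UNWRAP_XSL=$(mktemp)
cat > "$UNWRAP_XSL" <<'EOF'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- Matching image links: drop the <a> itself but keep its children (the <img>). -->
  <xsl:template match="a[@class='image' and starts-with(@href, 'https://example.com/en/File:')]">
    <xsl:apply-templates select="node()"/>
  </xsl:template>
</xsl:stylesheet>
EOF

# Print the transformed document for the HTML file given as $1 on stdout.
unwrap_image_links() {
  xsltproc --html "$UNWRAP_XSL" "$1"
}
```

Because --html runs libxml2's HTML parser, the output may gain <html>/<body> wrappers and re-serialized tags (for example <img> without the trailing slash), so it is not byte-identical to the input; to update a file, write the output to a temporary file and move it over the original.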