linux
  • perl
  • awk
  • sed
  • 2016-09-20 112 views 0 likes 
    0
    </tr> 
    <tr class='htmllist_tr' style="background-color:yellow" ><td class='htmllist_td' >INDX01</td> 
    <td class='htmllist_td_nbr' >964.87</td> 
    <td class='htmllist_td_nbr' >95.13</td> 
    <td class='htmllist_td' >NehaA9.86</td> 
    </tr> 
    <tr class='htmllist_tr' ><td class='htmllist_td' >UNDOTBS1</td> 
    <td class='htmllist_td_nbr' >156.25</td> 
    <td class='htmllist_td_nbr' >8</td> 
    <td class='htmllist_td' >NehaA5.12</td> 
    </tr> 
    

    I want to find every row (the text between <tr> and </tr>) that contains NehaA and replace that row's opening tag, whether it is

    `<tr class='htmllist_tr'>` 
    
    or

    <tr class='htmllist_tr' style="background-color:yellow"> 
    
    with

    `<tr class='htmllist_tr' style="background-color:red">`
    

    I tried this:

    sed -e "/NehaA/ s/\'<tr class='htmllist_tr'>\'/\'<tr class='htmllist_tr' style="background-color:red">\'/ ;" 2932_TABLE2.txt 
    

    It did not work. Please help.

    +1

    Doing this in awk/sed is not the best idea. Why not use Python + Beautiful Soup? In Perl you could use HTML::Parser (I think; I have not used it myself).

    +1

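    HTML, like XML, is structured data. You cannot treat it as a plain text file. There are [many modules available](http://search.cpan.org/search?m=module&q=html) that will parse your HTML and let you modify it. – Borodin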

    Answers

    1

    If you are not going to use an HTML parser, then try this for a workable answer:

    $ awk -v RS='</tr>\\s*' '/Neha/{ORS=RT; sub(/<tr[^>]+>/,""); print "<tr class=\047htmllist_tr\047 style=\"background-color:red\">" $0}' file 
    <tr class='htmllist_tr' style="background-color:red"><td class='htmllist_td' >INDX01</td> 
    <td class='htmllist_td_nbr' >964.87</td> 
    <td class='htmllist_td_nbr' >95.13</td> 
    <td class='htmllist_td' >NehaA9.86</td> 
    </tr> 
    <tr class='htmllist_tr' style="background-color:red"><td class='htmllist_td' >UNDOTBS1</td> 
    <td class='htmllist_td_nbr' >156.25</td> 
    <td class='htmllist_td_nbr' >8</td> 
    <td class='htmllist_td' >NehaA5.12</td> 
    </tr> 
    

    It uses GNU awk for the multi-character RS and for RT.
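To see what the multi-character RS and RT do in the command above, here is a small sketch. It assumes GNU awk is installed as `gawk`, and `show_records` is a hypothetical helper name, not part of the answer. With a regex RS, each record is the text up to the next `</tr>` plus any trailing whitespace, and RT holds whatever text the regex actually matched, which is why the answer sets ORS=RT to print the original terminator back.

```shell
# Sketch only: requires GNU awk (gawk); RT and a regex RS are gawk extensions.
# show_records is a hypothetical helper name used for illustration.
show_records() {
    gawk -v RS='</tr>[[:space:]]*' '{
        gsub(/[[:space:]]+/, " ")    # collapse whitespace so each record prints on one line
        printf "record %d (terminator length %d): %s\n", NR, length(RT), $0
    }' "$@"
}

# Usage: show_records file
```

Each output line corresponds to one record as awk sees it, so a row that is missing its `</tr>` will show up merged into the following record.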

    +0

    Thanks. It mostly shows only the NehaA – Neha

    +0

    records and the next – Neha

    +0

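    It is very important that the sample input/output you post truly reflects your real input/output, so that when we test a potential solution we can tell at a glance whether it meets your requirements. I don't know what you mean by 'no newline and the next': are you saying your input is wrong, or my output is wrong, or something else? Please edit your question to show sample input/output that demonstrates the problem.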

    0

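    This is how I would do it using HTML::TreeBuilder. The code is self-explanatory. I suggest you read the documentation, because parsing HTML with a regex is not recommended.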

    #!/usr/bin/perl 
    use strict; 
    use warnings; 
    use HTML::TreeBuilder; 
    
    my $str = <<'HTML' 
    <html> 
    <head> 
    </head> 
    <body> 
    <table> 
    <tr class='htmllist_tr' style="background-color:yellow" > 
    <td class='htmllist_td' >INDX01</td> 
    <td class='htmllist_td_nbr' >964.87</td> 
    <td class='htmllist_td_nbr' >95.13</td> 
    <td class='htmllist_td' >NehaA9.86</td> 
    </tr> 
    <tr class='htmllist_tr' > 
    <td class='htmllist_td' >UNDOTBS1</td> 
    <td class='htmllist_td_nbr' >156.25</td> 
    <td class='htmllist_td_nbr' >8</td> 
    <td class='htmllist_td' >NehaA5.12</td> 
    </tr> 
    </table> 
    </body> 
    </html> 
    HTML 
    ; 
    
    
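    # Parse the HTML string into an element tree.
    my $root = HTML::TreeBuilder->new_from_content($str); 
    
    # Visit every <tr> element whose class is htmllist_tr.
    foreach my $tr ($root->look_down(_tag => 'tr', class => 'htmllist_tr')) { 
        # Collect the child nodes (text or elements) of the row's htmllist_td cells.
        my @tds      = $tr->look_down(_tag => 'td', class => 'htmllist_td'); 
        my @children = map { $_->content_list } @tds; 
    
        # Highlight the row if any of that cell content contains NehaA.
        if (grep { /NehaA/ } @children) { 
            $tr->attr('style', 'background-color:red'); 
        } 
    } 
    
    # Print the modified document, indented with one space per level.
    print $root->as_HTML(undef, " "); 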
    
    +0

    Thanks, Arunesh. I have never used HTML::TreeBuilder; I will try this. – Neha

    0

    @Ed, sorry for the confusion. This is the original file:

    <table class='htmllist'> 
    <tr class='htmllist_tr' ><th class='htmllist_th' >TABLESPACE<br>NAME</th> 
    <th class='htmllist_th' >ALLOCATED<br>SPACE<br>GB</th> 
    <th class='htmllist_th' >CURRENT<br>FREE<br>SPACE<br>GB</th> 
    <th class='htmllist_th' >CURRENT<br>FREE<br>SPACE<br>PCT</th> 
    <tr class='htmllist_tr' style="background-color:yellow" ><td class='htmllist_td' >INDX01</td> 
    <td class='htmllist_td_nbr' >964.87</td> 
    <td class='htmllist_td_nbr' >95.78</td> 
    <td class='htmllist_td' >NehaA9.93</td> 
    </tr> 
    <tr class='htmllist_tr' ><td class='htmllist_td' >TEMP</td> 
    <td class='htmllist_td_nbr' >125</td> 
    <td class='htmllist_td_nbr' >124.63</td> 
    <td class='htmllist_td_nbr' >99.7</td> 
    </tr> 
    <tr class='htmllist_tr' ><td class='htmllist_td' >TEMP_EDDDATA</td> 
    <td class='htmllist_td_nbr' >205.99</td> 
    <td class='htmllist_td_nbr' >198.52</td> 
    <td class='htmllist_td_nbr' >96.37</td> 
    </tr> 
    <tr class='htmllist_tr' ><td class='htmllist_td' >UNDOTBS1</td> 
    <td class='htmllist_td_nbr' >156.25</td> 
    <td class='htmllist_td_nbr' >22.85</td> 
    <td class='htmllist_td' >NehaA14.62</td> 
    </tr> 
    </table> 
    

    I want output like this:

    <table class='htmllist'> 
    <tr class='htmllist_tr' ><th class='htmllist_th' >TABLESPACE<br>NAME</th> 
    <th class='htmllist_th' >ALLOCATED<br>SPACE<br>GB</th> 
    <th class='htmllist_th' >CURRENT<br>FREE<br>SPACE<br>GB</th> 
    <th class='htmllist_th' >CURRENT<br>FREE<br>SPACE<br>PCT</th> 
    <tr class='htmllist_tr' style="background-color:red" ><td class='htmllist_td' >INDX01</td> 
    <td class='htmllist_td_nbr' >964.87</td> 
    <td class='htmllist_td_nbr' >95.78</td> 
    <td class='htmllist_td' >NehaA9.93</td> 
    </tr> 
    <tr class='htmllist_tr' ><td class='htmllist_td' >TEMP</td> 
    <td class='htmllist_td_nbr' >125</td> 
    <td class='htmllist_td_nbr' >124.63</td> 
    <td class='htmllist_td_nbr' >99.7</td> 
    </tr> 
    <tr class='htmllist_tr' ><td class='htmllist_td' >TEMP_EDDDATA</td> 
    <td class='htmllist_td_nbr' >205.99</td> 
    <td class='htmllist_td_nbr' >198.52</td> 
    <td class='htmllist_td_nbr' >96.37</td> 
    </tr> 
    <tr class='htmllist_tr' style="background-color:red"><td class='htmllist_td' >UNDOTBS1</td> 
    <td class='htmllist_td_nbr' >156.25</td> 
    <td class='htmllist_td_nbr' >22.85</td> 
    <td class='htmllist_td' >NehaA14.62</td> 
    </tr> 
    </table> 
    
    However, when I use this

    awk -v RS='</tr>\\s*' '/Neha/{ORS=RT; sub(/<tr[^>]+>/,""); print "<tr class=\047htmllist_tr\047 style=\"background-color:red\">" $0}' text.txt 
    

    it gives me output like this:

    <tr class='htmllist_tr' style="background-color:red"><table class='htmllist'> 
    <th class='htmllist_th' >TABLESPACE<br>NAME</th> 
    <th class='htmllist_th' >ALLOCATED<br>SPACE<br>GB</th> 
    <th class='htmllist_th' >CURRENT<br>FREE<br>SPACE<br>GB</th> 
    <th class='htmllist_th' >CURRENT<br>FREE<br>SPACE<br>PCT</th> 
    <tr class='htmllist_tr' style="background-color:yellow" ><td class='htmllist_td' >INDX01</td> 
    <td class='htmllist_td_nbr' >964.87</td> 
    <td class='htmllist_td_nbr' >95.78</td> 
    <td class='htmllist_td' >NehaA9.93</td> 
    </tr><tr class='htmllist_tr' style="background-color:red"> 
    <td class='htmllist_td' >UNDOTBS1</td> 
    <td class='htmllist_td_nbr' >156.25</td> 
    <td class='htmllist_td_nbr' >22.85</td> 
    <td class='htmllist_td' >NehaA14.62</td> 
    </tr> 
    

    Let me know if this makes sense.

    +1

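    Please add these details to your question rather than posting them as an answer. Secondly, you can try `HTML::TreeBuilder`. It will give you your expected *output*. Parsing HTML with a *regex* is not good practice; it will break somewhere. Please let me know if you have any questions.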
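For readers who still want a text-only approach: in the output shown above, the header row has no closing `</tr>`, so the first record handed to awk runs from `<table ...>` to the first `</tr>`. The sub() then strips the header's `<tr ...>` tag and the red tag is prepended in front of `<table ...>`, while records without NehaA are never printed. A minimal sketch that avoids this, assuming every row's opening `<tr ...>` tag starts a line as in the file shown, is to start a new chunk at each `<tr` line instead of splitting on `</tr>`. The names `redden_rows` and `emit` are hypothetical, and an HTML parser remains the more robust choice.

```shell
# Sketch only: assumes each row's opening <tr ...> tag begins a line, as in the
# file shown above. redden_rows and emit are hypothetical names.
redden_rows() {
    awk '
        # Print the buffered chunk; if it mentions NehaA, rewrite its first <tr ...> tag.
        function emit() {
            if (row ~ /NehaA/)
                sub(/<tr[^>]*>/, "<tr class=\047htmllist_tr\047 style=\"background-color:red\">", row)
            if (row != "") printf "%s", row
            row = ""
        }
        /^[[:space:]]*<tr/ { emit() }    # a new row starts: flush the previous chunk
        { row = row $0 "\n" }            # buffer the current line
        END { emit() }                   # flush the last chunk, including </table>
    ' "$@"
}

# Usage: redden_rows text.txt > highlighted.html
```

Because every chunk is printed whether or not it matches, rows without NehaA pass through unchanged, and this version needs only POSIX awk features rather than GNU awk's RT.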
