2013-03-01

Perl regex to insert/replace a string at specific positions

Given a URL, the following regular expression inserts a word at certain points in the URL.

Code:

#!/usr/bin/perl

use strict;
use warnings;
#use diagnostics;

my @insert_words = qw/HELLO GOODBYE/;

while (<DATA>) {
    chomp;
    foreach my $word (@insert_words)
    {
        # Insert $word after the 1st, 2nd, ... path segment until nothing matches.
        my $repeat = 1;
        while ((my $match = $_) =~ s|(?<![/])(?:[/](?![/])[^/]*){$repeat}[^/]*\K|$word|)
        {
            print "$match\n";
            $repeat++;
        }

        print "\n";
    }
}

__DATA__
http://www.stackoverflow.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/
10.15.16.17/dog/cat/rabbit/

Output (for the first example URL in __DATA__, with the word HELLO):

http://www.stackoverflow.com/dogHELLO/cat/rabbit/ 
http://www.stackoverflow.com/dog/catHELLO/rabbit/ 
http://www.stackoverflow.com/dog/cat/rabbitHELLO/ 
http://www.stackoverflow.com/dog/cat/rabbit/HELLO 
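To make the pattern in the code above easier to follow, here is a sketch of the same substitution written with the /x modifier so that each piece can carry a comment. The helper name insert_after_segment and its argument order are made up for this illustration; the pattern itself is the one from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: insert $word after the $n-th path segment of $url.
# Returns the modified URL, or undef when the URL has fewer than $n segments.
sub insert_after_segment {
    my ($url, $word, $n) = @_;
    my $ok = $url =~ s{
        (?<! / )        # the first slash must not follow another slash
        (?:
            / (?! / )   # a single slash that is not the start of a double slash
            [^/]*       # segment text up to the next slash
        ){$n}           # that many slash-plus-segment groups
        [^/]*           # any remaining segment text
        \K              # keep what was matched; only the insertion point is replaced
    }{$word}x;
    return $ok ? $url : undef;
}

# Prints http://www.stackoverflow.com/dog/catHELLO/rabbit/
print insert_after_segment('http://www.stackoverflow.com/dog/cat/rabbit/', 'HELLO', 2), "\n";
```

The two lookarounds are what skip the `//` after the scheme, and `\K` is what turns the substitution into a pure insertion.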

Where I am currently stuck:

I now want to change the regular expression so that the output looks like the following:

http://www.stackoverflow.com/dogHELLO/cat/rabbit/ 
http://www.stackoverflow.com/dog/catHELLO/rabbit/ 
http://www.stackoverflow.com/dog/cat/rabbitHELLO/ 
http://www.stackoverflow.com/dog/cat/rabbit/HELLO 
#above is what it already does at the moment 
#below is what i also want it to be able to do as well 
http://www.stackoverflow.com/HELLOdog/cat/rabbit/ #<-puts the word at the start of the string 
http://www.stackoverflow.com/dog/HELLOcat/rabbit/ 
http://www.stackoverflow.com/dog/cat/HELLOrabbit/ 
http://www.stackoverflow.com/dog/cat/rabbit/HELLO 
http://www.stackoverflow.com/HELLO/cat/rabbit/ #<- now also replaces the string with the word 
http://www.stackoverflow.com/dog/HELLO/rabbit/ 
http://www.stackoverflow.com/dog/cat/HELLO/ 
http://www.stackoverflow.com/dog/cat/rabbit/HELLO 

But I cannot get a single regular expression to do this automatically.

Any help with this would be greatly appreciated. Many thanks.

Did you mean to list '/dog/cat/rabbit/HELLO' twice? – ikegami 2013-03-01 16:34:41

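@ikegami - Good question. I would prefer that it not be repeated; I left it in the question so that others can more easily understand the kind of output I am trying to achieve. Thanks – 2013-03-01 16:40:02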

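**This may not be a job for regular expressions, but for the existing tools of your language of choice.** Which language are you using? You probably don't want a regex but an existing module that has already been written, tested, and debugged. If you are using PHP, you want the ['parse_url'](http://php.net/manual/en/function.parse-url.php) function. If you are using Perl, you want the ['URI'](http://search.cpan.org/dist/URI/) module. If you are using Ruby, use the ['URI'](http://www.ruby-doc.org/stdlib-1.9.3/libdoc/uri/rdoc/URI.html) module. – 2013-03-01 17:27:49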

Answers

One solution:

use strict;
use warnings;

use URI qw();

my @insert_words = qw(HELLO);

while (<DATA>) {
    chomp;
    my $url = URI->new($_);
    my $path = $url->path();

    for (@insert_words) {
        # Use package vars to communicate with /(?{})/ blocks.
        local our $insert_word = $_;
        local our @paths;
        # The final (?!) always fails, so the engine backtracks through every
        # way of picking one path segment; the code block records each one.
        $path =~ m{
            ^(.*/)([^/]*)((?:/.*)?)\z
            (?{
                push @paths, "$1$insert_word$2$3";
                if (length($2)) {
                    push @paths, "$1$insert_word$3";
                    push @paths, "$1$2$insert_word$3";
                }
            })
            (?!)
        }x;

        for (@paths) {
            $url->path($_);
            print "$url\n";
        }
    }
}

__DATA__
http://www.stackoverflow.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/
http://10.15.16.17/dog/cat/rabbit/

Excellent solution, thank you – 2013-03-05 14:18:42
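The first solution above relies on a backtracking idiom that may not be obvious: a `(?{ ... })` code block followed by `(?!)`, which always fails and so forces the regex engine to try every alternative way of matching. Below is a minimal sketch of that idiom on a toy string, not on URLs. The helper name all_splits and the `|` separator are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: list every way to cut $string into two non-empty pieces.
sub all_splits {
    my ($string) = @_;
    # A package variable, as in the answer, so the code block can reach it.
    local our @splits;
    $string =~ m{
        ^ (.+) (.+) \z
        (?{ push @splits, "$1|$2" })   # record this way of matching
        (?!)                           # always fail, forcing the next attempt
    }x;
    return @splits;
}

print "$_\n" for all_splits('abc');
```

The match as a whole never succeeds; the useful result is the list built up as a side effect while the engine exhausts its options.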

Without the crazy regex:

use strict;
use warnings;

use URI qw();

my @insert_words = qw(HELLO);

while (<DATA>) {
    chomp;
    my $url = URI->new($_);
    my $path = $url->path();

    for my $insert_word (@insert_words) {
        # Collect the text that follows each slash in the path.
        my @parts = $path =~ m{/([^/]*)}g;
        my @paths;
        for my $part_idx (0..$#parts) {
            my $orig_part = $parts[$part_idx];
            # The original part is restored when this iteration ends.
            local $parts[$part_idx];
            {
                # Word before the part.
                $parts[$part_idx] = $insert_word . $orig_part;
                push @paths, join '', map "/$_", @parts;
            }
            if (length($orig_part)) {
                {
                    # Word instead of the part.
                    $parts[$part_idx] = $insert_word;
                    push @paths, join '', map "/$_", @parts;
                }
                {
                    # Word after the part.
                    $parts[$part_idx] = $orig_part . $insert_word;
                    push @paths, join '', map "/$_", @parts;
                }
            }
        }

        for (@paths) {
            $url->path($_);
            print "$url\n";
        }
    }
}

__DATA__
http://www.stackoverflow.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/
http://10.15.16.17/dog/cat/rabbit/

Good idea to get rid of the regex in this solution, thanks. It will make my life easier in other parts of my program. – 2013-03-02 16:30:20

Not sure which one is faster, if that is critical. – ikegami 2013-03-02 16:33:19

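I know I need to change the regex to 'my @parts = $path =~ m{[/=&]([^/=&]*)}g;' so that it handles the other characters I specified (/ = &), not just slashes. But I don't know what to change next, because 'map "/$_", @parts;' obviously always outputs a slash, even when it was an '=' or '&' that was found in the URL. Many thanks for your help – 2013-03-03 23:34:46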

One more solution:

#!/usr/bin/perl

use strict;
use warnings;

my @insert_words = qw/HELLO GOODBYE/;

while (<DATA>) {
    chomp;
    # Find the first single slash; everything before it is left untouched.
    next unless /(?<![\/])(?:[\/](?![\/])[^\/]*)/p;
    my $begin_part = ${^PREMATCH};
    my $tail = ${^MATCH} . ${^POSTMATCH};
    # A limit of -1 keeps the trailing empty field, so a trailing slash survives.
    my @tail_chunks = split /\//, $tail, -1;

    foreach my $word (@insert_words) {
        for my $index (1..$#tail_chunks) {
            my @new_tail = @tail_chunks;

            # Word before the segment.
            $new_tail[$index] = $word . $tail_chunks[$index];
            my $str = $begin_part . join "/", @new_tail;
            print $str, "\n";

            # For an empty segment the suffix form would be identical.
            next unless length $tail_chunks[$index];

            # Word after the segment.
            $new_tail[$index] = $tail_chunks[$index] . $word;
            $str = $begin_part . join "/", @new_tail;
            print $str, "\n";
        }

        print "\n";
    }
}

__DATA__
http://www.stackoverflow.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/
10.15.16.17/dog/cat/rabbit/
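
The solution above splits the tail of the URL on slashes and joins it back together. One detail worth keeping in mind is that Perl's split drops trailing empty fields unless a negative limit is passed, which decides whether a trailing slash survives the split/join round trip. A small sketch, with variable names chosen only for this illustration:

```perl
use strict;
use warnings;

my $tail = '/dog/cat/rabbit/';

# Default: the empty field after the final slash is discarded.
my @default = split m{/}, $tail;

# Negative limit: trailing empty fields are kept.
my @keep_all = split m{/}, $tail, -1;

print scalar(@default), "\n";       # 4 fields: "", dog, cat, rabbit
print scalar(@keep_all), "\n";      # 5 fields: "", dog, cat, rabbit, ""
print join('/', @keep_all), "\n";   # /dog/cat/rabbit/
```

The leading empty field is kept in both cases, which is why the loop in the solution starts at index 1.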