2017-09-13 35 views
1

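How does regex substitution work in Perl?

I am trying to remove duplicates from the string "a","b","b","a","c". After removing them the result is "a","b","c",. I have achieved this, but I have a doubt about how the regex substitution works.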

use warnings; 
use strict; 
my $s = q+"a","b","b","a","c"+; 

$s=~s/ ("\w"),?/($s=~s|($1)||g)?"$1,":"" /xge; 
#^                ^
#|                Consider this as s2 
#Consider this as s1 

print "\n$s\n\n"; 

s1 holds the string "a","b","b","a","c".

Step 1

After substitution:

Guess which data s1 contains at this point: "a","b","b","c", or "a","b","b","a","c", or ,"b","b",,"c"?

I ran the regex with an eval group:

$s=~s/ ("\w"),? (?{print "$s\n"})/ ($s=~s|($1)||g)?"$1,":"" /xge; 

The result was

"a","b","b","a","c" 
,"b","b",,"c" #This is from after substitution 
,,,,"c" 
,,,,"c" 
,,,,"c" 

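Now my doubt is this: s2 is also $s, so why is its change not combined with s1? That is, after the second step the result should be "a","b","b","c" (every "a" replaced with the empty string, and one "a" added back to $s).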


Edit

The result from the eval group (?{print $s}) is

"a","b","b","a","c" 
,"b","b",,"c" 
,,,,"c" 
,,,,"c" 
,,,,"c" 

I printed $s after the substitution line and it gives "a","b","c". How does that output come about?

+0

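It is rather hard to examine the regex and deduce what it is supposed to do. Maybe split it over multiple lines and add a comment to each part; then others may be willing to invest some time in solving your problem.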

+0

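From my point of view, your problem comes from the spaces in your regex. If you try `$s =~ s/ ("\w"),? /_/g; print "\nString after replacing \$1 with _: $s";`, you will notice that your string is unchanged, but if you remove the spaces you will have `$s =~ s/("\w"),?/_/g;` and then `$1` will be replaced by `_`.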

+0

I think there are [some options](https://perldoc.perl.org/re.html#%27Debug%27-mode) in the `re` module for debugging.
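The whitespace point raised in the second comment can be checked with a short sketch. It is not from the original thread: it assumes only the sample string from the question, and the names `$orig`, `$literal_space` and `$extended` are illustrative. Without `/x` a space in the pattern is a literal character that must match; with `/x`, which the code in the question uses, it is ignored.

```perl
use strict;
use warnings;

my $orig = q+"a","b","b","a","c"+;

# Without /x the leading space is literal. The sample string contains
# no spaces, so nothing matches and the copy is left unchanged.
(my $literal_space = $orig) =~ s/ ("\w"),?/_/g;

# With /x the space is ignored, so every element (and the comma that
# follows it, if any) is replaced by an underscore.
(my $extended = $orig) =~ s/ ("\w"),?/_/xg;

print "$literal_space\n";
print "$extended\n";
```

Since the question's substitution carries the `/x` flag, the spaces there appear to be harmless formatting rather than the cause of the behaviour.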

Answers

6

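A regex is (in my opinion) the wrong tool to use here. I would

  • split the string on commas
  • remove duplicates from the list returned by split
  • join the list back into a string

Like this: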

#!/usr/bin/perl 

use strict; 
use warnings; 
use feature 'say'; 

my $str = q["a","b","b","a","c"]; 

my %seen; 

$str = join ',', 
     grep { ! $seen{$_}++ } 
     split /,/, $str; 

say $str; 
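Why this keeps the elements in their original order, while reading the keys back out of a hash does not, can be seen in a small sketch. The names `@items`, `@in_order` and `@from_keys` are illustrative and not part of the answer above.

```perl
use strict;
use warnings;

my @items = split /,/, q["a","b","b","a","c"];

my %seen;

# grep walks @items from left to right and keeps an element only the
# first time it is seen, so the order of first appearance is preserved.
my @in_order = grep { !$seen{$_}++ } @items;

# keys() returns the same elements but in no guaranteed order;
# sort is applied here only to make the printed line repeatable.
my @from_keys = sort keys %seen;

print join(',', @in_order), "\n";
print join(',', @from_keys), "\n";
```

The hash is used only as a "have I seen this yet?" lookup; the output order comes from the list that `grep` filters, not from the hash itself.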
+0

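Thanks for your answer. Before using the regex I tried using a hash, but the resulting data was shuffled, so I moved to a regex. Your answer solves my hash shuffling problem. But I am still curious to know how the regex substitution works. – mkHun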

2

The correct solution is to split, filter, and rejoin, as @Dave Cross has already shown.

...

However, the following regex solution does work, and hopefully illustrates why Dave's solution is superior:

#!/usr/bin/env perl 

use v5.10; 
use strict; 
use warnings; 

my $str = q{"a","b","b","a","c"}; 

1 while $str =~ s{ 
    \A 
    (?: (?&element) ,)* 
    ((?&element))   # Capture in \1 
    (?: , (?&element))* 
    \K 
    , 
    \1      # Remove the duplicate along with preceding comma 
    (?= \z | ,) 

    (?(DEFINE) 
     (?<element> 
      " 
      \w 
      " 
     ) 
    ) 
}{}xg; 

say $str; 

Output:

"a","b","c"
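
As for the behaviour asked about in the question, the debug output there is consistent with a simple model. This is an inference from that output, not documented behaviour: the outer `s///ge` appears to scan the string as it was when the substitution started, collect its replacement text separately, and assign the collected text to `$s` only at the end. The sketch below spells that model out with explicit variables; `$snapshot` and `$result` are hypothetical names introduced for illustration.

```perl
use strict;
use warnings;

my $s = q+"a","b","b","a","c"+;

my $snapshot = $s;   # assumption: the outer match scans the original text
my $result   = '';   # assumption: replacement text is collected separately

while ($snapshot =~ /("\w"),?/g) {
    my $elem    = $1;                        # save $1 before the inner s/// resets it
    my $removed = ($s =~ s/\Q$elem\E//g);    # inner substitution edits the live $s
    $result .= "$elem," if $removed;         # keep the element only if something was removed
    print "$s\n";                            # the live $s after this step
}

$s = $result;        # the collected text overwrites whatever the inner edits left
print "$s\n";
```

Under that assumption the inner edits to `$s` only decide which elements are kept. They never appear in the final value, which would explain why the result is not "a","b","b","c" and why it ends with a trailing comma.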