2014-09-26 33 views
1

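How to match absent array elements in Perl?

I want to find the words that are absent from the data by matching against the array elements. My code is: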

use warnings; 
use strict; 
my @ar = qw(one two three four five six seven eight nine ten); 
my @data = <DATA>; 
print "Absence word in the data\n"; 
foreach my $mat(@ar){ 
    my $nonmatch; 
    foreach my $dat (@data){ 
     $nonmatch = grep{m/(?!$mat)/} $dat; 
    } 
    print "$nonmatch\n"; 
} 
__DATA__ 
eight two four one two three four seven eight ten one two seven 

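Each element of the first array should be checked against the data, and only the values that do not appear in the data should be printed.

My expected output is: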

Absence word in the data 
five 
six 
nine 

How can I do this?

+2

Use a hash for the words in `@data`, so that you can check whether `$mat` exists in the hash. – 2014-09-26 17:12:16

Answers

2

Use a %seen-style hash, as modeled in perlfaq4 - How can I tell whether a certain element is contained in a list or array?

use warnings; 
use strict; 

my %seen = map { $_ => 1 } map { split ' ' } <DATA>; 

my @ar = qw(one two three four five six seven eight nine ten); 

print "Absence word in the data\n"; 
print "$_\n" for grep { !$seen{$_} } @ar; 

__DATA__ 
eight two four one two three four seven eight ten one two seven 

Outputs:

Absence word in the data 
five 
six 
nine 
+1

Do you need to `chomp` your `<DATA>`? Or does the split remove the NL at the end? – 2014-09-29 04:08:09

+0

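No need to `chomp`. `split` with no pattern, or `split ' '`, is special-cased: it is treated like `split /\s+/` with the added feature of skipping any leading whitespace. There could be a trailing `''` after the split, but only if we used a negative limit. – Miller 2014-09-29 04:46:00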
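
To illustrate the comment above, here is a minimal sketch comparing `split ' '` with `split /\s+/` on a line that has leading spaces and a trailing newline. The sample string and variable names are made up for the illustration.

```perl
use strict;
use warnings;

# A line as it might arrive from a filehandle: leading spaces, trailing newline.
my $line = "  eight two   four\n";

# split ' ' skips leading whitespace and splits on runs of whitespace,
# so the newline never ends up attached to the last word.
my @awk_mode = split ' ', $line;

# split /\s+/ produces an empty leading field when the string starts with whitespace.
my @regex_mode = split /\s+/, $line;

print scalar(@awk_mode), " fields: @awk_mode\n";
print scalar(@regex_mode), " fields, first is '$regex_mode[0]'\n";
```

In both cases the trailing newline acts as a separator and the resulting empty trailing field is dropped, which is why no `chomp` is needed before splitting.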

1

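You can use the hash slice @seen{@r} to store all the words seen in @r as keys of the %seen hash, and afterwards check the @ar array against those hash keys: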

use warnings; 
use strict; 

my @ar = qw(one two three four five six seven eight nine ten); 
my %seen; 
while (my $mat = <DATA>) { 
    my @r = split (' ', $mat); 
    @seen{@r} =(); 
} 
print "Absence word in the data\n"; 
print "$_\n" for grep { not exists $seen{$_} } @ar; 

__DATA__ 
eight two four one two three four seven eight ten one two seven 

Output:

Absence word in the data 
five 
six 
nine 
+0

Please comment when downvoting. – 2014-09-26 18:21:41

+0

Thanks @mpapec, this works. Please explain your code. I had not thought of declaring a hash. – mkHun 2014-09-26 19:02:43
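
Since an explanation was requested, here is a rough sketch of what the hash slice in this answer does, assuming a shortened word list for brevity. `%seen_loop` is a name introduced only for the comparison.

```perl
use strict;
use warnings;

my @r = qw(eight two four one two);
my %seen;

# Hash slice: creates the keys 'eight', 'two', 'four', 'one' in one statement.
# The empty list on the right leaves every value undef; only the keys matter.
@seen{@r} = ();

# The slice above is shorthand for this loop.
my %seen_loop;
$seen_loop{$_} = undef for @r;

# The values are undef, so test with exists rather than for truth.
my @absent = grep { not exists $seen{$_} } qw(one two three);
print "absent: @absent\n";
```

Because the values are undef, a test such as `!$seen{$_}` would report every word as absent; that is why the answer uses `not exists`.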

1

This sounds like a problem I had at one point. The code I came up with is below; I based it on the information on this page:

https://www.safaribooksonline.com/library/view/perl-cookbook/1565922433/ch04s08.html

# assume @A and @B are already loaded 
%seen =();      # lookup table to test membership of B 
@aonly =();      # answer 

# build lookup table 
foreach $item (@B) { $seen{$item} = 1 } 

# find only elements in @A and not in @B 
foreach $item (@A) { 
    unless ($seen{$item}) { 
     # it's not in %seen, so add to @aonly 
     push(@aonly, $item); 
    } 
} 
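
As a sketch only, the same recipe could be applied to the question's data under `use strict` like this. It assumes the data line has already been read into a string, and it swaps the explicit `foreach`/`push` for `grep`.

```perl
use strict;
use warnings;

my @A = qw(one two three four five six seven eight nine ten);
my @B = split ' ', 'eight two four one two three four seven eight ten one two seven';

# lookup table to test membership in @B
my %seen;
$seen{$_} = 1 for @B;

# keep the elements of @A that never appeared in @B, in @A's order
my @aonly = grep { !$seen{$_} } @A;

print "$_\n" for @aonly;
```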
1

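Create a hash that has all the words in __DATA__ as keys (this can be done in one line with a hash slice), then filter out the words that are not in the hash (this can also be done in one line with grep).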

use warnings; 
use strict; 
my @ar = qw(one two three four five six seven eight nine ten); 

my $data = join '', (<DATA>); 
my @data_words = split ' ', $data; # get a list of words 

my %data; 
@data{@data_words} = @data_words; # fill a hash with the words from __DATA__ 

my @missing = grep { !exists $data{$_}; } @ar; # filter words 

print "Absence word in the data: @missing\n"; 

__DATA__ 
eight two four one two three four seven eight ten one two seven 
0

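This solution starts with the list of items you are looking for, discards each one as it is seen along the way, and then prints whatever is left.

You can optimize for large data by checking at the bottom of the while loop whether any keys remain in the %unseen hash, and leaving the loop once none do. I added another line to the test data containing the word "sixteen", to make sure the code handles multiple lines and that we do not get a false positive for "six" there.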

use warnings; 
use strict; 

my @to_match = qw/ one two three four five six seven eight nine ten /; 
my %unseen; 
$unseen{$_} = 1 for @to_match; 
while (my $line = <DATA>) { 
    foreach my $match_this (@to_match) { 
     delete $unseen{$match_this} if $line =~/\b$match_this\b/; 
    } 
} 
print "Words absent from the data:\n". join "\n", keys %unseen; 
print "\n"; 
__DATA__ 
eight two four one two three four seven eight ten one two seven 
sixteen 
+0

Please comment when downvoting. This solution offers a possibly useful optimization. I have not tested it extensively, but I believe it does work. – msouth 2014-09-26 21:48:19
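
The early-exit optimization mentioned in this answer is not shown in its code. A minimal sketch of it might look like the following, assuming the input is held in a hypothetical `@lines` array instead of being read from a filehandle. It also loops over the remaining keys of `%unseen` rather than the full word list, and uses `\Q...\E` to quote the word inside the regex; both are additions for this sketch.

```perl
use strict;
use warnings;

my @to_match = qw/ one two three four five six seven eight nine ten /;
my %unseen = map { $_ => 1 } @to_match;

# Hypothetical input; stands in for lines read from a large file.
my @lines = (
    "one two three four five\n",
    "six seven eight nine ten\n",
    "this line is never examined\n",
);

my $lines_read = 0;
for my $line (@lines) {
    $lines_read++;
    # keys returns a copy of the key list, so deleting inside the loop is safe
    for my $word (keys %unseen) {
        delete $unseen{$word} if $line =~ /\b\Q$word\E\b/;
    }
    # stop reading as soon as every word has been found
    last unless %unseen;
}
print "stopped after $lines_read lines\n";
```

The early exit only pays off when every word eventually turns up; if some words really are absent, as in the question's data, the whole input still has to be read.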

1

Two things:

Always chomp what you read. This includes __DATA__:

my @data = <DATA>; # The NL is in each element 
chomp @data;   # Now it isn't! 

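If you don't chomp, you will be checking whether one matches one\n. Also, since you put your entire __DATA__ on a single line, it is read as a single line of input. You would have to use split to separate it into an array.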
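
A small sketch of both points, using a made-up string in place of a line read from `<DATA>`:

```perl
use strict;
use warnings;

# Stand-in for one line read from <DATA>, newline included.
my $line = "eight two four seven\n";

# Without chomp, a whole-string comparison fails because of the newline.
my $raw_matches = ($line eq 'eight two four seven') ? 1 : 0;

chomp $line;
my $chomped_matches = ($line eq 'eight two four seven') ? 1 : 0;

# One line holding many words still has to be split into separate elements.
my @words = split ' ', $line;

print "raw: $raw_matches, chomped: $chomped_matches, words: ", scalar(@words), "\n";
```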

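Second thing: usually, when you ask an "is this in that?" type of question, you should immediately think of hashes. Hashes let you look up items very quickly. In this case, you would make a hash of your data, then check whether each item in your list is in that hash: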

#! /usr/bin/env perl 
# 

use strict; 
use warnings; 
use feature qw(say); 

my @list = qw(one two three four five six seven eight nine ten); 
my @data = <DATA>; 
chomp @data;  # Don't forget! 

# 
# Translate your input as a hash 
# 

my %data_hash; 
for my $element (@data) { 
    $data_hash{$element} = 1; 
} 

for my $element (@list) { 
    if (not exists $data_hash{$element}) { 
     say "$element isn't in the list"; 
    } 
} 
__DATA__ 
eight 
two 
four 
one 
two 
three 
four 
seven 
eight 
ten 
one 
two 
seven 

Note that the map command gives you a shorter way to write this loop:

# 
# Translate your input as a hash 
# 

my %data_hash; 
for my $element (@data) { 
    $data_hash{$element} = 1; 
} 

It can now be shortened to a single line:

# 
# Translate your input as a hash 
# 

my %data_hash = map { $_ => 1 } @data; 

This is a common way of turning an array into a hash, so most developers would simply use it.
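
For anyone unsure what the `map` line produces, here is a brief sketch with a shortened word list; `$unique` is a name introduced for the illustration.

```perl
use strict;
use warnings;

my @data = qw(eight two four one two);

# map returns a flat key/value list: eight, 1, two, 1, four, 1, ...
# Assigning it to a hash collapses the duplicate 'two' into a single key.
my %data_hash = map { $_ => 1 } @data;

# keys in scalar context gives the number of distinct words
my $unique = keys %data_hash;
print "$unique unique words\n";
print "six is ", (exists $data_hash{six} ? "present" : "absent"), "\n";
```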