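Matching values in hashes

I have two arrays of hashes. I want to narrow down one of them based on a value found in the other.

The first array contains hashes with the keys seqname, source, feature, start, end, score, strand, frame, geneID and transcriptID.

The second array contains hashes with the keys organism, geneID, number, motifnumber, position, strand and sequence.

What I want to do is remove from the first array every hash whose geneID is not found in any hash of the second array. Note that both kinds of hash have a geneID key. Put simply, I want to keep those hashes in the first array whose geneID value is found in a hash of the second array.

My attempt so far uses two loops: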

my @subset; # Define a new array for the wanted hashes to go into.

for my $i (0 .. $#first_hash_array) { # Loop through the hashes of the first array.

    for my $j (0 .. $#second_hash_array) { # Loop through the hashes of the second array.

        if ($second_hash_array[$j]{geneID} =~ m/$first_hash_array[$i]{geneID}/)
        {
            push @subset, $second_hash_array[$j];
        }

    }

}

But I don't know whether this is the right way to go about it.

Source

2013-04-15 Ward9250

For starters, $a =~ /$b/ doesn't check for equality. You need

$second_hash_array[$j]{geneID} =~ m/^\Q$first_hash_array[$i]{geneID}\E\z/

or simply

$second_hash_array[$j]{geneID} eq $first_hash_array[$i]{geneID}

for that.
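To make the difference concrete, here is a minimal sketch. The two IDs are made up for illustration (they are not from the question) and were picked so that one is a substring of the other.

```perl
use strict;
use warnings;

# Hypothetical IDs, chosen so that one is a substring of the other.
my $short = 'ID_1';
my $long  = 'ID_10';

# Unanchored match: true because 'ID_10' contains 'ID_1'.
my $unanchored = $long =~ /$short/        ? 1 : 0;

# Anchored and quoted match: true only when the whole string is identical.
my $anchored   = $long =~ /^\Q$short\E\z/ ? 1 : 0;

# Plain string equality: the simplest way to say the same thing.
my $equal      = $long eq $short          ? 1 : 0;

print "unanchored=$unanchored anchored=$anchored eq=$equal\n";
# prints: unanchored=1 anchored=0 eq=0
```

The \Q...\E part also matters on its own: without it, any regex metacharacter in an ID (a dot, for instance) would be treated as pattern syntax rather than as a literal character.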

Secondly,

for my $i (0 .. $#first_hash_array) { 
    ... $first_hash_array[$i] ... 
}

can be written more concisely as

for my $first (@first_hash_array) { 
    ... $first ... 
}

Next,

for my $second (@second_hash_array) { 
    if (...) { 
     push @subset, $second; 
    } 
}

can add $second to @subset more than once. You either need to add a last

# Perform the push if the condition is true for any element. 
for my $second (@second_hash_array) { 
    if (...) { 
     push @subset, $second; 
     last; 
    } 
}

or move the push out of the loop

# Perform the push if the condition is true for all elements. 
my $flag = 1; 
for my $second (@second_hash_array) { 
    if (!...) { 
     $flag = 0; 
     last; 
    } 
} 

if ($flag) { 
    push @subset, $second; 
}

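depending on what you want to do.

To remove elements from an array, you can use splice. But removing from an array shifts all the later indexes, so it's best to iterate over the array backwards (from the last index to the first).

That is not only complicated, it's also expensive. Every time you splice, all the subsequent elements in the array need to be moved.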
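For reference, a backwards splice loop might look like the sketch below. The sample data and the %keep lookup are assumptions made up for this illustration; they stand in for whatever test decides that an element should go.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @first_hash_array = (
    { geneID => 'ID_001' },
    { geneID => 'ID_002' },
    { geneID => 'ID_003' },
);
my %keep = ( ID_001 => 1, ID_003 => 1 );

# Walk from the last index down to 0 so that removing an element
# never changes the index of an element still to be visited.
for my $i (reverse 0 .. $#first_hash_array) {
    splice(@first_hash_array, $i, 1)
        if !$keep{ $first_hash_array[$i]{geneID} };
}

print join(',', map { $_->{geneID} } @first_hash_array), "\n";
# prints: ID_001,ID_003
```

It works, but each splice still shifts the tail of the array, which is where the cost comes from.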

A better approach is to filter the elements and assign the resulting elements back to the array.

my @new_first_hash_array; 
for my $first (@first_hash_array) { 
    my $found = 0; 
    for my $second (@second_hash_array) {
        if ($first->{geneID} eq $second->{geneID}) {
            $found = 1;
            last;
        }
    }

    if ($found) { 
     push @new_first_hash_array, $first; 
    } 
} 

@first_hash_array = @new_first_hash_array;

Iterating over @second_hash_array again and again is needlessly expensive. Build a lookup hash of its geneIDs once instead.

my %geneIDs_to_keep; 
for (@second_hash_array) { 
    ++$geneIDs_to_keep{ $_->{geneID} }; 
} 

my @new_first_hash_array; 
for (@first_hash_array) { 
    if ($geneIDs_to_keep{ $_->{geneID} }) { 
     push @new_first_hash_array, $_; 
    } 
} 

@first_hash_array = @new_first_hash_array;

Finally, we can replace the for loops with grep to give the following simple and efficient answer:

my %geneIDs_to_keep; 
++$geneIDs_to_keep{ $_->{geneID} } for @second_hash_array; 

@first_hash_array = grep $geneIDs_to_keep{ $_->{geneID} }, @first_hash_array;
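A self-contained run of that last version may help. The records below are invented sample data with only a couple of the keys from the question, and the block form of grep is used purely for readability.

```perl
use strict;
use warnings;

# Hypothetical sample records; real ones would carry all the keys
# listed in the question.
my @first_hash_array = (
    { geneID => 'ID_001', feature => 'exon' },
    { geneID => 'ID_002', feature => 'CDS'  },
    { geneID => 'ID_003', feature => 'exon' },
);
my @second_hash_array = (
    { geneID => 'ID_003', organism => 'example' },
    { geneID => 'ID_001', organism => 'example' },
    { geneID => 'ID_001', organism => 'example' },  # duplicates are harmless
);

# One pass over the second array builds the lookup table.
my %geneIDs_to_keep;
++$geneIDs_to_keep{ $_->{geneID} } for @second_hash_array;

# grep keeps each element of the first array for which the lookup is true.
@first_hash_array = grep { $geneIDs_to_keep{ $_->{geneID} } } @first_hash_array;

print "$_->{geneID}\n" for @first_hash_array;
# prints ID_001 and ID_003, one per line
```

Each array is walked once, so the work grows with the sum of the two array lengths rather than with their product as in the nested loops.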

Source

2013-04-15 18:00:39 ikegami

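Thanks for the answer. I can't be sure, but I think this actually deletes what I want to keep and keeps what I want to get rid of. If I want to keep only the hashes in first_hash_array whose geneID matches one in second_hash_array, shouldn't it be 'my %geneIDs_to_keep; ++$geneIDs_to_keep{ $_->{geneID} } for @second_hash_array;' to get the IDs I want to keep, and then something like 'my @new_array = grep $geneIDs_to_keep{ $_->{geneID} }, @first_hash_array;'? – Ward9250

Also, if you have time, could you expand on the last code block? To my novice understanding, I can see that the first two lines create a hash of all the geneIDs to be deleted/kept, by going through each hash in the array and taking the geneID from each one, using a loop and the default variable. The last line is harder for me to understand. I've looked at the grep page here: http://perldoc.perl.org/functions/grep.html. Given a simple example like '@foo = grep { !/^#/ } @bar;', it's the '!$geneIDs_to_delete{ $_->{geneID} }' part that I find hard to interpret. – Ward9250

Reading it again: the last line iterates over '@first_hash_array', setting '$_', so for example the '$_->{geneID}' part becomes 'ID_002' if that is the 'geneID' of an element of 'first_hash_array', and then the rest becomes 'grep !$geneIDs_to_delete{ID_002}', which tests whether ID_002 is in the list of genes to delete? – Ward9250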

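This is how I would do it.

Create an array req_geneID for the required geneIDs and put all the geneIDs from the second array of hashes into it.

Iterate over the first array of hashes and check whether each geneID is contained in the req_geneID array. (This is easy in Ruby with "include?", but you can try this in Perl.)

And

finally, delete any hash whose geneID doesn't match one in req_geneID, using this in Perl: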

for (keys %hash) 
{ 
    delete $hash{$_}; 
}
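The steps described above could be sketched roughly as follows. The sample data and the @kept array are assumptions made up for the sketch, and it builds a filtered copy rather than deleting in place.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @first_hash_array  = ( { geneID => 'ID_001' }, { geneID => 'ID_002' } );
my @second_hash_array = ( { geneID => 'ID_002' } );

# Step 1: collect the required geneIDs from the second array.
my @req_geneID = map { $_->{geneID} } @second_hash_array;

# Steps 2 and 3: keep a hash only if its geneID appears in @req_geneID.
# grep in scalar (boolean) context yields the number of matches, found
# by a linear scan of @req_geneID.
my @kept;
for my $first (@first_hash_array) {
    my $id = $first->{geneID};
    push @kept, $first if grep { $_ eq $id } @req_geneID;
}

print scalar(@kept), " kept: $kept[0]{geneID}\n";
# prints: 1 kept: ID_002
```

Because the membership test scans the whole array for every element, this does the same amount of work as the nested loops; a hash keyed by geneID avoids that scan.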

Hope this helps.. :)

Source

2013-04-15 18:00:59 BabbarTushar
