Sorting words in Perl by a user-defined alphabet order

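I have an array of "words" (strings) made up of letters from an "alphabet" with a user-defined order. For example, my "alphabet" starts with "ʔ ʕ b g d", so after sort by_my_alphabet the list of "words" (bʔd ʔbg ʕʔb bʕd) should be ʔbg ʕʔb bʔd bʕd.

sort by_my_alphabet qw(bʔd ʔbg ʕʔb bʕd) # gives ʔbg ʕʔb bʔd bʕd

Is there a way to write a simple subroutine by_my_alphabet, using $a and $b, that solves this problem?

Source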

2017-09-27 Eugene Barsky

Your code above is correct; you just need to implement by_my_alphabet. See the examples of using a custom comparator here: https://perldoc.perl.org/functions/sort.html – Floegipoky

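Yes.

sort can take any function that returns a relative sort position. All you need is a function that correctly looks up the "sort value" of a string for the comparison.

So all you need to do is define the "relative weight" of each of your extra letters, and then compare the two strings letter by letter.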

#!/usr/bin/env perl 

use strict; 
use warnings; 

use Data::Dumper; 

my @sort_order = qw (B C A D); 

my @array_to_sort = qw (A B C D A B C D AB BB CCC ABC); 

my $count = 0; 
my %position_of; 
$position_of{$_} = $count++ for @sort_order; 

print Dumper \%position_of; 

sub sort_by_pos {
    my @a = split //, $a;
    my @b = split //, $b;

    # Iterate one letter at a time, using 'shift' to take it off the front
    # of the array.
    while (@a and @b) {
        my $result = $position_of{shift @a} <=> $position_of{shift @b};
        # $result is true if it is -1 or 1, which indicates relative position.
        # 0 is false, and that causes the next loop iteration to test the next
        # letter pair.
        return $result if $result;
    }
    # Return a value based on the remaining length: the longer string sorts
    # last, so that comparing "AAA" with "AA" actually works.
    return scalar @a <=> scalar @b;
}


my @new = sort { sort_by_pos } @array_to_sort; 

print Dumper \@new;

That is a slightly simpler case, but it sorts the array into:

$VAR1 = [ 
      'B', 
      'B', 
      'BB', 
      'C', 
      'C', 
      'CCC', 
      'A', 
      'A', 
      'AB', 
      'ABC', 
      'D', 
      'D' 
     ];
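For the alphabet in the question, the same idea might look like the sketch below. It assumes every letter in the words appears in the alphabet (an unknown letter would look up as undef and trigger a warning), and only the five letters given in the question are used. The names %rank, @x and @y are illustrative, not from the original answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                               # the source contains non-ASCII letters
binmode STDOUT, ':encoding(UTF-8)';     # avoid "Wide character" warnings on print

# Custom alphabet from the question (only its first five letters are known).
my @alphabet = split //, 'ʔʕbgd';
my %rank;
@rank{@alphabet} = 0 .. $#alphabet;     # letter => position in the alphabet

# Classic sort comparator: reads the package variables $a and $b.
sub by_my_alphabet {
    my @x = split //, $a;
    my @y = split //, $b;
    while (@x and @y) {
        my $cmp = $rank{ shift @x } <=> $rank{ shift @y };
        return $cmp if $cmp;            # the first differing letter decides
    }
    return scalar @x <=> scalar @y;     # otherwise the shorter word sorts first
}

my @words  = qw(bʔd ʔbg ʕʔb bʕd);
my @sorted = sort by_my_alphabet @words;
print "@sorted\n";                      # ʔbg ʕʔb bʔd bʕd
```

Because by_my_alphabet is declared before the sort call, it can be passed by name exactly as written in the question.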

Source

2017-09-27 16:09:39 Sobrique

Simple, and very fast because it doesn't use a comparison callback, but it needs to scan the entire string:

use utf8; 

my @my_chr = split //, "ʔʕbgd"; 
my %my_ord = map { $my_chr[$_] => $_ } 0..$#my_chr; 

my @sorted = 
    map { join '', @my_chr[ unpack 'W*', $_ ] } # "\x00\x01\x02\x03\x04" ⇒ "ʔʕbgd" 
    sort 
    map { pack 'W*', @my_ord{ split //, $_ } } # "ʔʕbgd" ⇒ "\x00\x01\x02\x03\x04" 
    @unsorted;
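To see what the encoding step does, here is a sketch that runs the same pipeline on the words from the question and prints the intermediate ranks. encode_key and decode_key are hypothetical helper names introduced only for this illustration, and @unsorted is filled with the question's sample data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

my @my_chr = split //, 'ʔʕbgd';
my %my_ord = map { $my_chr[$_] => $_ } 0 .. $#my_chr;

# Hypothetical helpers: translate a word to a string of rank characters and back.
sub encode_key { return pack 'W*', @my_ord{ split //, $_[0] } }
sub decode_key { return join '', @my_chr[ unpack 'W*', $_[0] ] }

my @unsorted = qw(bʔd ʔbg ʕʔb bʕd);

# Show the rank sequence that each word is encoded as.
for my $word (@unsorted) {
    my @ranks = unpack 'W*', encode_key($word);
    print "$word => @ranks\n";
}

# Plain string sort on the encoded keys, then decode back to the original letters.
my @keys   = sort( map { encode_key($_) } @unsorted );
my @sorted = map { decode_key($_) } @keys;
print "@sorted\n";                      # ʔbg ʕʔb bʔd bʕd
```

The built-in string comparison does all the ordering work here, which is why no comparison callback is needed.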

Optimized for long strings, since it only scans the strings until a difference is found:

use utf8; 

use List::Util qw(min); 

my @my_chr = split //, "ʔʕbgd"; 
my %my_ord = map { $my_chr[$_] => $_ } 0..$#my_chr; 

sub my_cmp($$) {
    for (0 .. (min map length($_), @_) - 1) {
        my $cmp = $my_ord{substr($_[0], $_, 1)} <=> $my_ord{substr($_[1], $_, 1)};
        return $cmp if $cmp;
    }

    return length($_[0]) <=> length($_[1]);
}

my @sorted = sort my_cmp @unsorted;

Both should be faster than Sobrique's, which uses a comparison callback and scans the entire strings being compared.

Source

2017-09-27 16:12:23 ikegami

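Great! However, as I'm only a beginner in Perl, I don't yet understand how 'map', '$$' and 'min map' work. I hope to learn that soon and fully understand the code. –

'$$' is the prototype expected of a function called by 'sort func LIST'. It effectively means the sub takes two arguments. (In general, avoid sub prototypes!) – ikegami

'min' is a sub that returns the smallest of the numbers passed as arguments. – ikegami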
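A small sketch of the three pieces asked about, using plain ASCII words to keep it short. The sub name by_length and the sample words are made up for this illustration.

```perl
use strict;
use warnings;
use List::Util qw(min);

# map evaluates its block for every element and returns the list of results.
my @lengths = map { length $_ } qw(apple fig banana);   # (5, 3, 6)

# min returns the smallest of the numbers it is given, so "min map ..."
# means "the smallest of the mapped values".
my $shortest = min @lengths;                             # 3

# With the ($$) prototype, sort passes the two items in @_ instead of $a and $b.
sub by_length($$) {
    my ($left, $right) = @_;
    return length($left) <=> length($right);
}

my @by_length = sort by_length qw(apple fig banana);
print "@lengths | $shortest | @by_length\n";   # 5 3 6 | 3 | fig apple banana
```

In my_cmp above, min map length($_), @_ is the length of the shorter of the two strings, which is how far the loop has to compare letter by letter.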
