2011-01-06

Parsing a YAML-like text file into a hash structure

I have a text file:

country = { 
    tag = ENG 
    ai = { 
     flags = { } 
     combat = { ROY WLS PUR SCO EIR FRA DEL USA QUE BGL MAH MOG VIJ MYS DLH GUJ ORI JAI ASS MLC MYA ARK PEG TAU HYD } 
     continent = { "Oceania" } 
     area = { "America" "Maine" "Georgia" "Newfoundland" "Cuba" "Bengal" "Carnatic" "Ceylon" "Tanganyika" "The Mascarenes" "The Cape" "Gold" "St Helena" "Guiana" "Falklands" "Bermuda" "Oregon" } 
     region = { "North America" "Carribean" "India" } 
     war = 50 
     ferocity = no 
    } 
    date = { year = 0 month = january day = 0 } 
} 

What I am trying to do is parse this text into a hash structure in Perl, so that the Data::Dumper output of the result looks like this:

$VAR1 = { 
      'country' => { 
         'ai' => { 
            'area' => [ 
               'America', 
               'Maine', 
               'Georgia', 
               'Newfoundland', 
               'Cuba', 
               'Bengal', 
               'Carnatic', 
               'Ceylon', 
               'Tanganyika', 
               'The Mascarenes', 
               'The Cape', 
               'Gold', 
               'St Helena', 
               'Guiana', 
               'Falklands', 
               'Bermuda', 
               'Oregon' 
              ], 
            'combat' => [ 
               'ROY', 
               'WLS', 
               'PUR', 
               'SCO', 
               'EIR', 
               'FRA', 
               'DEL', 
               'USA', 
               'QUE', 
               'BGL', 
               'MAH', 
               'MOG', 
               'VIJ', 
               'MYS', 
               'DLH', 
               'GUJ', 
               'ORI', 
               'JAI', 
               'ASS', 
               'MLC', 
               'MYA', 
               'ARK', 
               'PEG', 
               'TAU', 
               'HYD' 
               ], 
            'continent' => [ 
                'Oceania' 
                ], 
            'ferocity' => 'no', 
            'flags' => [], 
            'region' => [ 
               'North America', 
               'Carribean', 
               'India' 
               ], 
            'war' => 50 
           }, 
         'date' => { 
            'day' => 0, 
            'month' => 'january', 
            'year' => 0 
            }, 
         'tag' => 'ENG' 
         } 
     }; 

A hard-coded version might look like this:

#!/usr/bin/perl 
use Data::Dumper; 
use warnings; 
use strict; 

my $ret; 

$ret->{'country'}->{tag} = 'ENG'; 
$ret->{'country'}->{ai}->{flags} = []; 
my @qw = qw(ROY WLS PUR SCO EIR FRA DEL USA QUE BGL MAH MOG VIJ MYS DLH GUJ ORI JAI ASS MLC MYA ARK PEG TAU HYD); 
$ret->{'country'}->{ai}->{combat} = \@qw; 
$ret->{'country'}->{ai}->{continent} = ["Oceania"]; 
$ret->{'country'}->{ai}->{area} = ["America", "Maine", "Georgia", "Newfoundland", "Cuba", "Bengal", "Carnatic", "Ceylon", "Tanganyika", "The Mascarenes", "The Cape", "Gold", "St Helena", "Guiana", "Falklands", "Bermuda", "Oregon"]; 
$ret->{'country'}->{ai}->{region} = ["North America", "Carribean", "India"]; 
$ret->{'country'}->{ai}->{war} = 50; 
$ret->{'country'}->{ai}->{ferocity} = 'no'; 
$ret->{'country'}->{date}->{year} = 0; 
$ret->{'country'}->{date}->{month} = 'january'; 
$ret->{'country'}->{date}->{day} = 0; 

sub hash_sort { 
    my ($hash) = @_; 
    return [ (sort keys %$hash) ]; 
} 

$Data::Dumper::Sortkeys = \&hash_sort; 

print Dumper($ret); 

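I have to admit that I have a huge problem dealing with the nested curly braces. I tried to solve it with greedy and non-greedy matching, but that does not seem to do the trick. I have also read about extended patterns (such as (?PARNO)), but I have absolutely no idea how to use them for my particular problem. The order of the data is irrelevant, since I have the hash_sort subroutine. I would appreciate any help.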

+2

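What created the text file? My solution would be to find a way to create the text file so that it really is a YAML file. Otherwise, this is crazy! Creating it in a standard format is easier, and reading it is easier too! – 2011-01-06 17:02:44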

+0

Paradox savefiles, eh? – Oesor 2011-01-06 18:36:22

Answers

3

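I broke it down into a few simple assumptions:

  1. An entry consists of an identifier followed by an equals sign
  2. An entry is one of three basic types: a level, a set, or a single value
  3. A set has three forms: 1) a space-separated list of quoted strings; 2) key-value pairs; 3) a qw-like list of unquoted words
  4. A set of key-value pairs must consist of pairs in which the key is an identifier and the value is either a run of non-space characters or a quoted string

See the interspersed comments.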

use strict; 
use warnings; 

my $simple_value_RE 
    = qr/^ \s* (\p{Alpha}\w*) \s* = \s* ([^\s{}]+ | "[^"]*") \s* $/x 
    ; 
my $set_or_level_RE 
    = qr/^ \s* (\w+) \s* = \s* [{] (?: ([^}]+) [}])? \s* $/x 
    ; 
my $quoted_set_RE 
    = qr/^ \s* (?: "[^"]+" \s+)* "[^"]+" \s* $/x 
    ; 
my $associative_RE 
    = qr/^ \s* 
     (?: \p{Alpha}\w* \s* = \s* (?: "[^"]+" | \S+) \s+)* 
     \p{Alpha}\w* \s* = \s* (?: "[^"]+" | \S+) 
     \s* $ 
    /x 
    ; 
my $pair_RE = qr/ \b (\p{Alpha}\w*) \s* = \s* ("[^"]+" | \S+)/x; 

sub get_level { 
    my $handle = shift; 
    my %level; 
    while (<$handle>) { 
     # if the first character on the line is a close, then we're done 
     # at this level 
     last if m/^\s*[}]/; 
     my ($key, $value); 

     # get simple values 
     if (($key, $value) = m/$simple_value_RE/) { 
      # done. 
     } 
     elsif (($key, my $complete_set) = m/$set_or_level_RE/) { 
      if ($complete_set) { 
       if ($complete_set =~ m/$quoted_set_RE/) { 
        # Pull all quoted values with global flag 
        $value = [ $complete_set =~ m/"([^"]+)"/g ]; 
       } 
       elsif ($complete_set =~ m/$associative_RE/) { 
        # Build a hashref. $pair_RE captures the key and the value 
        # separately, so a global match in list context already 
        # returns a flat (key, value, key, value, ...) list. 
        # Quoted values keep their surrounding double quotes. 
        $value = { $complete_set =~ m/$pair_RE/g }; 
       } 
       else { 
        # qw-like 
        $value = [ split(' ', $complete_set) ]; 
       } 
      } 
      else { 
       $value = get_level($handle); 
      } 
     } 
     # skip lines (e.g. blank ones) that matched neither pattern 
     $level{ $key } = $value if defined $key; 
    } 
    return wantarray ? %level : \%level; 
} 

my %base = get_level(\*DATA); 
2

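Well, as David suggested, the easiest way is to have the file generated in a standard format. JSON, YAML, or XML would be much easier to parse. If you really need to parse this format, I would write a grammar for it using Regexp::Grammars (if you can require Perl 5.10) or Parse::RecDescent (if you can't). It will be a bit tricky, especially because the format seems to use braces for both hashes and arrays, but it should be doable.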

2

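The content looks quite regular. Why not perform a few substitutions on the content to turn it into Perl hash syntax, and then eval it? That would be a quick and dirty way to convert it.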
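A rough sketch of what that could look like, under some loud assumptions: the input is trusted, quoted strings never contain braces, `=`, or double quotes, and a brace group without any `=` inside is a list. The helper names (`perl_quote`, `quote_list`, `quote_pair`, `text_to_hash`) are made up for this illustration. Because the result is passed to a string `eval`, the sketch refuses anything that is not a quoted string or plain structure punctuation once the substitutions are done.

```perl
use strict;
use warnings;

# Turn a string into a Perl single-quoted literal.
sub perl_quote {
    my ($string) = @_;
    $string =~ s/([\\'])/\\$1/g;
    return "'$string'";
}

# { a "b c" }  ->  ['a', 'b c'],
sub quote_list {
    my ($body) = @_;
    my @items;
    while ($body =~ /"([^"]*)"|(\S+)/g) {
        push @items, perl_quote(defined $1 ? $1 : $2);
    }
    return '[' . join(', ', @items) . '],';
}

# key = value  ->  'key' => 'value',
sub quote_pair {
    my ($key, $quoted, $bare) = @_;
    return perl_quote($key) . ' => '
         . perl_quote(defined $quoted ? $quoted : $bare) . ',';
}

sub text_to_hash {
    my ($code) = @_;

    # 1. Brace groups with no '=' and no nested braces are lists.
    $code =~ s/ \{ ( [^{}=]* ) \} / quote_list($1) /gex;
    # 2. Scalar entries: key = "quoted" or key = bareword.
    $code =~ s/ (\w+) \s* = \s* (?: "([^"]*)" | ([^\s\[\]{}"]+) )
              / quote_pair($1, $2, $3) /gex;
    # 3. Entries whose value is a nested hash or a converted list.
    $code =~ s/ (\w+) \s* = \s* (?= [\[{] ) /'$1' => /gx;
    # 4. Every closing brace ends a hash value, so add a comma.
    $code =~ s/\}/},/g;

    # Safety net: apart from quoted strings, only punctuation may remain.
    (my $skeleton = $code) =~ s/'(?:[^'\\]|\\.)*'//gs;
    die "Unrecognized input, refusing to eval\n"
        if $skeleton =~ /[^\s\[\]{},=>]/;

    my $data = eval "+{ $code }";
    die "Conversion failed: $@" if $@;
    return $data;
}

my $converted = text_to_hash(<<'END');
country = {
    tag = ENG
    ai = {
        flags = { }
        region = { "North America" "India" }
        war = 50
    }
    date = { year = 0 month = january day = 0 }
}
END
print $converted->{country}{ai}{region}[0], "\n";
```

Note that every scalar ends up as a string here (`war` becomes `'50'`), and the approach falls apart as soon as a quoted value contains one of the characters excluded above, which is why it is only a quick and dirty option.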

Assuming you know the grammar, you could also write a parser.
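As a minimal sketch of such a hand-written parser, assuming the grammar is nothing more than `key = value` entries where a value is a bare word, a quoted string, or a brace group: the input is first split into tokens, then parsed by recursive descent. A brace group is treated as a hash when its second token is `=`, and as a list otherwise. The names `tokenize`, `is_punct`, `parse_entries`, `parse_value`, and `parse_config` are invented for this example, and only core Perl is used.

```perl
use strict;
use warnings;

# Split the input into tokens: braces, '=', quoted strings, bare words.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    while ($text =~ m/\G \s* (?: ([{}=]) | "([^"]*)" | ([^\s{}="]+) )/gcx) {
        if    (defined $1) { push @tokens, { type => 'punct', value => $1 } }
        elsif (defined $2) { push @tokens, { type => 'str',   value => $2 } }
        else               { push @tokens, { type => 'str',   value => $3 } }
    }
    $text =~ m/\G \s* \z/gcx
        or die "Unexpected input at offset " . (pos($text) || 0) . "\n";
    return \@tokens;
}

sub is_punct {
    my ($token, $char) = @_;
    return defined $token
        && $token->{type} eq 'punct'
        && $token->{value} eq $char;
}

# Parse "key = value" entries until a closing brace or the end of input.
sub parse_entries {
    my ($tokens) = @_;
    my %hash;
    while (@$tokens && !is_punct($tokens->[0], '}')) {
        my $key = shift @$tokens;
        die "Expected a key\n" if $key->{type} ne 'str';
        die "Expected '=' after $key->{value}\n"
            unless is_punct(shift @$tokens, '=');
        $hash{ $key->{value} } = parse_value($tokens);
    }
    return \%hash;
}

# A value is a plain string or a brace group (hash or list).
sub parse_value {
    my ($tokens) = @_;
    my $token = shift @$tokens;
    die "Unexpected end of input\n" unless defined $token;
    return $token->{value} if $token->{type} eq 'str';
    die "Unexpected '$token->{value}'\n" if $token->{value} ne '{';

    my $second = $tokens->[1];
    my $value;
    if (is_punct($second, '=')) {        # "{ key = ..." is a hash
        $value = parse_entries($tokens);
    }
    else {                               # anything else is a list
        $value = [];
        push @$value, parse_value($tokens)
            while @$tokens && !is_punct($tokens->[0], '}');
    }
    die "Missing '}'\n" unless is_punct(shift @$tokens, '}');
    return $value;
}

sub parse_config {
    my ($text) = @_;
    my $tokens = tokenize($text);
    my $data   = parse_entries($tokens);
    die "Unbalanced '}'\n" if @$tokens;
    return $data;
}

my $parsed = parse_config(<<'END');
country = {
    tag = ENG
    ai = {
        flags = { }
        combat = { ROY WLS }
        region = { "North America" "India" }
        war = 50
    }
    date = { year = 0 month = january day = 0 }
}
END
print $parsed->{country}{ai}{region}[0], "\n";
```

Unlike the line-based approaches, this does not care how the entries are spread across lines, so a nested block written on a single line and one written across several lines parse to the same structure.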