2010-11-29
0

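I cut and pasted the track listing of a George Michael DVD from Amazon into $str, and the code that follows tries to split each entry into two parts: the two-digit track number and the rest.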

$str = "20 Fastlove 21 Jesus To A Child 22 Spinning the Wheel 23 Older 24 Outside 25 As (with Mary J. Blige) 26 Freeek! 27 Amazing 28 John and Elvis are Dead 29 Flawless (Go To The City) 30 Shoot The Dog 31 Roxanne 32 An Easier Affair 33 If I Told You That (with Whitney Houston) 34 Waltz Away Dreaming 35 Somebody To Love 36 I Can’t Make You Love Me 37 Star People '97 38 You Have Been Loved 39 Killer/ Papa Was A RollIn Stone 40 Round Here"; 

while ($str =~ /(\d{2}) (\S+)/g) { 
     print "$1 $2\n"; 
} 

Result:

20 Fastlove 
21 Jesus 
22 Spinning 
23 Older 
24 Outside 
25 As 
26 Freeek! 
27 Amazing 
28 John 
29 Flawless 
30 Shoot 
31 Roxanne 
32 An 
33 If 
34 Waltz 
35 Somebody 
36 I 
37 Star 
97 38 
39 Killer/ 
40 Round 

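The above sort of works, but it doesn't include the full track name. Any suggestions on how to get the result I want? The result I'm expecting, or hoping for, is: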

20 Fastlove 
21 Jesus To A Child 
22 Spinning the Wheel 
[etc.] 
+2

I can't understand this question. – tchrist 2010-11-29 00:22:28

+4

This can't be done reliably, because song titles can themselves contain numbers. – 2010-11-29 00:23:52

+0

Why not use a track-information database such as CDDB instead of going to Amazon for the information? – Ether 2010-11-29 18:56:16

Answers

1

You're so darn close:

$str = "20 Fastlove 21 Jesus To A Child 22 Spinning the Wheel 23 Older 24 Outside 25 As (with Mary J. Blige) 26 Freeek! 27 Amazing 28 John and Elvis are Dead 29 Flawless (Go To The City) 30 Shoot The Dog 31 Roxanne 32 An Easier Affair 33 If I Told You That (with Whitney Houston) 34 Waltz Away Dreaming 35 Somebody To Love 36 I Can’t Make You Love Me 37 Star People '97 38 You Have Been Loved 39 Killer/ Papa Was A RollIn Stone 40 Round Here"; 

while ($str =~ /(\d{2}[^\d]*)/g) { 
    print "$1\n"; 
} 

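Note the regular expression: I'm now using the [^ ] syntax, which means "not these characters". [^\d] means "not a digit", and the asterisk at the end means zero or more.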

By specifying that the rest of the match continues until a digit is found, I can pick up the rest of the name (that is, until Star People '97. Darn it. So close...).

If you need the number and the title in two separate variables, you can use parentheses.

$str = "20 Fastlove 21 Jesus To A Child 22 Spinning the Wheel 23 Older 24 Outside 25 As (with Mary J. Blige) 26 Freeek! 27 Amazing 28 John and Elvis are Dead 29 Flawless (Go To The City) 30 Shoot The Dog 31 Roxanne 32 An Easier Affair 33 If I Told You That (with Whitney Houston) 34 Waltz Away Dreaming 35 Somebody To Love 36 I Can’t Make You Love Me 37 Star People '97 38 You Have Been Loved 39 Killer/ Papa Was A RollIn Stone 40 Round Here"; 

while ($str =~ /(\d{2})([^\d]*)/g) { 
    my $number = $1; 
    my $title = $2; 

    print "$number: $title\n"; 
} 

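Still trying to figure out how to get Star People '97 to work. I believe it has to do with the leading single quote. Every track number is preceded by a space or sits at the start of the line. I wonder whether that could be used?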
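
The observation about whitespace can be tried out directly. The following is only a sketch, not part of the original answer: it assumes a track number is exactly two digits, is preceded by whitespace or the start of the string, and is followed by whitespace. The shortened $str and the @tracks array are there for illustration only.

```perl
use strict;
use warnings;

my $str = "36 I Can't Make You Love Me 37 Star People '97 38 You Have Been Loved 39 Killer/ Papa Was A RollIn Stone 40 Round Here";

my @tracks;
# (?<!\S)  : the number must not be preceded by a non-space character,
#            which rules out the 97 in '97
# (.*?)    : title, as short as possible
# (?=...)  : stop before the next whitespace-delimited two-digit number,
#            or at the end of the string
while ($str =~ /(?<!\S)(\d{2})\s+(.*?)(?=\s+\d{2}\s|\s*\z)/g) {
    push @tracks, [ $1, $2 ];
}

print "$_->[0]: $_->[1]\n" for @tracks;
```

With this pattern the '97 stays inside the title of track 37. A title that contains a bare two-digit number surrounded by spaces would still be split in the wrong place.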

1

As Ignacio Vazquez-Abrams said, numbers in song titles will be a problem, but this should work for everything except "Star People '97":

/(\d{2}) (\D+)/g 

Note: I'm not a Perl coder, but (apart from the '97 case mentioned above) the regex works fine on rubular.com.

6

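As Ignacio said, this can't really be done with 100% accuracy, because track names can contain numbers. But since you can probably assume that the track numbers are consecutive, you can get very close to 100%: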

my $str = "20 Fastlove 21 Jesus To A Child 22 Spinning the Wheel 23 Older 24 Outside 25 As (with Mary J. Blige) 26 Freeek! 27 Amazing 28 John and Elvis are Dead 29 Flawless (Go To The City) 30 Shoot The Dog 31 Roxanne 32 An Easier Affair 33 If I Told You That (with Whitney Houston) 34 Waltz Away Dreaming 35 Somebody To Love 36 I Cant Make You Love Me 37 Star People '97 38 You Have Been Loved 39 Killer/ Papa Was A RollIn Stone 40 Round Here"; 

my ($track) = ($str =~ /^(\d+)/) or die "No initial track number"; 

my $next; 
while ($next = $track + 1 and 
       $str =~ s/^\s*               # optional initial whitespace 
                 $track \s+         # track number followed by whitespace 
                 (\S.*?)            # title begins with non-whitespace 
                 (?= \s+ $next \s   # title stops at next track # 
                   | $)             # or end-of-string 
                //x) { 
    print "$track $1\n"; 
    $track = $next; 
} 

die "$str left over" if $str =~ /\S/; # sanity check 

This modifies $str, so make a copy first if you need the original.

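This will fail if a track's title contains the next track's number, but that should be uncommon. It will also fail if a track is missing or the track numbers are not consecutive.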
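
To make the copy-first advice and the two failure cases concrete, here is a small sketch. The helper name extract_tracks and the short sample strings are invented for illustration. The substitution is the one above, except that it uses \z for the end of the string.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): runs the consecutive-number
# loop on a private copy and returns [number, title] pairs.
sub extract_tracks {
    my ($str) = @_;    # a copy, so the caller's string is left untouched
    my ($track) = $str =~ /^\s*(\d+)/ or return;
    my @tracks;
    while (1) {
        my $next = $track + 1;
        last unless $str =~ s/^\s* $track \s+   # current track number
                              (\S.*?)           # title, as short as possible
                              (?= \s+ $next \s  # up to the next track number
                                | \s* \z )      # or the end of the string
                             //x;
        push @tracks, [ $track, $1 ];
        $track = $next;
    }
    return @tracks;
}

my $input = "20 Fastlove 21 Jesus To A Child 22 Older";
my @ok  = extract_tracks($input);                            # well-formed input
my @dup = extract_tracks("20 Catch 21 21 Other 22 Last");    # title contains the next number
my @gap = extract_tracks("37 A 38 B 40 C");                  # track 39 is missing

for my $result (\@ok, \@dup, \@gap) {
    print join(" | ", map { "$_->[0] $_->[1]" } @$result), "\n";
}
```

The second call splits "Catch 21" after "Catch" and gives track 21 the title "21 Other". In the third call, track 38 swallows the rest of the string and gets the title "B 40 C". Neither case leaves any text behind, so a leftover check alone will not flag them.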

2

A variant of cjm's answer that scans the input string non-destructively:

if ($str =~ /^(\d+)/) { 
    my ($current, $next) = ($1, $1 + 1); 
    while ($str =~ /\G *$current ((?:(?! *$next).)+)/g) { 
     print "$current $1\n"; 
     ($current, $next) = ($next, $next + 1); 
    } 
} 
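
A self-contained way to run this scan is sketched below, with two additions that are assumptions of this sketch and not part of the answer: the next track number must have whitespace on both sides, and pos() is recorded after each match so that any unparsed tail can be reported. The shortened $str, @tracks, $end and $leftover are illustrative names.

```perl
use strict;
use warnings;

my $str = "37 Star People '97 38 You Have Been Loved 39 Killer/ Papa Was A RollIn Stone 40 Round Here";

my @tracks;
my $end = 0;    # offset just past the last title matched
if ($str =~ /^(\d+)/) {
    my ($current, $next) = ($1, $1 + 1);
    # \G continues where the previous match stopped; $str is never modified
    while ($str =~ /\G\s*$current\s+((?:(?!\s+$next\s).)+)/g) {
        push @tracks, [ $current, $1 ];
        $end = pos $str;    # remember progress; pos is reset when the match fails
        ($current, $next) = ($next, $next + 1);
    }
}

my $leftover = substr $str, $end;
print "$_->[0] $_->[1]\n" for @tracks;
print "unparsed: '$leftover'\n" if $leftover =~ /\S/;
```

Because $str is only read, no copy is needed, and $leftover plays the role of the sanity check from the destructive version.
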
1

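The best approach is probably something like the following. It still has a problem, though, if one of the track titles contains the next track's number.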

#!/usr/bin/perl 

use strict; 
use warnings; 

my $str = "20 Fastlove 21 Jesus To A Child 22 Spinning the Wheel 23 Older 24 Outside 25 As (with Mary J. Blige) 26 Freeek! 27 Amazing 28 John and Elvis are Dead 29 Flawless (Go To The City) 30 Shoot The Dog 31 Roxanne 32 An Easier Affair 33 If I Told You That (with Whitney Houston) 34 Waltz Away Dreaming 35 Somebody To Love 36 I Can’t Make You Love Me 37 Star People '97 38 You Have Been Loved 39 Killer/ Papa Was A RollIn Stone 40 Round Here"; 

my @parts = split " ", $str; 

my %songs; 
my $track  = shift @parts; 
my $new_track = $track + 1; 
my $song  = ""; 
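while (@parts) { 
    my $part = shift @parts; 
    unless ($part eq $new_track) { 
        # still inside the current title: append the word 
        $song .= ($song eq "" ? "" : " ") . $part; 
        next; 
    } 
    # reached the next track number: store the finished title 
    $songs{$track} = $song; 
    $song      = ""; 
    $track     = $new_track; 
    $new_track = $track + 1; 
} 
$songs{$track} = $song;   # store the last track, which has no following number 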

for my $track (sort { $a <=> $b } keys %songs) { 
    print "$track\t$songs{$track}\n"; 
} 
2

Here's another approach (also on ideone.com):

while ($str =~ /(?<!\S)(\d+)\s+((?!\d+\s)\S+(?:\s+(?!\d+\s)\S+)*)/g) { 
    print "$1 $2\n"; 
} 

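This assumes that any sequence of one or more digits that is followed by whitespace and not preceded by a non-whitespace character is a track number. That rules out the '97 in the title of #37, but nothing stops a song title from containing a bare number.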

In general, I think @cjm's consecutive-numbers idea is probably your best bet.

2

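I upvoted one of the answers here because I think it answers your specific question well, apart from the "this track's name contains the next track's number" problem. Albums with that property will be few and far between.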

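But I have to say, your problem really stems from having $str in this format. For example, if you look at the source of this page, you can extract the track names from the HTML itself very easily, whatever the track names happen to be.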

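That's because the HTML clearly delineates the tracks. Now, I don't know whether that information is available to you, but you may want to rethink how you're getting this data. It might make your life easier. Or, if not easier, at least more accurate :-)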