How about:
#!/usr/bin/perl
use strict;
use warnings;
use 5.014;
my $re = qr/
    ^                   # start of string
    [\p{Lu}\pN, -]+     # one or more uppercase letters, digits, commas, spaces or dashes
    (                   # start group 1
        \p{Lu}[\pL.']   # one uppercase letter followed by a letter, dot or apostrophe
    )                   # end group 1
/x;
while (<DATA>) {
    chomp;
    s/$re/$1/;    # replace the whole match with group 1 (anchored at ^, so at most once per line)
    say;
}
__DATA__
AMINO-2,4,6-TRIIODOBENZOIC ACIDS Hugo Holtermann, Baerum, Leif Gunnar Haugen, Oslo, and Knut Wille, Baerum, Norway, assignors to Nye- 5
PROCESS FOR THE PRODUCTION OF ETHYLENIC COMPOUNDS Duncan Clark and Percy Hayden, Norton-on-Tees, Eng- 5 land, assignors to ImperiaI Chemical Industries Limited, London, England
PROCESS FOR THE PRODUCTION OF ETHYLENIC COMPOUNDS D.Clark
PROCESS FOR THE PRODUCTION OF ETHYLENIC COMPOUNDS O'Connors
Output:
Hugo Holtermann, Baerum, Leif Gunnar Haugen, Oslo, and Knut Wille, Baerum, Norway, assignors to Nye- 5
Duncan Clark and Percy Hayden, Norton-on-Tees, Eng- 5 land, assignors to ImperiaI Chemical Industries Limited, London, England
D.Clark
O'Connors
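If you want to know which lines the heuristic could not handle, the same substitution can be wrapped in a subroutine that returns `undef` when nothing matched (the loop above simply prints such lines unchanged). This is a minimal sketch: the subroutine name `extract_name` and the third sample line are hypothetical, and the regex is the one from the script above with only its layout changed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.014;

# Same pattern as in the answer: an uppercase/digit/comma/space/dash prefix,
# then capture an uppercase letter followed by a letter, dot or apostrophe.
my $re = qr/
    ^                   # start of string
    [\p{Lu}\pN, -]+     # uppercase letters, digits, commas, spaces or dashes
    ( \p{Lu} [\pL.'] )  # group 1: uppercase letter + letter, dot or apostrophe
/x;

# Hypothetical helper: returns the line from the start of the name onward,
# or undef if the line does not look like "UPPERCASE TITLE Name...".
sub extract_name {
    my ($line) = @_;
    my $copy = $line;
    return ($copy =~ s/$re/$1/) ? $copy : undef;
}

my @lines = (
    "PROCESS FOR THE PRODUCTION OF ETHYLENIC COMPOUNDS D.Clark",
    "PROCESS FOR THE PRODUCTION OF ETHYLENIC COMPOUNDS O'Connors",
    "no uppercase title here",    # invented line with no title prefix
);

for my $line (@lines) {
    my $name = extract_name($line);
    say defined $name ? $name : "NO MATCH: $line";
}
```

Counting the `NO MATCH` lines on real input gives a rough idea of how close the heuristic gets to the 70-80% target mentioned in the comments.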
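Is it *usually* uppercase, or always uppercase? "Usually" won't help you much. I think this may be impossible, depending on how consistently the names are formatted. If someone's first name is given only as an initial, e.g. 'J. Doe', I can't think of any logical way to distinguish it from the title. – Tim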
It is always uppercase. I'm not looking for something 100% accurate; something that works 70-80% of the time is fine. –
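Tim's concern about initials can be checked directly. The sketch below runs the answer's regex over a few made-up lines (invented for illustration, not taken from the question). Because the greedy prefix backtracks to the rightmost uppercase letter that is followed by a letter, dot or apostrophe, an initial with a dot survives, an initial without a dot is absorbed into the title, and a name written entirely in capitals is cut down to its last two letters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.014;

# The regex from the answer, unchanged except for layout.
my $re = qr/
    ^                   # start of string
    [\p{Lu}\pN, -]+     # uppercase letters, digits, commas, spaces or dashes
    ( \p{Lu} [\pL.'] )  # group 1: uppercase letter + letter, dot or apostrophe
/x;

# Hypothetical sample lines probing the heuristic's edge cases.
my @samples = (
    "PROCESS FOR ETHYLENIC COMPOUNDS J. Doe",    # initial with a dot: kept
    "PROCESS FOR ETHYLENIC COMPOUNDS J Doe",     # initial without a dot: swallowed by the title
    "PROCESS FOR ETHYLENIC COMPOUNDS JOHN DOE",  # name in capitals: only "OE" survives
);

for my $line (@samples) {
    (my $result = $line) =~ s/$re/$1/;
    say "$line  =>  $result";
}
```

These are the cases that make up the remaining 20-30%: they only go wrong when the name itself looks like part of the uppercase title.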