2016-02-03 216 views
1

我试着去维基百科打印出来的信息和它的作用:http://i.imgur.com/1vMj3df.jpg打印信息

这是我使用的代码:

use strict; 
use warnings; 

use WWW::Wikipedia; 
use HTML::Restrict; 

my $wiki = WWW::Wikipedia->new(clean_html => 1); 
my $hr = HTML::Restrict->new; 
my $entry = $wiki->search('Perl'); 
my $processed = $hr->process($entry->text_basic); 
print $processed; 

1; 

IM希望去除上面的东西“Perl是一个家庭“,只打印这些段落。

例如,它打印了这一点:

 {{Infobox programming language 
| name     = Perl 
| logo     = 
| paradigm    = multi-paradigm: functional, imperative, object-oriented (class-based), reflective, procedural, event-driven, generic 
| year     = 
| designer    = Larry Wall 
| developer    = Larry Wall 
| latest_release_version = 5.22.1 
| latest_release_date = 
| latest_preview_version = 5.23.7 
| latest_preview_date = 
| turing-complete  = Yes 
| typing     = Dynamic 
| influenced_by   = AWK, Smalltalk 80, Lisp, C, C++, sed, Unix shell, Pascal 
| influenced    = Chapel, Coffeescript, ECMAScript/JavaScript, Falcon, Julia, LPC, Perl 6, PHP, Python, Qore, Ruby, Windows PowerShell 
| programming_language = C 
| operating_system  = Cross-platform 
| license    = GNU General Public License or Artistic License 
| website    = 
| file_ext    = .pl .pm .t .pod 
| wikibooks    = Perl Programming 
}} 

'Perl' is a family of high-level, general-purpose, interpreted, dynamic programming languages. The languages in this family include Perl 5 and Perl 6. 

Though Perl is not officially an acronym, there are various backronyms in use, the most well-known being "Practical Extraction and Reporting Language". Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions. Perl 6, which began as a redesign of Perl 5 in 2000, eventually evolved into a separate language. Both languages continue to be developed independently by different development teams and liberally borrow ideas from one another. 

The Perl languages borrow features from other programming languages including C, shell script (sh), AWK, and sed. They provide powerful text processing facilities without the arbitrary data-length limits of many contemporary Unix commandline tools, facilitating easy manipulation of text files. Perl 5 gained widespread popularity in the late 1990s as a CGI scripting language, in part due to its unsurpassed regular expression and string parsing abilities. 

In addition to CGI, Perl 5 is used for graphics programming, system administration, network programming, finance, bioinformatics, and other applications. It has been nicknamed "the Swiss Army chainsaw of scripting languages" because of its flexibility and power, and possibly also because of its "ugliness". In 1998, it was also referred to as the "duct tape that holds the Internet together", in reference to both its ubiquitous use as a glue language and its perceived inelegance. 

我想删除:

{{Infobox programming language 
| name     = Perl 
| logo     = 
| paradigm    = multi-paradigm: functional, imperative, object-oriented (class-based), reflective, procedural, event-driven, generic 
| year     = 
| designer    = Larry Wall 
| developer    = Larry Wall 
| latest_release_version = 5.22.1 
| latest_release_date = 
| latest_preview_version = 5.23.7 
| latest_preview_date = 
| turing-complete  = Yes 
| typing     = Dynamic 
| influenced_by   = AWK, Smalltalk 80, Lisp, C, C++, sed, Unix shell, Pascal 
| influenced    = Chapel, Coffeescript, ECMAScript/JavaScript, Falcon, Julia, LPC, Perl 6, PHP, Python, Qore, Ruby, Windows PowerShell 
| programming_language = C 
| operating_system  = Cross-platform 
| license    = GNU General Public License or Artistic License 
| website    = 
| file_ext    = .pl .pm .t .pod 
| wikibooks    = Perl Programming 
}} 

我只希望每个搜索结果显示的段落。

+0

为什么不说你从wiki返回的字符串样本,并指出你想要删除的部分。然后我们可以给你一个正则表达式来为你做这项工作。 –

+0

完成了。对不起,我在手机上花了一段时间,这实际上是给朋友的。 – user2524169

+0

这更好,谢谢。请在下面找到我的答案。我希望它有帮助。 –

回答

2

匹配由维基对正则表达式返回的字符串来替换它的第一部分{{}}和空的新的生产线后马上,由空字符串:

$processed =~ s/\{\{.*\}\}\R\R//s;

正如你可以看到我使用了/s选项来匹配多行,并且我没有使用g选项,因此我们只替换第一个贪婪匹配。

+0

完美地工作,就在我尝试其他搜索词时,它显示的内容高于第一段的起始位置 – user2524169

+0

您的意思是您想要移除的{{}}之外的额外文本? –

+0

当我尝试例如,“水泥”一词。这是它打印出来的:护目镜和防护手套,与此处所示的工作人员不同,http://www.hse.gov.uk/pubns/cis26.pdfhttp://www.cementaustralia.com.au/wps/wcm/connect /website/packaged-products/home/hints-and-tips/FAQ-Safety/#FAQ-safety-PPE.htmlhttp://www.tdi.texas.gov/pubs/videoresource/stpcement.pdf]] A'水泥“是一种粘合剂,是一种固化并硬化并可将其他材料粘合在一起的物质。 “水泥”这个词可以追溯到罗马时期[opus caementicium]],用来描述类似于现代混凝土的砌体,它是 – user2524169