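I'm currently working on code that changes certain words into their Shakespearean equivalents. I have to extract the sentences that contain those words and print them to another file. I also have to remove .START from the beginning of each file. How can I use a counter to find the position of a word?
First I split the text file on spaces, so now I have the individual words. Next, I iterate over the words and look each one up in a hash. The hash keys and values come from a tab-delimited file structured as OldEng/ModernEng (lc_Shakespeare_lexicon.txt). Now I'm trying to work out how to find the exact position of each modern English word so that I can change it to the Shakespearean one, then find the sentences with changed words and print them to a different file. Most of the code is done except for that last part. Here is my code so far: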
#!/usr/bin/perl -w
use diagnostics;
use strict;
#Declare variables
my $counter=();
my %hash=();
my $conv1=();
my $conv2=();
my $ssph=();
my @text=();
my $key=();
my $value=();
my $conversion=();
my @rmv=();
my $splits=();
my $words=();
my @word=();
my $vals=();
my $existingdir='/home/nelly/Desktop';
my @file='Sentences.txt';
my $eng_words=();
my $results=();
my $storage=();
#Open the tab-delimited lexicon (OldEng<TAB>ModernEng)
open (my $lexicon, "<", "lc_shakespeare_lexicon.txt") or die "could not open lc_shakespeare_lexicon.txt: $!\n";
#Split each line on the tab; key the hash on the modern word so it can be looked up in the text
while (<$lexicon>){
    chomp($_);
    ($value, $key) = split(/\t/, $_);
    next unless defined $key;
    $hash{$key} = $value;
}
close ($lexicon);
#Open the directory holding the input files
my $dir = "/home/nelly/Desktop/input";
opendir (my $dh, $dir) or die "can't opendir $dir: $!\n";
#Use grep to get the WSJ files and store them in an array
my @array = grep {/WSJ/} readdir($dh);
closedir ($dh);
#Open the output file once, creating its directory if needed
mkdir $existingdir unless -d $existingdir;
open (my $out, ">>", "$existingdir/$file[0]") or die "Can't open $existingdir/$file[0]: $!\n";
#Loop through each input file
foreach my $file (@array){
    open (my $in, "<", "$dir/$file") or die "could not open $file: $!\n";
    while (<$in>){
        chomp($_);
        #Remove .START (the dot is escaped so that it matches literally)
        s/^\.START\s*//;
        next unless /\S/;
        #Split the line into separate words
        @word = split(' ', $_);
        #Loop through each word; $counter is the word's position in the line
        $counter = 0;
        foreach $words (@word){
            if (exists $hash{$words}){
                $eng_words = $hash{$words};
                $results = $counter;
                print "$counter\n";
                #print $out "@word\n";
            }
            #Count every word, not only the matches, so $counter is a true position
            $counter++;
        }
    }
    close ($in);
}
close ($out);
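The step that is still missing, replacing each matched word and keeping only the sentences that changed, might look roughly like the sketch below. The helper name `shakespearize` and the two lexicon entries are made up for illustration, and the sketch assumes that words are separated by whitespace with no punctuation attached to them.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every word found in the lexicon and
# return the new sentence plus the zero-based positions that changed.
sub shakespearize {
    my ($sentence, $lexicon) = @_;
    my @words = split ' ', $sentence;
    my @positions;
    for my $i (0 .. $#words) {
        next unless exists $lexicon->{ $words[$i] };
        $words[$i] = $lexicon->{ $words[$i] };
        push @positions, $i;
    }
    return (join(' ', @words), \@positions);
}

# Made-up lexicon entries (modern => Shakespearean), for illustration only
my %lexicon = (you => 'thou', your => 'thy');

my ($converted, $positions) =
    shakespearize('I know you and your friend', \%lexicon);

# Only sentences with at least one replacement are printed
print "$converted\n" if @$positions;
print "positions: @$positions\n";
```

In the real script the first `print` would go to the output filehandle instead of STDOUT, so that only the changed sentences end up in Sentences.txt.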
Could you post some input data? – fugu
By declaring your variables long before they are needed, you lose some of the benefits of `my`. Also, all of those assignments (except `my $existingdir='/home/nelly/Desktop'; my @file='Sentences.txt';`) are useless. – ikegami
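As a rough illustration of that point, here is a minimal sketch in which each variable is declared where it is first assigned, so that it exists only in the block that uses it and `use strict` can catch a misspelled name. The lexicon lines are invented sample data, not taken from the real file.

```perl
use strict;
use warnings;

# Made-up lexicon lines in "OldEng<TAB>ModernEng" form, for illustration only
my @lines = ("thou\tyou", "thy\tyour");

my %hash;                                    # no "= ()" needed: a new hash starts empty
for my $line (@lines) {                      # $line exists only inside this loop
    my ($old, $modern) = split /\t/, $line;  # declared where first assigned
    $hash{$modern} = $old;
}

my $count = keys %hash;
print "$count entries\n";
```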
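Most likely you would use `index`, `pos`, etc., as in this [similar SO question (look at the answers)](http://stackoverflow.com/a/4856558/2019415). I'm not sure you have set up the `%hash` lookup correctly. Try using [`Data::Dumper`](https://metacpan.org/pod/Data::Dumper) or [`Data::Printer`](https://metacpan.org/release/Data-Printer) to see how it is populated. –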
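A small sketch of both suggestions, using a made-up hash entry and sample sentence: `Data::Dumper` prints the hash so the key/value direction can be checked, and `pos` and `index` report character offsets. Note that these are offsets into the string, not word counts.

```perl
use strict;
use warnings;
use Data::Dumper;

# Made-up entry (modern => Shakespearean), for illustration only
my %hash = (you => 'thou');

# Dump the hash to check which column ended up as the key
$Data::Dumper::Sortkeys = 1;
print Dumper(\%hash);

my $text = 'I know you and you know me';

# With /g in scalar context, pos() is the offset just past the last match,
# so subtracting the match length gives the offset where the match starts
my @offsets;
while ($text =~ /\byou\b/g) {
    push @offsets, pos($text) - length('you');
}
print "offsets: @offsets\n";

# index() returns the offset of the first occurrence only
my $first = index($text, 'you');
print "first: $first\n";
```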