2013-08-07 59 views
0

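Conditionally joining lines in awk

I'm looking for awk code to join lines pasted from a PDF. The joining should follow this rule: if the last character of a line is not a period (.), a space character should be appended to the line and the next line should be joined to it.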

Sample input text (in a file):

In a perfect school, students would treat each other with affection and 
respect. Differences would be tolerated, and even welcomed. Kids would 
become more popular by being kind and supportive. Students would go out 
of their way to make sure one another felt happy and comfortable. But most 
schools are not perfect. Instead of being places of respect and tolerance, 
they are places where the hateful act of bullying is widespread. 

Students have to deal with all kinds of problems in schools. There are 
the problems created by difficult classes, by too much homework, or by 
personality conflicts with teachers. There are problems with scheduling 
the classes you need and still getting some of the ones you want. There 
are problems with bad cafeteria food, grouchy principals, or overcrowded 
classrooms. But one of the most difficult problems of all has to do with a 
terrible situation that exists in most schools: bullying. 

Expected output:

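In a perfect school, students would treat each other with affection and respect. Differences would be tolerated, and even welcomed. Kids would become more popular by being kind and supportive. Students would go out of their way to make sure one another felt happy and comfortable. But most schools are not perfect. Instead of being places of respect and tolerance, they are places where the hateful act of bullying is widespread.

Students have to deal with all kinds of problems in schools. There are the problems created by difficult classes, by too much homework, or by personality conflicts with teachers. There are problems with scheduling the classes you need and still getting some of the ones you want. There are problems with bad cafeteria food, grouchy principals, or overcrowded classrooms. But one of the most difficult problems of all has to do with a terrible situation that exists in most schools: bullying.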

(The expected output presumably has each paragraph on a single line; paragraphs are separated from each other by a blank line.)

+2

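It is an unfortunate side effect of Markdown formatting that the input and the output look almost identical. Presumably the output should read 'grouchy principals', as in the input, rather than the different wording that was posted. –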

Answers

0

This might be enough:

awk -v ORS= '!NF{$NF="\n"} NF{ $NF = $NF ($NF~/\.$/?"\n":" ")} 1' input 
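Spread over several lines with comments, the one-liner above reads roughly as follows. This is only a sketch: the sample text and the output file name joined.txt are made up for illustration, and `$0 = "\n"` is written in place of `$NF = "\n"`, which refers to the same thing when NF is 0.

```shell
# Hypothetical sample: two paragraphs separated by a blank line.
printf '%s\n' 'first line and ' 'second line. More text' 'ends here.' '' \
  'Next paragraph starts' 'and ends.' > input

awk -v ORS= '
  # Blank line: emit a bare newline so paragraphs stay separated.
  !NF { $0 = "\n" }
  # Non-blank line: append a newline if the last field ends in a period, else a space.
  NF  { $NF = $NF ($NF ~ /\.$/ ? "\n" : " ") }
  # Always-true pattern: print the record; ORS is empty, so nothing more is added.
  1
' input > joined.txt

cat joined.txt
```

Because assigning to `$NF` makes awk rebuild the record from its fields, trailing spaces left over from the PDF paste are dropped before the single space is added.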
+0

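I put the following code in a file test.awk: ORS= '$NF~/\.$/{$NF=$NF"\n"} 1'. The text to be modified is in the file "pdfpaste.txt". Then I called: gawk -f test1.awk pdfpaste.txt > pdfpaste2.txt. But there is no output in the pdfpaste.txt file. Am I doing something wrong? – user1955215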

+0

Replaced test.awk with the following code: ORS= '!NF{$NF="\n"} NF{$NF~/\.$/ ? $NF=$NF"\n" : $NF=$NF" "} 1' (still not working) – user1955215

+0

Run it like this: 'awk -v ORS= -f test.awk input' – perreal
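One reading of the comment above is that test.awk should hold only the awk program itself, without the surrounding shell quotes or the ORS= assignment, which stays on the command line. A minimal sketch under that assumption, with made-up sample text standing in for the real pdfpaste.txt:

```shell
# test.awk holds only the awk program text (no shell quotes, no ORS=).
cat > test.awk <<'EOF'
!NF { $NF = "\n" }
NF  { $NF = $NF ($NF ~ /\.$/ ? "\n" : " ") }
1
EOF

# Hypothetical sample standing in for the real pdfpaste.txt.
printf '%s\n' 'This sentence is split ' 'over two lines.' > pdfpaste.txt

# ORS is set with -v on the command line; -f names the program file.
awk -v ORS= -f test.awk pdfpaste.txt > pdfpaste2.txt
cat pdfpaste2.txt
```

The name passed to -f has to match the file that was actually saved (test.awk here).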

0

If the paragraphs in your input file really are separated by blank lines, then all you need is:

awk -v RS= -v ORS='\n\n' '{$1=$1}1' file
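A small demonstration of why this works, using made-up sample text and a hypothetical output file name (joined_paragraphs.txt): with RS set to the empty string, awk reads each blank-line-separated paragraph as one record, and `$1=$1` forces the record to be rebuilt with its fields joined by single spaces.

```shell
# Hypothetical sample: two paragraphs separated by a blank line.
printf '%s\n' 'first line and ' 'second line.' '' \
  'Next paragraph starts' 'and ends.' > file

# RS= selects paragraph mode; ORS='\n\n' puts a blank line after each paragraph.
awk -v RS= -v ORS='\n\n' '{$1=$1}1' file > joined_paragraphs.txt

cat joined_paragraphs.txt
```

Note that this version joins every line within a paragraph, including lines that happen to end with a period, so it matches the rule in the question only when paragraph boundaries are marked by blank lines.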