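Assigning heading levels
The first heading is assigned level 1. I extract its font family and size and look for headings that match. Once a level has been assigned, I unmark the heading from Headinglevel, while the heading and its level are kept in another annotation (HeadingHierarchy). After a level is finished, the same block is called again and again for as long as any headings remain in the Headinglevel annotation.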
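Problem:
The script works fine and finds all level 1 headings. However, when the block is executed through the CALL statement, it finds only the first match for each level (level 2 and above). As a result, the total number of levels for the input below becomes 10, whereas it must be 4.
Input (.txt):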
Apache UIMA Ruta Overview =>Arial,18
What is Apache UIMA Ruta? =>Arial,16
Getting started =>Arial,16
UIMA Analysis Engines =>Arial,16
Ruta Engine =>Times New Roman,14
Configuration Parameters =>Arial,10
Annotation Writer =>Times New Roman,14
Configuration Parameters =>Arial,10
Apache UIMA Ruta Language =>Arial,18
Syntax =>Arial,16
Rule elements and their matching order =>Arial,16
Script:
PACKAGE uima.ruta.example;
DECLARE Headinglevel(STRING family, INT size, INT level);
DECLARE HeadingHierarchy(STRING family, INT size, INT level);
DECLARE FontFamily, FontSize;
STRING family;
INT size;
RETAINTYPE(BREAK);
BREAK? #{-PARTOF(Headinglevel)} @SPECIAL+ W+ COMMA NUM{->MARK(Headinglevel,2,6), MARK(HeadingHierarchy,2,6), MARK(FontFamily,4), MARK(FontSize,6)};
RETAINTYPE;
h:Headinglevel{->h.family = family, HeadingHierarchy.family = family}
<-{FontFamily{PARSE(family)};};
h:Headinglevel{->h.size = size, HeadingHierarchy.size = size}
<-{FontSize{PARSE(size)};};
INT i=1;
BLOCK(ForEachHeadLevel)Document{}
{
# h:Headinglevel{-> family = h.family, size = h.size};
h:Headinglevel{AND(h.family == family, h.size == size)-> h.level=i, HeadingHierarchy.level = i, UNMARK(h)};
}
Headinglevel{->i=i+1, CALL(Test2.ForEachHeadLevel)};
Document{->LOG(" LEVELS : " + (i))};
Expected output:
HeadingHierarchy Feature
Apache UIMA... =>Arial,18 level: 1
What is Apa... =>Arial,16 level: 2
Getting sta... =>Arial,16 level: 2
UIMA Analys... =>Arial,16 level: 2
Ruta Engine... =>Times New Roman,14 level: 3
Configurati... =>Arial,10 level: 4
Annotation ... =>Times New Roman,14 level: 3
Configurati... =>Arial,10 level: 4
Apache UIMA... =>Arial,18 level: 1
Syntax =>Ar... =>Arial,16 level: 2
Rule elemen... =>Arial,16 level: 2
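For cross-checking only, the intended numbering can be sketched outside Ruta: each distinct (font family, size) pair gets the next level in order of first appearance, which is what the repeated block call is meant to achieve. The Python below is a minimal illustration, not a fix for the Ruta script; the names assign_levels and SAMPLE are hypothetical, and it assumes every input line has the form "title =>family,size".

```python
# Illustrative sketch (not Ruta): number each distinct (family, size) pair
# in order of first appearance. SAMPLE and assign_levels are hypothetical names.
SAMPLE = [
    "Apache UIMA Ruta Overview =>Arial,18",
    "What is Apache UIMA Ruta? =>Arial,16",
    "Getting started =>Arial,16",
    "UIMA Analysis Engines =>Arial,16",
    "Ruta Engine =>Times New Roman,14",
    "Configuration Parameters =>Arial,10",
    "Annotation Writer =>Times New Roman,14",
    "Configuration Parameters =>Arial,10",
    "Apache UIMA Ruta Language =>Arial,18",
    "Syntax =>Arial,16",
    "Rule elements and their matching order =>Arial,16",
]


def assign_levels(lines):
    """Return (title, family, size, level) tuples for the given lines."""
    levels = {}  # (family, size) -> level, filled in order of first appearance
    result = []
    for line in lines:
        # Split at the last "=>" so a title containing "=>" is still handled.
        title, _, font = line.rpartition("=>")
        # Split at the last comma: the family name may contain spaces.
        family, _, size = font.rpartition(",")
        key = (family.strip(), int(size))
        if key not in levels:
            levels[key] = len(levels) + 1
        result.append((title.strip(), key[0], key[1], levels[key]))
    return result


if __name__ == "__main__":
    for title, family, size, level in assign_levels(SAMPLE):
        print(f"{title} =>{family},{size} level: {level}")
```

For the sample input this yields four distinct levels, matching the expected output listed above.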
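I added org.apache.uima.ruta.block.DocumentBlockExtension to additionalExtensions, but I get the error that the input "DOCUMENTBLOCK" is not defined in this script/block! – prasanth
It looks like the newly added parameter is removed after the script runs (with the error, in this case). The same thing happens with dictRemoveWS, so it has to be added again every time the script is run. – prasanth
Yes, it looks like the extension is not available in the Workbench. I will fix it. –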