2011-06-29

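Pig script function question

In the Pig code below, you can see that I repeat the same set of statements for Attr1 and Attr2. Is there a way to extract them into a function? A code example would really help.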

Attr1ValidRecs = FILTER BaseRecs BY Attr1 IS NOT NULL; 
Attr1ValidRecs_all = GROUP Attr1ValidRecs ALL; 
Attr1Count = FOREACH Attr1ValidRecs_all GENERATE COUNT(Attr1ValidRecs); 
Attr1CountStr = FOREACH Attr1Count GENERATE CONCAT('Recs with Attr1 not null : ',(chararray)$0); 

Attr1BaseCross = CROSS BaseRecsCount,Attr1Count; 
Attr1BaseRatio = FOREACH Attr1BaseCross GENERATE CONCAT('Ratio of Not Null Attr1 to Total Base Recs: ',(chararray)((double)$1/(double)$0)); 

Attr2ValidRecs = FILTER BaseRecs BY Attr2 IS NOT NULL; 
Attr2ValidRecs_all = GROUP Attr2ValidRecs ALL; 
Attr2Count = FOREACH Attr2ValidRecs_all GENERATE COUNT(Attr2ValidRecs); 
Attr2CountStr = FOREACH Attr2Count GENERATE CONCAT('Recs with Attr2 not null : ',(chararray)$0); 

Attr2BaseCross = CROSS BaseRecsCount,Attr2Count; 
Attr2BaseRatio = FOREACH Attr2BaseCross GENERATE CONCAT('Ratio of Not Null Attr2 to Total Base Recs: ',(chararray)((double)$1/(double)$0)); 

Answer

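Unfortunately, there is no way to substitute a batch of Pig operations for a number of lines. This is something I have sometimes wished I could do, so I sympathize.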

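What I have done in the past, when the same statements were repeated over and over in one script, is to write a Python script with a for loop that generates the Pig Latin code, substituting a keyword on each pass. It is still pretty dirty, though.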
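
A minimal sketch of that generator approach, applied to the statements from the question. The `@ATTR@` placeholder, the `TEMPLATE` constant, and the `generate_pig` function are hypothetical names chosen for illustration; the Pig statements themselves are copied from the question, and `BaseRecs` and `BaseRecsCount` are assumed to be defined earlier in the final script.

```python
# Pig Latin template for one attribute, copied from the question.
# "@ATTR@" is a hypothetical placeholder; plain str.replace is used because
# the Pig code contains "$0" and "$1", which would clash with string.Template.
TEMPLATE = """\
@ATTR@ValidRecs = FILTER BaseRecs BY @ATTR@ IS NOT NULL;
@ATTR@ValidRecs_all = GROUP @ATTR@ValidRecs ALL;
@ATTR@Count = FOREACH @ATTR@ValidRecs_all GENERATE COUNT(@ATTR@ValidRecs);
@ATTR@CountStr = FOREACH @ATTR@Count GENERATE CONCAT('Recs with @ATTR@ not null : ',(chararray)$0);

@ATTR@BaseCross = CROSS BaseRecsCount,@ATTR@Count;
@ATTR@BaseRatio = FOREACH @ATTR@BaseCross GENERATE CONCAT('Ratio of Not Null @ATTR@ to Total Base Recs: ',(chararray)((double)$1/(double)$0));
"""


def generate_pig(attrs):
    """Return one copy of the template per attribute name, joined by blank lines."""
    return "\n".join(TEMPLATE.replace("@ATTR@", attr) for attr in attrs)


if __name__ == "__main__":
    # Print the generated Pig Latin so it can be redirected into a script file.
    print(generate_pig(["Attr1", "Attr2"]))
```

The output can be redirected into a file and appended after the statements that define `BaseRecs` and `BaseRecsCount`. Adding a third attribute then means adding one name to the list rather than copying six more lines.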