2012-03-22 83 views
8

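I'm trying to write a Pig Latin script that returns the number of records in a dataset I've filtered, but Pig cannot infer the matching COUNT function.

Here is the script so far: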

/* scans by title */ 

scans   = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray); 
productscans = FILTER scans BY (title MATCHES 'proactiv'); 
scancount  = FOREACH productscans GENERATE COUNT($0); 
DUMP scancount; 

For some reason, I get this error:

Could not infer the matching function for org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an explicit cast.

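What am I doing wrong here? I assume it has something to do with the type of the field I'm passing in, but I can't seem to resolve it.

TIA, Jason

Answers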

14

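Is this what you're looking for? (GROUP ALL collects every record into a single bag, and COUNT then counts the items in that bag):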

scans   = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray); 
productscans = FILTER scans BY (title MATCHES 'proactiv'); 
grouped   = GROUP productscans ALL; 
count   = FOREACH grouped GENERATE COUNT(productscans); 
dump count; 
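
For anyone wondering why the grouping step matters: COUNT takes a bag, while `$0` in the question's FOREACH is the `thetime` field, a long, so Pig finds no matching COUNT signature. A rough sketch, reusing the aliases above, that inspects the grouped relation before counting. The field alias `n` is just an illustrative name, not something from the original script:

```pig
-- Sketch only: same LOAD and FILTER as above.
scans        = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans = FILTER scans BY (title MATCHES 'proactiv');

-- GROUP ... ALL yields one tuple with two fields:
--   group        : the constant 'all'
--   productscans : a bag holding every filtered tuple
grouped = GROUP productscans ALL;
DESCRIBE grouped;

-- COUNT now receives a bag, which is the argument type it expects.
scancount = FOREACH grouped GENERATE COUNT(productscans) AS n;
DUMP scancount;
```

DESCRIBE prints the schema of `grouped`, which should make it clear that the second field is the bag to pass to COUNT.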
+2

That's it (except that "FOREACH g" should be "FOREACH grouped") - thanks Chris! – JasonA 2012-03-23 14:02:56

+0

Edited, thanks for the review – 2012-03-23 14:32:18

0

Maybe:

/* scans by title */ 

scans   = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray); 
productscans = FILTER scans BY (title MATCHES 'proactiv'); 
scancount  = FOREACH productscans GENERATE COUNT(productscans); 
DUMP scancount; 
+0

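Thanks Jack - unfortunately, no luck. That gives me: 'Invalid scalar projection: productscans : A column needs to be projected from a relation for it to be used as a scalar' – JasonA 2012-03-22 20:25:25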

4

COUNT requires a preceding GROUP ALL statement for a global count, or a GROUP BY statement for per-group counts.

You can use either of the following:

scans   = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray); 
productscans = FILTER scans BY (title MATCHES 'proactiv'); 
grouped   = GROUP productscans ALL; 
count   = FOREACH grouped GENERATE COUNT(productscans); 
DUMP count; 

Or:

scans   = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray); 
productscans = FILTER scans BY (title MATCHES 'proactiv'); 
grouped   = GROUP productscans ALL; 
count   = FOREACH grouped GENERATE COUNT($1); 
DUMP count;
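
Both variants above give a global count. For the per-group case, a minimal sketch might look like the following, assuming you want one count per `category`. The aliases `bycategory`, `categorycounts` and `n` are made up for illustration:

```pig
-- Sketch only: counts the filtered scans per category instead of globally.
scans        = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans = FILTER scans BY (title MATCHES 'proactiv');

-- GROUP ... BY yields one tuple per distinct category:
--   group        : the category value
--   productscans : a bag of the tuples in that category
bycategory = GROUP productscans BY category;

-- One output row per category: (category, number of scans).
categorycounts = FOREACH bycategory GENERATE group AS category, COUNT(productscans) AS n;
DUMP categorycounts;
```

The pattern is the same as in the global case: COUNT is always applied to the bag produced by the GROUP step, never to a plain field of the ungrouped relation.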