Pig: COUNT after JOIN and GROUP BY
I'm new to Pig and am trying to understand why I can't COUNT after a JOIN and a GROUP:
A = LOAD 'mary' as (line);
B = LOAD 'mary' as (line);
wordsA = foreach A generate flatten(TOKENIZE(line)) as wordA;
grpdA = group wordsA by wordA;
cntdA = foreach grpdA generate group, COUNT(wordsA);
wordsB = foreach B generate flatten(TOKENIZE(line)) as wordB;
grpdB = group wordsB by wordB;
cntdB = foreach grpdB generate group, COUNT(wordsB), 'some text';
fltB = FILTER cntdB BY $1>1;
jnd = join cntdA by $1, fltB by $1;
jnd_n = foreach jnd generate $0;
grp = group jnd by $0;
out = foreach grp generate group, count(jnd_n);
dump jnd_n;
dump grp;
dump jnd_n:
(was)
(was)
(was)
(lamb)
(lamb)
(lamb)
(Mary)
(Mary)
(Mary)
dump grp:
(was,{(was,2,was,2,some text),(was,2,Mary,2,some text),(was,2,lamb,2,some text)})
(Mary,{(Mary,2,was,2,some text),(Mary,2,Mary,2,some text),(Mary,2,lamb,2,some text)})
(lamb,{(lamb,2,was,2,some text),(lamb,2,Mary,2,some text),(lamb,2,lamb,2,some text)})
But I get this error:
Invalid scalar projection: jnd_n : A column needs to be projected from a relation for it to be used as a scalar
If I change the code to:
out = foreach grp generate group, count(jnd_n.$0);
then I get a different error:
Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve count using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
I know I can do this another way, but I want to get exactly this result after two Pig operations, a JOIN followed by a GROUP BY:
Desired dump output:
(was,3)
(lamb,3)
(Mary,3)
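Thanks for the answer. Yes, COUNT is case sensitive, but now I get another error: 'ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: <file script.pig, line 24, column 34> Could not infer the matching function for org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an explicit cast.' – Dipas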
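One possible explanation, offered as a sketch rather than a confirmed answer: after `grp = group jnd by $0;`, each row of `grp` holds the group key plus a bag named `jnd`. The relation `jnd_n` is not part of `grp`, so Pig tries to read it as a scalar, which would account for the "Invalid scalar projection" message. Built-in function names are case sensitive, so lowercase `count` cannot be resolved (ERROR 1070), and `COUNT` expects a bag, which likely explains ERROR 1045 when it is handed the scalar `jnd_n.$0`. Assuming the same input file 'mary' as in the question, counting the bag `jnd` should give the desired per-word counts. The `chararray` type on `line` is my addition, not part of the original script.

```pig
-- Sketch only: assumes the input file 'mary' from the question.
-- The chararray type is an assumption added for clarity.
A = LOAD 'mary' AS (line:chararray);
B = LOAD 'mary' AS (line:chararray);

-- Word counts on each side, as in the question.
wordsA = FOREACH A GENERATE FLATTEN(TOKENIZE(line)) AS wordA;
grpdA  = GROUP wordsA BY wordA;
cntdA  = FOREACH grpdA GENERATE group, COUNT(wordsA);

wordsB = FOREACH B GENERATE FLATTEN(TOKENIZE(line)) AS wordB;
grpdB  = GROUP wordsB BY wordB;
cntdB  = FOREACH grpdB GENERATE group, COUNT(wordsB), 'some text';
fltB   = FILTER cntdB BY $1 > 1;

-- Join on the count column, then group by the first word.
jnd = JOIN cntdA BY $1, fltB BY $1;
grp = GROUP jnd BY $0;

-- Inside grp the bag is named after the grouped relation (jnd),
-- so COUNT (uppercase) is applied to that bag, not to jnd_n.
out = FOREACH grp GENERATE group, COUNT(jnd);
DUMP out;
```

Running `DESCRIBE grp;` before the final `FOREACH` shows the name of the bag that `COUNT` has to reference.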