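Movie data set analysis using PIG
I have the following data sets from a movie database:
Ratings: UserID, MovieID, Rating
Movies: MovieID, Title
Users: UserID, Gender, Age
Now I have to join these 3 data sets and find which movie is rated highest among women and lowest among men, and vice versa. I have already done the JOIN: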
myusers = LOAD '/user/cloudera/movies/input/users.dat'
USING PigStorage(':')
AS (user:int, n1, gender:chararray, n2, age:int);
ratings = LOAD '/user/cloudera/movies/input/ratings.dat'
USING PigStorage(':')
AS (user:int, n1, movie:int, n2, rating:int);
movies = LOAD '/user/cloudera/movies/input/movies.dat'
USING PigStorage(':')
AS (movie:int,n1,title:chararray);
data = JOIN ratings BY user, myusers BY user;
data2= JOIN data BY ratings::movie, movies BY movie;
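But after all this I run into many problems, such as "ERROR 0: Scalar has more than one row in the output", when I try to print columns from data2. Any ideas to help me complete this task?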
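
The statement that raised the error is not shown, so the following is only a sketch under an assumption: Pig usually reports "Scalar has more than one row in the output" when a column is written as relation.field (for example data2.title) inside a FOREACH, which Pig reads as a scalar lookup rather than a column reference. Referring to the columns of data2 by their own names, disambiguated with ::, avoids that. The aliases projected, by_title_gender and avg_ratings are hypothetical names introduced here for illustration.

```pig
-- Sketch only: assumes data2 was built by the two JOIN statements above.
-- Columns are referenced by field name (with :: prefixes), not as data2.<field>.
projected = FOREACH data2 GENERATE
    movies::title         AS title,
    data::myusers::gender AS gender,
    data::ratings::rating AS rating;

-- Average rating for every (title, gender) pair.
by_title_gender = GROUP projected BY (title, gender);
avg_ratings = FOREACH by_title_gender GENERATE
    FLATTEN(group)        AS (title, gender),
    AVG(projected.rating) AS avg_rating;

DUMP avg_ratings;
```

From avg_ratings, the rows for each gender could then be split with FILTER, joined on title, and ordered by the difference between the two averages to compare how women and men rated each movie.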