2016-02-03 48 views

How do I find the maximum value and its related values in a grouped relation in Pig?

Below is my input:

$ cat people.csv 
Steve,US,M,football,6.5 
Alex,US,M,football,5.5 
Ted,UK,M,football,6.0 
Mary,UK,F,baseball,5.5 
Ellen,UK,F,football,5.0 

I need to group my data by country.

people = LOAD 'people.csv' USING PigStorage(',') AS (name:chararray,country:chararray,gender:chararray, sport:chararray,height:float); 
grouped = GROUP people BY country; 

Now, for each group, I need to find the person with the maximum height along with that person's details.

So I tried the following:

a = FOREACH grouped GENERATE group AS country, MAX(people.height) as height, people.name as name; 

This gives the output:

(UK,6.0,{(Ellen),(Mary),(Ted)}) 
(US,6.5,{(Alex),(Steve)}) 

But I need my output to be:

(UK,6.0,Ted) 
(US,6.5,Steve) 

Could someone please help me achieve this?

Answer


This code will help you.

With this code, if two players from the same country share the maximum height, you will get the details of both players.

records = LOAD '/home/user/footbal.txt' USING PigStorage(',') AS(name:chararray,country:chararray,gender:chararray,sport:chararray,height:double); 

records_grp = GROUP records BY (country); 

records_each = foreach records_grp generate group as temp_country, MAX(records.height) as max_height; 

records_join = join records by (country,height), records_each by (temp_country,max_height); 

records_output = foreach records_join generate country, max_height, name; 

dump records_output; 

Output:

(UK,6.0,Ted) 
(US,6.5,Steve) 
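
As an alternative sketch (not part of the original answer), the same result can usually be obtained without a join by sorting inside a nested FOREACH and keeping the top row of each group. It assumes the `people` and `grouped` relations from the question; the aliases `sorted`, `tallest` and `tallest_per_country` are names chosen here only for illustration.

```
-- Assumes 'people' and 'grouped' are defined as in the question.
tallest_per_country = FOREACH grouped {
    -- Sort the people within this country, tallest first.
    sorted = ORDER people BY height DESC;
    -- Keep only the first (tallest) row of the sorted bag.
    tallest = LIMIT sorted 1;
    -- Flatten the one-row bag so height and name become plain fields.
    GENERATE group AS country, FLATTEN(tallest.(height, name));
};

DUMP tallest_per_country;
```

For the sample input this should give one row per country, matching the join above. Note that LIMIT 1 keeps a single person per country, so if two people tie for the maximum height only one of them is returned, whereas the join returns both.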

It worked. Thanks a lot! – Sathyaraj