如何迭代JavaPairRDD。我已经完成了一个小组,并返回了一个RDD,如下所示:JavaPairRDD(Tuple 7字符串和对象列表)Spark JavaPairRDD迭代
现在我必须遍历此RDD并在Pig中执行一些计算,例如FOR EACH。 基本上我想迭代键和值的列表,并做一些操作,然后返回一个JavaPairRDD?
JavaPairRDD<Tuple7<String, String,String,String,String,String,String>, List<Records>> sizes =
piTagRecordData.groupBy(new Function<Records, Tuple7<String, String,String,String,String,String,String>>() {
private static final long serialVersionUID = 2885738359644652208L;
@Override
public Tuple7<String, String,String,String,String,String,String> call(Records row) throws Exception {
Tuple7<String, String,String,String,String,String,String> compositeKey = new Tuple7<String, String, String, String, String, String, String>(row.getAsset_attribute_id(),row.getDate_time_value(),row.getOperation(),row.getPi_tag_count(),row.getAsset_id(),row.getAttr_name(),row.getCalculation_type());
return compositeKey;
}
});
此我要为大小的每个成员(JavaPairRDD)执行后,操作 - 像
rejected_records = FOREACH sizes GENERATE FLATTEN(Java function on the List of Records based on the group key
我使用星火0.9.0
到目前为止,你能展示一些你的工作吗? – Anas 2014-09-30 12:43:59
@Anas - 更新我的评论 – 2014-09-30 15:40:33