Hadoop Secondary Sort

I'm working on a Hadoop project. After reading various blogs and the documentation, I realized I need to use the secondary sort feature that the Hadoop framework provides.

My input has the form:

DESC(String) Price(Integer) and some other Text

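I want the values in the reducer to be sorted by price in descending order. Also, to compare DESC values I have a method that takes two strings and a percentage: if the similarity between the two strings is equal to or greater than the percentage, I should treat them as equal.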
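For reference, the usual secondary-sort key ordering compares the natural key first and the secondary field second. A minimal plain-Java sketch (no Hadoop dependency), assuming exact string equality for DESC; `DescPriceKey` and `SecondarySortOrder` are hypothetical names, not part of the question's code. The similarity threshold cannot be dropped into this ordering as-is, because it is not a total order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the question's composite key: the natural key
// (DESC) plus the secondary field (price). This is not a Hadoop Writable.
final class DescPriceKey {
    final String desc;
    final int price;

    DescPriceKey(String desc, int price) {
        this.desc = desc;
        this.price = price;
    }

    @Override
    public String toString() {
        return desc + ":" + price;
    }
}

final class SecondarySortOrder {
    // DESC ascending (exact string order), then price descending. Both parts
    // are total orders, so the combined comparator satisfies the contract.
    static final Comparator<DescPriceKey> SORT =
            Comparator.comparing((DescPriceKey k) -> k.desc)
                    .thenComparing(Comparator.comparingInt((DescPriceKey k) -> k.price).reversed());

    // Returns a sorted copy, mimicking the order the shuffle's sort step produces.
    static List<DescPriceKey> sorted(List<DescPriceKey> keys) {
        List<DescPriceKey> copy = new ArrayList<>(keys);
        copy.sort(SORT);
        return copy;
    }
}
```

Sorting `b:5, a:1, a:9, b:7` with `SORT` yields `a:9, a:1, b:7, b:5`: each DESC stays together and its prices arrive in descending order.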

The problem is that after the reduce job finishes, I can see some DESC values that are similar to each other but ended up in different groups.

Here is the compareTo method of the composite key used for grouping:

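public int compareTo(VendorKey o) {
    // Similarity-based "equality" on the token part of the key.
    int result = compare(token, o.token, ":") >= percentage ? 0 : 1;
    if (result == 0) {
        // Similar tokens: order by pid in descending order.
        return pid > o.pid ? -1 : pid < o.pid ? 1 : 0;
    }
    // Dissimilar tokens always compare as "greater", in both directions.
    return result;
}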

And here is the compare method of my comparator, which compares the same way as my compareTo method:

public int compare(WritableComparable a, WritableComparable b) {
    VendorKey one = (VendorKey) a;
    VendorKey two = (VendorKey) b;
    // 0 ("same group") when the tokens are similar enough, otherwise 1.
    int result = ClusterUtil.compare(one.getToken(), two.getToken(), ":") >= one.getPercentage() ? 0 : 1;
    // if (result != 0)
    //     return two.getToken().compareTo(one.getToken());
    return result;
}
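On the reduce side, a new reduce() call starts whenever the grouping comparator reports that the next key in sorted order differs from the current one, so only keys that are adjacent after sorting can end up in the same group. The plain-Java sketch below roughly emulates that walk; `AdjacentGrouping` is a hypothetical helper, not a Hadoop class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class AdjacentGrouping {
    // Walks keys that are already in sort order and starts a new group whenever
    // the grouping comparator says a key differs from the one just before it.
    // Non-adjacent keys are never compared, so they can never be merged.
    static <K> List<List<K>> group(List<K> sortedKeys, Comparator<? super K> grouping) {
        List<List<K>> groups = new ArrayList<>();
        List<K> current = null;
        K previous = null;
        for (K key : sortedKeys) {
            if (current == null || grouping.compare(previous, key) != 0) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(key);
            previous = key;
        }
        return groups;
    }
}
```

Grouping by the first `:` token, the sorted input `apple:red, banana:yellow, apple:green` yields three groups, while `apple:red, apple:green, banana:yellow` yields two. If the sort order places an unrelated key between two similar DESC values, they land in different groups, which is one way to get the symptom described in the question.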


2016-08-04 Abhishek Singh

Have you fixed the compareTo method? – aventurin

It looks like your compareTo method violates the general contract, which requires sgn(x.compareTo(y)) == -sgn(y.compareTo(x)).
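The violation can be reproduced outside Hadoop. The sketch below mirrors the question's `... >= percentage ? 0 : 1` logic; the Jaccard-based `similarity` function is a hypothetical stand-in for the question's `ClusterUtil.compare`, whose implementation is not shown.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ContractCheck {
    // Hypothetical similarity: Jaccard overlap of the ':'-separated tokens,
    // standing in for the question's ClusterUtil.compare (not shown there).
    static double similarity(String a, String b) {
        Set<String> ta = new HashSet<>(Arrays.asList(a.split(":")));
        Set<String> tb = new HashSet<>(Arrays.asList(b.split(":")));
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        Set<String> common = new HashSet<>(ta);
        common.retainAll(tb);
        return union.isEmpty() ? 1.0 : (double) common.size() / union.size();
    }

    // Same shape as the question's logic: 0 when "similar enough", else 1.
    static int thresholdCompare(String a, String b, double percentage) {
        return similarity(a, b) >= percentage ? 0 : 1;
    }

    // The contract requires sgn(compare(x, y)) == -sgn(compare(y, x)).
    static boolean isAntisymmetric(String x, String y, double percentage) {
        return Integer.signum(thresholdCompare(x, y, percentage))
                == -Integer.signum(thresholdCompare(y, x, percentage));
    }
}
```

For two dissimilar strings such as `red:shirt` and `blue:hat`, both directions return 1, so the sort has no consistent order to follow. The threshold relation is also not transitive: with a threshold of 0.3, `a:b` is similar to `b:c` and `b:c` is similar to `c:d`, yet `a:b` is not similar to `c:d`.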


2016-08-06 17:32:13 aventurin

After your custom Writable, provide a basic partitioner that takes the composite key and a NullWritable value. For example:

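import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class SecondarySortBasicPartitioner extends
        Partitioner<CompositeKeyWritable, NullWritable> {

    @Override
    public int getPartition(CompositeKeyWritable key, NullWritable value,
            int numReduceTasks) {
        // Partition on the natural key only, so every composite key with the
        // same DEPT value reaches the same reducer. Masking the sign bit keeps
        // the result non-negative even when hashCode() is negative.
        return (key.DEPT().hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}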
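The sign of the hash matters because `String.hashCode()` can be negative and Java's `%` keeps the sign of the dividend, so an unmasked `hashCode() % n` can produce a negative partition number, which Hadoop rejects as an illegal partition. A plain-Java comparison of the two formulas; `HashPartitionSketch` is a hypothetical name, and the masked form is the same one Hadoop's built-in `HashPartitioner` uses.

```java
final class HashPartitionSketch {
    // The unmasked hashCode() % n formula. Java's % keeps the sign of the
    // dividend, so a negative hashCode() gives a negative result.
    static int naivePartition(String naturalKey, int numReduceTasks) {
        return naturalKey.hashCode() % numReduceTasks;
    }

    // Clearing the sign bit first keeps the result in [0, numReduceTasks);
    // Hadoop's HashPartitioner applies the same masking.
    static int safePartition(String naturalKey, int numReduceTasks) {
        return (naturalKey.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

For example, `"polygenelubricants".hashCode()` is `Integer.MIN_VALUE`, so the unmasked formula returns a negative value for it, while the masked one returns 0.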

Then specify the key sort comparator and a grouping comparator that compares two CompositeKeyWritable instances, and the grouping will be done.


2017-04-22 15:54:47

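There are three steps in the shuffle: partitioning, sorting, and grouping. I suspect you have multiple reducers, and your similar records are processed by different reducers because they are in different partitions.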
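The effect of the partitioning step can be sketched in plain Java: keys are first assigned to reducers by hash, and each reducer then sorts and groups only its own share. `ShuffleSketch` is a hypothetical illustration that partitions on the full DESC string, as a hash partitioner over the whole key would.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ShuffleSketch {
    // Partition step: hash of the full DESC string, sign bit masked.
    static int partition(String desc, int numReducers) {
        return (desc.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Returns, for each reducer, the DESC values it receives. Each reducer
    // sorts (and would later group) only its own partition.
    static List<List<String>> shuffle(List<String> descs, int numReducers) {
        List<List<String>> perReducer = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            perReducer.add(new ArrayList<>());
        }
        for (String desc : descs) {
            perReducer.get(partition(desc, numReducers)).add(desc);
        }
        for (List<String> part : perReducer) {
            Collections.sort(part);
        }
        return perReducer;
    }
}
```

With two reducers, the similar strings `ab` and `ac` hash to different partitions, so no grouping comparator can ever bring them together; with one reducer they meet in the same sorted list.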

You can set the number of reducers to 1, or set a custom partitioner for your job that extends org.apache.hadoop.mapreduce.Partitioner.
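A custom partitioner only helps if it sends similar DESC values to the same reducer, so it has to hash something that similar strings are guaranteed to share. The sketch below assumes, purely for illustration, that similar descriptions share their first `:`-separated token; `CoarsePartitionSketch` and `coarseKey` are hypothetical.

```java
final class CoarsePartitionSketch {
    // Hypothetical coarse key: the first ':'-separated token of DESC. This only
    // works if strings the similarity check treats as equal share that token.
    static String coarseKey(String desc) {
        int idx = desc.indexOf(':');
        return idx < 0 ? desc : desc.substring(0, idx);
    }

    // Hash on the coarse key rather than the full DESC, with the sign bit masked.
    static int partition(String desc, int numReduceTasks) {
        return (coarseKey(desc).hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

In a real job this logic would go inside `getPartition` of a class extending `org.apache.hadoop.mapreduce.Partitioner` and be registered with `job.setPartitionerClass(...)`; the single-reducer alternative is `job.setNumReduceTasks(1)`.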


2018-01-10 09:51:53 Harper
