2015-08-27 23 views
3

我正在尝试编写一个函数,它从一个集合中随机抽取元素并将它们添加到一个新元素中。因此,如果您想从{1,2,3,4,5}中绘制3个元素,您可以获得{5,3,4}。我想出了这个通用功能:使一个通用函数可以处理多个不重叠的类型

/** 
* Take a random sample without repeats of the specified size from a 
* collection. Note that if the collection you're sampling from contains 
* repeated elements, then the sample could also contain repeated elements. 
* Use a Set as an argument to avoid this. 
* 
* @param <T> The type of objects in the Collection 
* @param <E> The type of the collection 
* @param collection The collection 
* @param size The sample size of elements which you wish to extract from 
* the collection 
* @param factory A factory method for the collection E. Call with 
* "new::ArrayList" or something similar. 
* @return A random sample of the collection consisting of 'size' elements 
* without repeats (unless the original collection contained repeats). 
* @throws IllegalArgumentException if size is larger than the collection.size(). 
*/ 
public static <T, E extends Collection<T>> E drawRandomlyWithoutReplacement(List<T> collection, int size, Supplier<E> factory) { 
    if (size > collection.size()) { 
     throw new IllegalArgumentException("The sample size cannot be greater than the size of the collection."); 
    } 

    E list = factory.get(); 
    for (int i = 0; i < size; i++) { 
     int r = MathUtils.randomInt(0, collection.size() - 1); 
     list.add(collection.remove(r)); 
    } 
    return list; 
} 

不幸的是,Collection接口不具有从集合返回元素的功能,如果你删除它,但名单和Vector(等等)也有它。有没有一种方法可以使这个函数适用于列表和向量,而不必重载3次?我尝试使第一个参数是C,其中C extends List<T> | Vector<T>,但不幸的是这并没有工作。

+0

哪一个是用于'Set'? –

+0

你的JavaDoc说:“使用Set作为参数来避免这个[重复]”。所以不需要创建不同的方法,或者我错了吗? – Flown

+0

@DraganBozanovic,你是对的设置没有这个方法。 –

回答

1

collection成为任何集合(Collection<T> collection)。

E list = factory.get(); 
List<T> colAsList = new ArrayList<>(collection); 
for (int i = 0; i < size; i++) { 
    int r = MathUtils.randomInt(0, colAsList.size() - 1); 
    list.add(colAsList.remove(r)); 
} 
collection.removeAll(list); 
return list; 
1

如何名单,ArrayList中,矢量是子类,它定义

e取下(INT指数)//删除并返回

http://docs.oracle.com/javase/7/docs/api/java/util/List.html

感谢马尔科指点出来,我最初建议抽象清单

+1

'E remove(int)'方法[已经在'List'接口中定义!](https:// docs.oracle.com/javase/8/docs/api/java/util/List.html#remove-int-) – Marco13

+0

感谢您指出Marco,List是更好的向上表示 – whitecoffee

3

Collection接口不能做删除,但它的Iterator

public static <T, E extends Collection<T>> E drawRandomlyWithoutReplacement(Collection<T> collection, int size, 
    Supplier<E> factory) { 
    final int colSize = collection.size(); 
    if (size > colSize) { 
    throw new IllegalArgumentException("The sample size cannot be greater than the size of the collection."); 
    } else if(size == 0) { 
    return factory.get(); 
    } 

    Random rand = new Random(); 
    Set<Integer> sampleIndices = new TreeSet<>(); 
    while (sampleIndices.size() < size) { 
    sampleIndices.add(rand.nextInt(colSize)); 
    } 

    E result = factory.get(); 

    Iterator<T> collectionIterator = collection.iterator(); 
    Iterator<Integer> indexIterator = sampleIndices.iterator(); 
    int sampleIndex = indexIterator.next(); 
    for (int i = 0; i < colSize; i++) { 
    T sample = collectionIterator.next(); 
    if (i == sampleIndex) { 
     result.add(sample); 
     collectionIterator.remove(); 
     if (indexIterator.hasNext()) { 
     sampleIndex = indexIterator.next(); 
     } else { 
     break; 
     } 
    } 
    } 
    return result; 
} 
0

对于未知大小或元素的流(在你的情况迭代器)的集合存在的与您需要叫reservoir sampling确切purpouse算法的家庭。

水库采样是随机算法的一个家族,用于从包含n个项目的列表S中随机选择k个项目的样本,其中n是非常大或未知的数目。

以下算法以线性时间运行,并保证以所需概率挑选数字。

public static <T> List<T> reservoirSample(Iterable<T> items, int m){ 
    ArrayList<T> res = new ArrayList<T>(m); 
    int count = 0; 
    for(T item : items){ 
     count++; 
     if (count <= m) 
      res.add(item); 
     else{ 
      int r = rnd.nextInt(count); 
      if (r < m) 
       res.set(r, item); 
     } 
    } 
    return res; 
} 

此样本取自this blogpost,您可以在此进一步阅读有关该主题。

相关问题