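My goal is to test whether a class sets one of its attributes to a random integer value. I found a chi-square test algorithm online and decided to put it to use. The results surprised me: the larger the sample size, the less likely the test is to pass. I should say that I am by no means a statistics expert (the answer to my question may well be self-evident), so I may be getting some things wrong here.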
Test results, varying only the final variable int SIZE (in UserTest). Each test runs 30 times:
SIZE     avg     results (passes out of 30, five runs)
11       25.4    26, 25, 22, 24, 30
20       25      26, 26, 24, 22, 27
30       24      24, 22, 24, 26, 24
100      19.4    17, 23, 20, 18, 19
200      16.2    15, 18, 18, 15, 15
1000     13.2    13, 13, 14, 13, 13
10000    10      14, 7, 8, 10, 11
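Although true randomness is not strictly necessary for me here, I am still curious what the problem is. Is it a bug in the algorithm itself, am I using it incorrectly, is it a natural consequence of "making the test harder" (statistics noob, remember), or am I pushing the limits of Java's pseudorandom generator?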
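Domain class:
import java.util.Random;

public class User
{
    public static final int MINIT = 20;
    public static final int MAXIT = 50;
    private int iterations;

    public void setIterations()
    {
        // A new generator is created on every call.
        Random random = new Random();
        // nextInt(bound) excludes the bound, so the value lies in [MINIT, MAXIT - 1].
        setIterations(MINIT + random.nextInt(MAXIT - MINIT));
    }

    private void setIterations(int iterations) {
        this.iterations = iterations;
    }

    // Getter used by UserTest; it was missing from the posted snippet.
    public int getIterations() {
        return iterations;
    }
}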
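Test class:
import java.util.ArrayList;

// Assuming JUnit 4, which matches the Assert.assertTrue / @Test usage below.
import org.junit.Assert;
import org.junit.Test;

public class UserTest {
    private User user = new User();

    @Test
    public void testRandomNumbers() {
        int results = 0;
        final int TIMES = 30;
        // Repeat the chi-square check TIMES times and count how often it passes.
        for (int i = 0; i < TIMES; i++)
        {
            if (randomNumbersRun())
            {
                results++;
            }
        }
        System.out.println(results);
        // Require at least 80% of the runs to pass.
        Assert.assertTrue(results >= TIMES * 80 / 100);
    }

    private boolean randomNumbersRun()
    {
        ArrayList<Integer> list = new ArrayList<Integer>();
        // r is the number of distinct values the generator can produce.
        int r = User.MAXIT - User.MINIT;
        // SIZE is the only value changed between the experiments.
        final int SIZE = 11;
        for (int i = 0; i < r * SIZE; i++) {
            user.setIterations();
            list.add(user.getIterations());
        }
        return Statistics.isRandom(list, r);
    }
}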
Chi-square algorithm:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

public class Statistics {
    /**
     * source: http://en.wikibooks.org/wiki/Algorithm_Implementation/Pseudorandom_Numbers/Chi-Square_Test
     * changed parameter to ArrayList<Number> for generalization
     *
     * @param randomNums the sample to check
     * @param r the number of distinct values the generator can produce
     */
    public static boolean isRandom(ArrayList<? extends Number> randomNums, int r) {
        // According to Sedgewick: "This is valid if N is greater than about 10r"
        if (randomNums.size() <= 10 * r) {
            return false;
        }

        // PART A: Get frequency of randoms
        Map<Number, Integer> ht = getFrequencies(randomNums);

        // PART B: Calculate chi-square - this approach is in Sedgewick
        // n_r is the expected count per value. Values that never occur are
        // absent from the map and therefore contribute nothing to the sum.
        double n_r = (double) randomNums.size() / r;
        double chiSquare = 0;
        for (int v : ht.values()) {
            double f = v - n_r;
            chiSquare += f * f;
        }
        chiSquare /= n_r;

        // PART C: According to Sedgewick: "The statistic should be within 2(r)^1/2 of r.
        // This is valid if N is greater than about 10r"
        return Math.abs(chiSquare - r) <= 2 * Math.sqrt(r);
    }

    /**
     * @param nums a list of numbers
     * @return a Map, key being the number and value its frequency
     */
    private static Map<Number, Integer> getFrequencies(ArrayList<? extends Number> nums) {
        Map<Number, Integer> freqs = new HashMap<Number, Integer>();
        for (Number x : nums) {
            if (freqs.containsKey(x)) {
                freqs.put(x, freqs.get(x) + 1);
            } else {
                freqs.put(x, 1);
            }
        }
        return freqs;
    }
}
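One way to separate the algorithm from the generator is to apply the same acceptance rule, |chiSquare - r| <= 2 * sqrt(r), to values drawn from a single shared Random and see whether the pass count still falls as SIZE grows. The following is a minimal sketch under that assumption and is not part of the original post: the class SharedRandomExperiment and its methods chiSquare, passes, and passCount are hypothetical names. It counts into an int array, so values that never occur are included in the sum, unlike the HashMap-based version.

```java
import java.util.Random;

/**
 * Hypothetical harness (not from the original post): counts how often the
 * Sedgewick-style check passes when all draws come from one shared Random.
 */
class SharedRandomExperiment {

    /** Chi-square statistic for n draws spread over counts.length equally likely values. */
    static double chiSquare(int[] counts, int n) {
        double expected = (double) n / counts.length;
        double sum = 0;
        for (int c : counts) {
            double d = c - expected;
            sum += d * d;
        }
        return sum / expected;
    }

    /** Same acceptance rule as isRandom: |chiSquare - r| <= 2 * sqrt(r). */
    static boolean passes(int[] counts, int n) {
        int r = counts.length;
        return Math.abs(chiSquare(counts, n) - r) <= 2 * Math.sqrt(r);
    }

    /** Number of passing runs out of `times`, each run drawing r * size values. */
    static int passCount(Random random, int r, int size, int times) {
        int passed = 0;
        int n = r * size;
        for (int t = 0; t < times; t++) {
            int[] counts = new int[r];
            for (int i = 0; i < n; i++) {
                counts[random.nextInt(r)]++;
            }
            if (passes(counts, n)) {
                passed++;
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        Random random = new Random(); // one generator for the whole experiment
        for (int size : new int[] {11, 100, 1000}) {
            System.out.println(size + " -> " + passCount(random, 30, size, 30) + " / 30");
        }
    }
}
```

If the pass count stays roughly level across sizes here but drops in UserTest, that would point toward how the numbers are generated rather than toward the chi-square check itself.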
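I cannot reproduce your results when running the code on ideone. The number printed tracks 'TIMES' ([link](http://ideone.com/afo86g)). Could you check whether I am missing anything? This may well be platform-dependent. – dasblinkenlight
You should vary the variable SIZE in randomNumbersRun(), not TIMES in testRandomNumbers(). – blagae
OK, now I get roughly the same numbers regardless of the size ([link](http://ideone.com/afo86g)). Could you run a small experiment on your platform? Take a look at my code at the link, and change your User class by making the Random static (uncomment line 16 and remove line 20 in my code). See if it changes the results. If it does, I think I know what may be going on. – dasblinkenlight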
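The experiment suggested in the last comment might look like the sketch below. The linked ideone code is not reproduced here, so this is only a guess at the change: the class name UserWithSharedRandom and the field RANDOM are hypothetical, and everything else mirrors the User class from the question.

```java
import java.util.Random;

/** Hypothetical variant of User that reuses one generator instead of creating one per call. */
class UserWithSharedRandom {
    public static final int MINIT = 20;
    public static final int MAXIT = 50;

    // Shared generator: seeded once, so all values come from a single sequence.
    private static final Random RANDOM = new Random();

    private int iterations;

    public void setIterations() {
        // nextInt(bound) excludes the bound, so the value lies in [MINIT, MAXIT - 1].
        iterations = MINIT + RANDOM.nextInt(MAXIT - MINIT);
    }

    public int getIterations() {
        return iterations;
    }
}
```

Swapping this class into UserTest and rerunning with several SIZE values would show whether creating a new Random on every call affects the pass count.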