In general, the quality of a pseudorandom number generator rests on a specific theoretical framework whose assumptions and requirements lead to a series of theoretical results demonstrating various quality aspects, such as a long period and unpredictability.
Although mathematical evidence should be sufficient, a good generator implementation should demonstrate specific quality aspects not only in theory but in practice as well. For this reason, the design and implementation of pseudorandom number generation algorithms should also be accompanied by the results of statistical or randomness tests which attest to the quality properties of the algorithms or, at least, of their specific implementations.
As we discussed in the Introduction section, some quality properties of random number generators are hard to define and evaluate using practical, i.e. finite and computable, approaches. For instance, unpredictability and randomness itself are elusive concepts with various theoretical characterizations, most of which do not lead to practical, computable tests. After all, algorithms and computers are deterministic entities whose time evolution follows a well-defined sequence of steps in order to produce results.
4.1. Practical Statistical Tests
In this section, we will present a number of statistical tests described in Knuth’s book [12]. Most of these tests, however, mainly suit truly random number sequences, while some of the test parameters are not provided numerically. Thus, we follow the modifications of these tests described in [13], which are appropriate for integer number sequences and for which specific parameter values are provided.
In particular, the basic theory of tests for real number sequences is provided in [12]. The proposed applicability criteria are also suitable for integer sequences in some tests. However, integer sequences cannot be tested under the same assumptions as real number sequences, since these assumptions may not be applicable to them. For instance, the Permutation Test gives test probabilities based on the premise that no two consecutive terms of the tested sequence can be equal. However, for integer sequences, equality of consecutive terms may occur with a non-negligible probability. Thus, in [13] new combinatorial calculations were developed in order to adapt Knuth’s test suite to integer and binary sequences. Additionally, some of the required parameters for the tests, such as sequence length, alphabet size, block size, and other similar parameters, are also provided in [13]. Taking into account the aforementioned issues, the authors of [13] provide the test details and required probability calculations for testing binary and integer sequences, which are of interest for our purposes.
Each of the tests is applied, in general, to a sequence $U_1, U_2, \ldots$ of real numbers which, assumedly, have been chosen independently from the uniform distribution in the interval $[0, 1)$. However, some of the tests are designed for sequences of integers rather than sequences of real numbers. If this is the case, we form the auxiliary integer sequence $Y_1, Y_2, \ldots$, where $Y_j = \lfloor d\,U_j \rfloor$. Accordingly, this sequence has, assumedly, been produced by independent choices from the uniform distribution over the integers $0 \le Y_j < d$. The parameter d should be sufficiently large so that the tests’ results are meaningful for our PRNGs (e.g. the $Y_j$’s fall in the PRNG’s range) but not so large as to render the test infeasible in practice.
Below, we discuss some of the tests in Knuth’s book [12]. One may also consult their modified form provided in [13]. We first discuss two fundamental tests which are used in combination with the other empirical tests that we describe afterwards.
4.1.1. The “Chi-Square” Test
In probability theory and statistics, the “chi-squared” distribution, also called “chi-square” and denoted by $\chi^2$, with k degrees of freedom is defined as the distribution of the sum of the squares of k standard normal random variables. The “chi-square” distribution with k degrees of freedom is also denoted by $\chi^2_k$ or $\chi^2(k)$. The “chi-squared” distribution is one of the most frequently used probability distributions in statistics, e.g. in hypothesis testing and in the construction of confidence intervals.
In our case, the “chi-square” distribution is used to evaluate the “goodness-of-fit” of the monitored frequencies of a sequence of observations to the expected frequencies of the distribution under test (this is the hypothesis). The test statistic is of the form $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ and $E_i$ are the observed and expected frequencies of occurrence, correspondingly, of the observations. This Chi-Square test is one of the most widespread statistical tests of a sequence of data derived from observations of a phenomenon, in general. In our case, this phenomenon is the source of pseudorandom numbers and our data is the generated numbers.
According to this test, we perform n independent observations from the source of data (n should be large enough). In our case, the observations will be the pseudorandom numbers. Then, we count how many times each number appeared. Let us assume that the number s appeared $Y_s$ times. Also, let $p_s$ be the probability of appearance of the number s. We calculate the following statistical indicator (we assume that the numbers generated by the pseudorandom number generator are within the range $0 \le s < d$):

$$V = \sum_{s=0}^{d-1} \frac{(Y_s - n p_s)^2}{n p_s}$$
Next, we compare V to the entries of the distribution tables of $\chi^2$ with the parameter k (the degrees of freedom) equal to $d - 1$. If V is less than the table entry corresponding to 99% or greater than the entry corresponding to 1%, then we do not have sufficient randomness. If it is between the entries of 99% and 95%, then insufficient randomness may exist. A value below 95% is a good indication that the numbers under testing are close to random.
Knuth suggests applying the test for a sufficiently large value of n. Also, for test reliability purposes, he suggests, as a rule of thumb, that n should be sufficiently large to render the expected values $n p_s$ greater than 5. However, a large value of n can mask some undesirable properties, such as locally non-random behavior caused, for instance, by an entrance into a cycle in the sequence. This is addressed not by using a smaller value for n but by applying the test for several different large values of n.
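As a minimal sketch, the statistic V above can be computed as follows in Python, assuming equiprobable values ($p_s = 1/d$); the function name is ours:

```python
def chi_square_statistic(observations, d):
    """Knuth's V statistic for a sequence of integers in range(d),
    assuming each value s is expected with equal probability p_s = 1/d."""
    n = len(observations)
    counts = [0] * d
    for y in observations:
        counts[y] += 1
    expected = n / d  # n * p_s with p_s = 1/d
    return sum((c - expected) ** 2 / expected for c in counts)
```

A perfectly balanced sample gives V = 0; the computed value is then compared against the chi-square tables with d − 1 degrees of freedom.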
4.1.2. The Kolmogorov-Smirnov Test
The “Kolmogorov-Smirnov” test measures the maximum difference between the expected and the actual distribution of the given number sequence. In simple terms, the test checks whether a dataset (a PRNG sequence in our case) comes from a particular probability distribution.
In order to approximate the distribution of a random variable X, we use the distribution function $F(x) = P(X \le x)$. If we have n independent observations $X_1, X_2, \ldots, X_n$ of X, then from the values corresponding to these observations we can empirically approximate the function F as follows:

$$F_n(x) = \frac{\#\{i : X_i \le x\}}{n}$$
To apply the “Kolmogorov-Smirnov” test we use the following statistical quantities:

$$K_n^+ = \sqrt{n}\,\max_{x} \left(F_n(x) - F(x)\right), \qquad K_n^- = \sqrt{n}\,\max_{x} \left(F(x) - F_n(x)\right)$$

where $K_n^+$ measures the maximum deviation when $F_n$ is greater than F and $K_n^-$ measures the maximum deviation when $F_n$ is less than F.
The test applies, subsequently, the following steps, based on these functions:
First, we take n independent observations corresponding to a certain continuous distribution function $F(x)$.
We rearrange the observations so that they occur in non-descending order, $X_1 \le X_2 \le \cdots \le X_n$.
The desired statistical quantities are given by the following formulas:

$$K_n^+ = \sqrt{n}\,\max_{1 \le j \le n} \left(\frac{j}{n} - F(X_j)\right), \qquad K_n^- = \sqrt{n}\,\max_{1 \le j \le n} \left(F(X_j) - \frac{j-1}{n}\right)$$

After we have calculated the quantities $K_n^+$ and $K_n^-$, we compare them to the values in the test’s tables in order to decide whether the given sequence is uniformly random or not. Knuth recommends applying the test using only two decimal places of precision.
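The sorted-sample formulas translate directly into code. A minimal sketch (function name ours), taking the hypothesised CDF F as a parameter:

```python
from math import sqrt

def ks_statistics(xs, F):
    """Compute K_n^+ and K_n^- from the sorted-sample formulas:
    K_n^+ = sqrt(n) * max_j (j/n - F(X_j)),
    K_n^- = sqrt(n) * max_j (F(X_j) - (j-1)/n)."""
    n = len(xs)
    xs = sorted(xs)
    k_plus = sqrt(n) * max((j + 1) / n - F(x) for j, x in enumerate(xs))
    k_minus = sqrt(n) * max(F(x) - j / n for j, x in enumerate(xs))
    return k_plus, k_minus
```

For uniform data on [0, 1) one passes `F = lambda x: x` and compares the two statistics against the Kolmogorov-Smirnov tables.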
Further to these two fundamental tests, we give below some more empirical tests which are used in conjunction with them.
4.1.3. Equidistribution or Frequency (Monobit) Test
This test checks if the number of occurrences of each element a, i.e. each number produced by the PRNG, is as would be expected from a random sequence of elements. The Monobit case examines bits, i.e. it is applied on bit sequences, but the test can also be applied to any number range. Knuth suggests two methods for applying the test:
Using the “Kolmogorov-Smirnov” test with distribution function $F(x) = x$, for $0 \le x \le 1$.
For each element a, $0 \le a < d$, we count the number of occurrences of a in the given sequence and then apply the “chi-square” test with $d - 1$ degrees of freedom and probability $p_a = 1/d$ for each element (“bin”).
4.1.4. Serial Test
The serial test is, actually, an Equidistribution test for pairs of elements of the sequence, i.e. for alphabet (element) size $d^2$. Thus, the serial test checks that the pairs of numbers are uniformly distributed. For PRNGs with binary output, for instance, where $d = 2$, the test checks whether the distribution of the pairs of bits, i.e. $(0,0), (0,1), (1,0), (1,1)$, is as expected.
The test is applied in the following way:
We count the number of times that each pair $(Y_{2j}, Y_{2j+1}) = (q, r)$ occurs, for $0 \le j < n$ and $0 \le q, r < d$.
We apply the “Chi-Square” test with $d^2 - 1$ degrees of freedom and probability $1/d^2$ for each category (i.e. pair).
For the application of the test, the following considerations apply:
The value of n should be, at least, $5d^2$, so that the expected count of each pair category is at least 5.
The test can also be applied to groups of triples, quadruples, etc. of consecutive generator values.
The value of d must be limited in order to avoid the formation of too many categories.
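The counting step of the serial test can be sketched as follows (names ours), applying the chi-square statistic over the $d^2$ equiprobable pair categories:

```python
def serial_test_statistic(ys, d):
    """Chi-square statistic over non-overlapping pairs (Y_2j, Y_2j+1)
    with d*d equiprobable categories."""
    pairs = list(zip(ys[0::2], ys[1::2]))
    n = len(pairs)
    expected = n / (d * d)
    counts = {}
    for p in pairs:
        counts[p] = counts.get(p, 0) + 1
    # categories that never occur still contribute to the statistic
    return sum((counts.get((q, r), 0) - expected) ** 2 / expected
               for q in range(d) for r in range(d))
```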
4.1.5. Gap Test
This test counts the number of elements, or gap, that appears between successive occurrences of particular elements in the sequence and then uses the Kolmogorov-Smirnov test to compare with the number of gaps expected from a random sequence. In other words, the test checks whether the gaps between specific numbers follow the expected distribution. In Knuth’s book, the test is defined for sequences of real numbers by examining the length of gaps between occurrences of elements over a specific range. As we discussed above, the test can easily be transformed into a test for integer values.
In particular, if a, b are two real numbers with $0 \le a < b \le 1$, our goal is to examine the lengths of consecutive subsequences $U_j, U_{j+1}, \ldots, U_{j+r}$ such that $U_{j+r}$ is between a and b while the other values are not. Such a subsequence exhibits a gap of length r. In what follows, we give the algorithm that, given a, b and a sequence of real numbers, counts the number of gaps with lengths $0, 1, \ldots, t-1$, as well as the number of gaps of length at least t, until n gaps have been computed.
Note that the algorithm terminates only when n gaps have been located (see step 6).
After the algorithm has terminated, we will have calculated the number of gaps of lengths $0, 1, \ldots, t-1$ and of length at least t, in the array variables $\mathrm{COUNT}[0], \ldots, \mathrm{COUNT}[t]$. We can now apply the “Chi-Square” test with t degrees of freedom (for the $t+1$ categories) using the probabilities, expected from a random sequence, below:
$$p_r = p(1-p)^r \quad (0 \le r \le t-1), \qquad p_t = (1-p)^t$$

In these probabilities, we set $p = b - a$, which is the probability of the event $a \le U_j < b$. As stated in [12], the values of n and t are selected so that each expected count $n p_r$ is expected to be at least 5 (preferably more than 5).
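The gap-counting step can be sketched as follows (a simplified version that scans a fixed sequence rather than stopping after n gaps; names ours):

```python
def gap_lengths(us, a, b, t):
    """COUNT[r] = number of gaps of length r (r = 0..t-1) between
    successive values falling in [a, b); COUNT[t] counts gaps >= t."""
    count = [0] * (t + 1)
    r = 0  # length of the current gap
    for u in us:
        if a <= u < b:
            count[min(r, t)] += 1
            r = 0
        else:
            r += 1
    return count
```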
4.1.6. Poker Test
The Poker test proposed by Knuth involves checking n groups of five successive elements $(Y_{5j}, Y_{5j+1}, \ldots, Y_{5j+4})$, $0 \le j < n$, with respect to which of the following seven element patterns appears in them, where ordering does not matter:
All 5 elements are distinct.
There is only one pair of equal elements.
There are two distinct pairs of equal elements.
There is only one triple of equal elements.
There is a triple of equal elements and a pair of equal elements, different from the element in the triple.
There is one quadruple of equal elements.
There is a quintuple of equal elements.
In other words, we study the distinctness of the numbers in each group of five elements. Next, we apply the “Chi-Square” test for the number of quintuples in the n groups of 5 consecutive elements that fall within each of the 7 categories defined above.
We can generalize the reasoning of the test discussed above by considering n groups of k successive elements, instead of 5. Then we can calculate the number of k-tuples of successive elements which have r distinct values. The probability $p_r$ for this event is given by the following equation (see [12] for the proof):

$$p_r = \frac{d(d-1)\cdots(d-r+1)}{d^k} \left\{ k \atop r \right\}$$

See Equation 6 for the computation of $\left\{ k \atop r \right\}$, i.e. the Stirling numbers of the second kind.
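The classification step of the test, generalised to groups of k values, can be sketched as follows (names ours):

```python
def poker_counts(ys, k=5):
    """For each non-overlapping group of k successive values, count the
    number r of distinct values it contains; return [count of r=1, ..., r=k]."""
    counts = [0] * (k + 1)
    for i in range(0, len(ys) - k + 1, k):
        counts[len(set(ys[i:i + k]))] += 1
    return counts[1:]
```

The resulting counts are then compared, via the chi-square test, with the expected counts $n p_r$ for the probabilities $p_r$ above.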
4.1.7. Coupon Collector’s Test
Using the sequence $Y_1, Y_2, \ldots$, the Coupon collector’s test computes the lengths of the segments required to obtain the complete set of integers from 0 up to $d - 1$.
Given the PRNG sequence $Y_1, Y_2, \ldots$, where $0 \le Y_j < d$, we count the lengths of n consecutive “coupon collector” segments using the algorithm that follows. In the algorithm, $\mathrm{COUNT}[r]$ is the number of segments of length r, $d \le r < t$, while $\mathrm{COUNT}[t]$ is the number of segments of length at least t.
Algorithm description:
After the algorithm has counted n lengths, we apply the “Chi-Square” test to $\mathrm{COUNT}[d], \ldots, \mathrm{COUNT}[t]$ (i.e. the numbers of coupon collection segments of length r) with $t - d$ degrees of freedom. The probabilities that correspond to these events are the following (see [12] for the derivation):

$$p_r = \frac{d!}{d^r} \left\{ r-1 \atop d-1 \right\} \quad (d \le r < t), \qquad p_t = 1 - \frac{d!}{d^{t-1}} \left\{ t-1 \atop d \right\}$$
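The segment-counting step can be sketched as follows (names ours); each segment ends as soon as all d values have been seen:

```python
def coupon_segment_lengths(ys, d, t):
    """COUNT[r] = number of complete 'coupon collector' segments of
    length r (d <= r < t); COUNT[t] counts segments of length >= t."""
    count = {r: 0 for r in range(d, t + 1)}
    seen, length = set(), 0
    for y in ys:
        seen.add(y)
        length += 1
        if len(seen) == d:  # all coupons collected: segment complete
            count[min(length, t)] += 1
            seen, length = set(), 0
    return count
```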
4.1.8. Permutation Test
The Permutation test calculates the frequency of appearance of permutations, i.e. different arrangements, of successive elements in a given number sequence.
More specifically, we divide the numbers of the given sequence into n groups of t elements each, i.e. we form the t-tuples $(U_{tj}, U_{tj+1}, \ldots, U_{tj+t-1})$, $0 \le j < n$. Each t-tuple can be in one of $t!$ possible permutations, or categories. The test counts the frequency of appearance of each such permutation.
Note that for this test it is assumed that all numbers are distinct. This is justifiable if the $U_j$’s are real numbers (since the probability of equality of two real numbers is zero) but not justifiable for integer sequences. See [13] for a discussion of how to lift this assumption for the integer sequences of PRNGs.
The frequency of appearance of each permutation or category is calculated by Algorithm P below (see [12] for more details). The algorithm is given a sequence $(U_1, U_2, \ldots, U_t)$ of distinct elements. Then it computes an integer $f(U_1, \ldots, U_t)$ for which the following is satisfied: $0 \le f(U_1, \ldots, U_t) < t!$, and $f(U_1, \ldots, U_t) = f(V_1, \ldots, V_t)$ if and only if $(U_1, \ldots, U_t)$ and $(V_1, \ldots, V_t)$ have the same relative ordering.
The steps of the algorithm are the following:
Finally, we apply the “Chi-Square” test with $t! - 1$ degrees of freedom and probability of each permutation (category) equal to $1/t!$ (see [12] for more details).
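The index computation follows the structure of Knuth's Algorithm P: repeatedly locate the maximum of the first r elements and fold its position into a mixed-radix integer. A sketch in Python (function name ours):

```python
def permutation_index(us):
    """Map a tuple of t distinct reals to an integer f in [0, t!) that
    depends only on the relative ordering of the elements."""
    xs = list(us)
    f = 0
    for r in range(len(xs), 1, -1):
        s = max(range(r), key=lambda i: xs[i])  # position of max of first r
        f = r * f + s
        xs[s], xs[r - 1] = xs[r - 1], xs[s]     # move the max out of the way
    return f
```

Two tuples receive the same index exactly when they have the same relative ordering, so counting index frequencies counts permutation categories.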
4.1.9. Run Test
The Run test checks the lengths of the maximal monotone subsequences of the given sequence, i.e. monotonically increasing or decreasing subsequences, called “runs-up” and “runs-down” respectively. In other words, we consider the lengths of these monotone runs.
Note, however, that in the Run test we cannot apply the “Chi-Square” test directly to the lengths of the monotone runs, since they are, in general, not independent: usually, a long run is followed by a short run, and so on. In order to handle this difficulty, we apply the following procedure, which takes as input a sequence of distinct real numbers:
The same test can be applied to “runs-down”.
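The run-length extraction itself is straightforward; a sketch for “runs-up” (names ours; the special statistic that must replace the plain chi-square, due to the dependence between run lengths, is omitted here):

```python
def run_up_lengths(us):
    """Lengths of the maximal ascending runs in a sequence of distinct reals."""
    lengths, run = [], 1
    for prev, cur in zip(us, us[1:]):
        if cur > prev:
            run += 1
        else:          # the ascending run is broken; record its length
            lengths.append(run)
            run = 1
    lengths.append(run)
    return lengths
```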
4.1.10. The Maximum-of-t Test
The Maximum-of-t test checks if the distribution of the maximum of t random numbers is as expected. The test works as follows, iterating over n subsequences of t values:
We take the maximum value of each group of t numbers, i.e. $V_j = \max(U_{tj}, U_{tj+1}, \ldots, U_{tj+t-1})$, for all n subsequences, $0 \le j < n$.
We apply the “Kolmogorov-Smirnov” test to the sequence of maximums with distribution function $F(x) = x^t$, $0 \le x \le 1$. As an alternative, we can apply the Equidistribution test to the sequence $V_0^t, V_1^t, \ldots, V_{n-1}^t$.
The verification of the test is to show that the distribution function of the $V_j$’s is $F(x) = x^t$. This is because the probability of the event $\max(U_1, \ldots, U_t) \le x$ is equal to the probability of the t independent events $U_1 \le x, \ldots, U_t \le x$, which is equal to the product of the individual probabilities, all equal to x, which gives $x^t$.
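The grouping step can be sketched as follows (name ours); the resulting maxima are then fed to the Kolmogorov-Smirnov test with $F(x) = x^t$:

```python
def max_of_t(us, t):
    """Maxima V_j of the n = len(us)//t non-overlapping groups of t values."""
    return [max(us[t * j:t * (j + 1)]) for j in range(len(us) // t)]
```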
4.1.11. Collision Test
The Collision test checks the number of collisions produced by the elements of a given number sequence. We want the number of collisions to be neither too high nor too low. Below, we explain the term collision and outline the workings of the test.
The general experiment we consider is throwing n balls into m bins at random. When two balls land in a single bin, we have a collision. In our case, the bins are the test categories and the balls are the sequence elements, or observations, to be placed into the bins or categories. In the Collision test, the number of categories is assumed to be much larger than the number of observations (i.e. distinct numbers in the sequence); otherwise, we can use direct “Chi-square” tests on the categories as before.
More specifically (see [12]), let us fix, for concreteness, the number of bins or categories m to be $2^{20}$ and the number of balls or observations n to be equal to $2^{14}$. In our experiment, we throw these n balls at random into the m bins. To this end, we will convert the U sequence of real numbers into a corresponding Y sequence of integers for an appropriate choice of d (see the discussion at the beginning of Section 4.1).
In this example, we will evaluate the generator’s sequence in a 20-dimensional space using $d = 2$, i.e. forming vectors of 20 elements 0 or 1. Each vector j has the form $(Y_{20j}, Y_{20j+1}, \ldots, Y_{20j+19})$, with $0 \le j < n$.
Given m and n, the test uses an algorithm (see [12]) to determine the distribution of the number of collisions caused by the n balls (20-dimensional vectors) when they are placed, at random, into the m bins. The corresponding probabilities are provided similarly as in the Poker test. The probability that c collisions occur is the probability that $n - c$ bins are occupied, which is given by

$$\frac{m(m-1)\cdots(m-n+c+1)}{m^n} \left\{ n \atop n-c \right\}$$
Since n and m are very large, it is not easy to compute these probabilities using the definition of the Stirling numbers of the second kind as given by Equation 6. Knuth has proposed a simple algorithm, called Algorithm S (see [12]), to approximate these probabilities through a process which simulates the placement of the balls into the bins.
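The collision count itself can be sketched with a hash set standing in for the m bins (names ours):

```python
def count_collisions(ys, k, n):
    """Form n non-overlapping k-dimensional vectors from the integer
    sequence ys and count collisions: each vector that lands in an
    already-occupied bin adds one collision."""
    occupied = set()
    collisions = 0
    for j in range(n):
        bin_index = tuple(ys[k * j:k * (j + 1)])
        if bin_index in occupied:
            collisions += 1
        else:
            occupied.add(bin_index)
    return collisions
```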
4.1.12. Birthday Spacings Test
This test was proposed by G. Marsaglia in 1984 and it is included in the Diehard suite of tests. As in the Collision test, in the Birthday spacings test we randomly throw n balls into m bins. However, in this test the parameter m, the bins, represents the number of available “days in a year” and the parameter n represents “birthdays”, i.e. choices of days in a year.
We start by arranging the n given birthdays in non-decreasing order. That is, if the birthdays are $Y_1, Y_2, \ldots, Y_n$, where $0 \le Y_j < m$, they are sorted into non-decreasing order, say $Y_{(1)} \le Y_{(2)} \le \cdots \le Y_{(n)}$. Then we define the successive spacings between birthdays, $S_j = Y_{(j+1)} - Y_{(j)}$ for $1 \le j < n$. We subsequently rearrange the spacings into non-decreasing order, say $S_{(1)} \le S_{(2)} \le \cdots \le S_{(n-1)}$. Finally, we compute the distribution of the random variable R which counts the number of these spacings which are equal: it is defined as the number of indices j, $1 < j \le n-1$, such that $S_{(j)} = S_{(j-1)}$. The distribution of R depends on the specific values of m and n, and we can focus on the cases $R = 0$, 1, 2, and at least 3 (see [12]).
As suggested by Knuth, we repeat the test 1000 times and compare the distribution of R found empirically with this procedure to the theoretical distribution, using the “Chi-square” test with 3 degrees of freedom (one for each of the 4 categories of R, minus one).
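The computation of R can be sketched as follows (name ours):

```python
def equal_spacings(birthdays):
    """R for the Birthday spacings test: sort the birthdays, form the
    successive spacings, sort them, and count spacings equal to their
    predecessor."""
    ys = sorted(birthdays)
    spacings = sorted(b - a for a, b in zip(ys, ys[1:]))
    return sum(1 for a, b in zip(spacings, spacings[1:]) if a == b)
```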
4.1.13. Serial Correlation Test
The idea behind the Serial Correlation test is to calculate the serial correlation coefficient, which is a statistical indicator of the degree to which the value $U_{j+1}$ of a real number sequence depends on the previous value $U_j$.
For a given sequence of numbers $U_0, U_1, \ldots, U_{n-1}$, the serial correlation coefficient is given by the formula below:

$$C = \frac{n\left(U_0 U_1 + U_1 U_2 + \cdots + U_{n-2} U_{n-1} + U_{n-1} U_0\right) - \left(U_0 + U_1 + \cdots + U_{n-1}\right)^2}{n\left(U_0^2 + U_1^2 + \cdots + U_{n-1}^2\right) - \left(U_0 + U_1 + \cdots + U_{n-1}\right)^2}$$
The correlation coefficient always ranges from $-1$ to 1. When C is 0 or close to 0, it indicates that $U_{j+1}$ and $U_j$ are independent of each other. If $C = \pm 1$, $U_{j+1}$ and $U_j$ are totally linearly dependent. Thus, it is desirable to have C equal or close to 0.
However, since the successive values $U_{j+1}$ are not, in fact, entirely independent of the values $U_j$, C is not expected to be exactly zero. As suggested by Knuth, a good value for C is any value between $\mu_n - 2\sigma_n$ and $\mu_n + 2\sigma_n$, where

$$\mu_n = \frac{-1}{n-1}, \qquad \sigma_n = \frac{1}{n-1}\sqrt{\frac{n(n-3)}{n+1}}, \quad n > 3.$$

For a good PRNG, we expect C to lie between these values about 95% of the time.
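The coefficient can be computed directly from the formula; a sketch (function name ours), with the cyclic product term $U_{n-1} U_0$ included:

```python
def serial_correlation(us):
    """Serial correlation coefficient C between each U_j and its
    cyclic successor U_{(j+1) mod n}."""
    n = len(us)
    s = sum(us)
    s2 = sum(u * u for u in us)
    cross = sum(us[j] * us[(j + 1) % n] for j in range(n))
    return (n * cross - s * s) / (n * s2 - s * s)
```

A perfectly alternating sequence is maximally anti-correlated and yields C = −1, while an i.i.d. uniform sequence yields values near 0.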