previous up next contents
Previous: 2 and Up: Revisiting the test Next: 4 Fair die rolling   Contents

Subsections


3 Unfair die rolling

3.1 Simulating an unfair die

In this paper, no questions about random uniform generators will be asked. The question of the quality of such generators is the subject of a huge amount of literature [2,3] and, for what we are doing, a congruential generator like the Maple's one, described ALG. 1, can be considered as safe. As it should be, the modulus \( \theta \) is prime, while the multiplier \( a \) is a primitive root for that modulus. To shorten notations, rand(1..n)() will denote a generator providing integers uniformly distributed inside \( \left\{ 1,\, 2,\, \cdots ,\, n\right\} \), implemented as \( 1+rand(\, )\, \mathrm{mod}\, n \), and randu() will denote an uniform generator over \( \left[ 0,\, 1\right] \), implemented as \( \left( rand\left( \, \right) +0.5\right) /\theta \).


\begin{algorithm}
% latex2html id marker 191\begin{algorithmic}[1]
\par\DEF { ...
...nd{algorithmic}
\par\caption{Maple's uniform random generator.
}\end{algorithm}

To describe an unfair die, we have to chose the number of faces (say \( F=11 \)) and the probabilities for each face. To avoid "poorly chosen rational numbers" (the key point of Section 4), the odds for the different faces have been chosen (once for ever) proportional to numbers obtained as \( \alpha _{i}=3+randu(\, ) \), i.e. not equal, but not too different. The simulation of what happens when rolling that die can be obtained by a rejection scheme, as described in ALG.  2 where the rejection thresholds \( \beta _{i} \) have been defined by \( \beta _{i}=\theta \times \alpha _{i}/\max \left( \alpha _{j}\right) \). Roughly speaking, the last digit of \( seed \) is used to make a proposal for \( k \) and the other digits are used to approve or reject that proposal. A more "paranoïd" implementation will use a call to rand() for the proposal, and another call for the decision (doubling the cost).

A more efficient implementation will use the Walker scheme as described in [1], dividing the cost by the rejection ratio, here about \( 1.1 \). Moreover, this implementation uses exactly one call to the congruential generator by die throw, allowing an easier parallelization of the computations. To undertake \( \delta \times \gamma \) die throws by using \( \gamma \) computers, you can start with a given \( seed=seed_{0} \) on the first one, together with \( seed=irem\left( a^{\delta }\times seed_{0},\, \theta \right) \) on the second one, \( seed=irem\left( a^{2\, \delta }\times seed_{0},\, \theta \right) \) on the third and so on.


\begin{algorithm}
% latex2html id marker 216\begin{algorithmic}[1]
\par\DEF { ...
...rithmic}
\par\caption{Rejection scheme for unfair die rolling.
}\end{algorithm}

3.2 Samples

Let us call sample the result of a fixed number \( n \) of die throws, i.e. an ordered set of natural integers \( X_{1},\, X_{2},\, \cdots ,X_{F} \) each \( X_{i}\) denoting how many times the face labeled \( i \) has appeared. Obviously, \( X_{1}+X_{2}+\cdots +X_{F}=n \), while the expected values \( \overline{X_{i}} \) of these \( X_{i}\) are given by \( \overline{X_{i}}=n\times \alpha _{i}/\sum \alpha _{j} \). We have chosen \( n=119 \) and the \( \overline{X_{i}} \) given FIG. 2 to exemplify our paper. according to these odds, the random variable "a die throw" has expectation, variance and fourth momentum given by :

\begin{displaymath}
\mu \approx 5.9989\quad ;\quad \sigma ^{2}\approx 9.8432\quad ;\quad \mathcal{M}_{4}=173.6906\end{displaymath}

FIG.  2: Expectations of the \( X_{j}\).
\begin{figure}{\centering\begin{tabular}{\vert c\vert c\vert c\vert c\vert}
\hli...
...
\( \overline{X_{11}}=10.097 \)&
\\
\hline
\end{tabular}\par }
\par\end{figure}

FIG. 3 examplifies such a sample and the statistics that can be extracted from it. Namely the mean value of the die scores, i.e. \( mea\_s \doteq \frac{1}{n}\sum k\, X_{k} \), the variance predictor, i.e. \( var\_s \doteq \frac{1}{n-1}\sum k^{2}\, X_{k}-\frac{n}{n-1}\left( mea\_s \right) ^{2} \) and the \( Pearson's\, \chi ^{2}\) of the experiment, i.e. \( chi\_s \doteq \sum \frac{\left( X_{i}-\overline{X_{i}}\right) ^{2}}{\overline{X_{i}}} \).

FIG.  3: The \( X_{i}\) (grey) and their expectations (light grey).

\begin{displaymath}
X=\left[ 10,\, 6,\, 13,\, 11,\, 15,\, 12,\, 9,\, 13,\, 9,\, 8,\, 13\right] \end{displaymath}

\resizebox*{7.5cm}{5cm}{\includegraphics{figures/119_batch.eps}}

\( mea\_s =6.0756 \), \( var\_s =9.4603 \) and \( chi\_s =6.7032 \).

3.3 Days

These three statistics can be considered as three new random variables, having their own distribution (over the set of all possible samples). The dispersion parameters of \( mea\_s \) are well-known to be given by :

\begin{displaymath}
\mathrm{E}\left( mea\_s \right) =\mathrm{E}\left( X \right) ...
...frac{1}{n}\mathrm{var}\left( X \right) =\frac{1}{n}\sigma ^{2}
\end{displaymath} (4)

The result \( \mathrm{E}\left( var\_s \right) =\sigma ^{2} \) is the reason why the statistic \( var\_s \) has been chosen (instead of the variance of the sample itself), while the \( \displaystyle \mathrm{var}\left( var\_s \right) \)' formula is revisited in anx: variance_of_sigma2. Finally, the (exact) formulae involving \( \mathrm{E}\left( chi\_s \right) \) and \( \mathrm{var}\left( chi\_s \right) \) are restated in A.3(page [*]).

We have the following table :


Table 1: Unfair die rolling : results of a given sample and their reduced values.
\( variable\: \mathbf{V} \) \( \mathbf{V} \) ( FIG. 3) \( \mathrm{E}\left( \mathbf{V} \right) \) \( \mathrm{var}\left( \mathbf{V} \right) \) \( \frac{\mathbf{V}-\mathrm{E}\left( \mathbf{V} \right) }{\sqrt{\mathrm{var}\left( \mathbf{V} \right) }} \)
\( mea\_s \) \( 6.0756 \) \( \mu =5.9990 \) \( \, \, .0827\quad \) (EQ. 4) \( .2665 \)
\( var\_s \) \( 9.4603 \) \( \sigma ^{2}=9.8432 \) \( \, .6592\quad \)(EQ. 5) \( -.4716 \)
\( chi\_s\) \( 6.7032 \) \( F-1=10 \) \( 19.837\quad \)(EQ. 8) \( -.7372 \)


In order to examine the effective distribution of these statistics, it is convenient to continue to roll the die and obtain what will be referred as a day, i.e. a collection of a given number (say \( M\)) of a samples. FIG.  4 displays the histograms of the values obtained for these statistics during the \( M=853 \) samples of a given day, as well as the usual theoretical models for their repartition.

FIG.  4: Distributions over a day of three statistics.
\resizebox*{5cm}{5cm}{\includegraphics{figures/bmea_dist.eps}} \resizebox*{5cm}{5cm}{\includegraphics{figures/bvar_dist.eps}} \resizebox*{5cm}{5cm}{\includegraphics{figures/bchi_dist.eps}}

The compatibility of these models (normal, normal and \( \chi _{10}^{2} \)) with the experimental results can be eye-checked in FIG.  4, and also tested in respect to their mean, variance and \( \chi ^{2} \) behavior. The results of these tests are summarized Table 2, where \( \mathrm{E}\left( \mathbf{V} \right) \) is the expectation, \( mean\left( \mathbf{V}\right) \) the experimental mean, \( \mathrm{var}\left( \mathbf{V} \right) \) the variance and \( s^{2}\left( \mathbf{V}\right) \) the experimental predictor of variance. Nothing but expected happens : the central limit theorem is acting over \( mea\_s \), and chi-square is chi-square distributed !


Table 2: Unfair die rolling : results of a given day.
\( \mathbf{V} \) \( \mathrm{E}\left( \mathbf{V} \right) \) \( mean\left( \mathbf{V}\right) \) \( \mathrm{var}\left( \mathbf{V} \right) \) \( s^{2}\left( \mathbf{V}\right) \) model \( \left( \chi ^{2}\right) _{reduced}\)
\( mea\_s \) \( \mu =5.9989 \) \( 5.9992 \) \( .0827 \) \( .0789 \) normal 19 df \( -0.0718 \)
\( var\_s \) \( \sigma ^{2}=9.8432 \) \( 9.8421 \) \( .6592 \) \( .6712 \) normal 22 df \( 1.3006 \)
\( chi\_s\) \( F-1=10 \) \( 9.9127 \) \( 19.837 \) \( 19.284 \) \( \chi ^{2} \) 15 df \( -0.2612 \)


3.4 Months

The positive result of the preceding paragraph can be enforced, by a careful examination a succession of days... a month. Let us formalize the computation leading to number \( -0.2612 \) downright Table 2. The number of bars of the histogram will be fixed to \( B=14 \), and the range for \( chi\_s\) divided into quite equiprobable sub-intervals. Here again, "poorly chosen rational numbers" are avoided by choosing randomly the odds of these intervals (formula 4+randu() has been used to obtain the probabilities listed in Table 3). Thereafter, the inverse \( \chi ^{2} \) cdf (with \( F-1 \) df) is used to obtain the boundaries of each interval.


Table 3: Partition of the \( chi\_s\) range.
proba \( .0783 \) \( .0746 \) \( .0659 \) \( .0763 \) \( .0675 \) \( .0677 \) \( .0731 \)
starts \( 0. \) \( 4.5059 \) \( 5.6085 \) \( 6.3940 \) \( 7.2169 \) \( 7.9128 \) \( 8.6085 \)
proba \( .0652 \) \( .0796 \) \( .0647 \) \( .0771 \) \( .0656 \) \( .0777 \) \( .0666 \)
starts \( 9.3794 \) \( 10.1050 \) \( 11.0749 \) \( 11.9702 \) \( 13.2512 \) \( 14.6756 \) \( 17.3669 \)




For each day, the \( M\) instantiations of \( chi\_s\) are distributed in the intervals, \( Y_{0} \) of them falling in \( \left[ 0,\, 4.5059\right] \) and so on, finishing with \( Y_{13} \) of them falling into \( \left[ 17.3669,\, \infty \right] \) . The new random variable \( chi\_d \) is defined as the chisquare of these \( Y_{j} \)

FIG.  5: Repartition of the \( chi\_s\) into \( B\) classes.
\resizebox*{5cm}{5cm}{\includegraphics{figures/atrick_05.eps}}

05
\( cut\_chi=14,\, \chi ^{2}=13.665405261963729791,\, {\left( \chi ^{2}\right) _{reduit}}=-.063232461916769653052 \)


previous up next contents
Previous: 2 and Up: Revisiting the test Next: 4 Fair die rolling   Contents


douillet@ensait.fr
2002-10-01