Previous: 4 Fair die rolling Up: Revisiting the test Next: Bibliography   Contents


A. Annexes


A.1 Variance of the estimator of variance

Let \( x_{1},\, \cdots ,\, x_{n} \) be \( n \) independent realizations of a random variable with variance \( \sigma ^{2} \) and fourth centered moment \( \mathcal{M}_{4}\). Define the new r.v. \( var\_s \) (i.e. the variance estimator obtained from the given sample) by \( var\_s =\frac{1}{n-1}\sum _{j}\, x_{j}^{2}-\frac{1}{n\, \left( n-1\right) }\left( \sum _{j}\, x_{j}\right) ^{2} \). Then, taken over the set of all possible samples of a given size \( n\geq 2 \), we have:

\begin{displaymath}
\displaystyle \mathrm{E}\left( var\_s \right) =\sigma ^{2}\quad ;\quad \mathrm{var}\left( var\_s \right) =\frac{1}{n}\left( \mathcal{M}_{4}-\frac{\left( n-3\right) }{\left( n-1\right) }\, \sigma ^{4}\right)
\end{displaymath} (5)

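A key point is that \( var\_s \) is invariant under the translation \( x_{j}=a+y_{j} \), where \( a \) denotes some constant. Indeed \( var\_s =\frac{1}{n\left( n-1\right) }\left( n\sum x_{j}^{2}-\left( \sum x_{j}\right) ^{2}\right) \) and \( n\sum \left( y+a\right) ^{2}-\left( \sum \left( y+a\right) \right) ^{2}=n\sum y^{2}+2n\, a\sum y+n^{2}a^{2}-\left( n\, a+\sum y\right) ^{2}=n\sum y^{2}-\left( \sum y\right) ^{2} \). Therefore, nothing is changed if the r.v. \( x_{j} \) are assumed to be centered, which we do from now on.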

Now, using the theory of homogeneous symmetric polynomials, it can be shown that \( var\_s \) can be expressed not only as a polynomial in the symmetric polynomials \( s_{1}=\sum \, x_{i} \) and \( s_{2}=\sum _{i\neq j}x_{i}x_{j}=\sum 'x_{i}x_{j} \) but also as a linear combination of \( \sum x_{i}^{2} \) and \( \sum 'x_{i}x_{j} \), i.e. of quantities whose expectations are obvious, namely \( n\, \sigma ^{2} \) and \( 0 \). Therefore, we have \( \mathrm{E}\left( var\_s \right) =\alpha \left( n\right) \, \sigma ^{2} \), where \( \alpha \left( n\right) =\frac{num\left( n\right) }{n\left( n-1\right) } \) and \( num\left( n\right) \) is some polynomial whose degree is at most \( 2 \). Such a polynomial can be found by identification: a direct computation shows that \( \mathrm{E}\left( var\_s \right) \) takes the values \( \sigma ^{2},\, \sigma ^{2},\, \sigma ^{2} \) when \( n \) takes the values \( 2,\, 3,\, 4 \), and three values determine a polynomial of degree at most \( 2 \). Therefore \( \mathrm{E}\left( var\_s \right) =\sigma ^{2} \) is proven for all \( n\geq 2 \).
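For the expectation, the linear combination mentioned above can also be written out explicitly. The short derivation below is a direct check rather than the identification route used here; the names \( P=\sum x_{i}^{2} \) and \( Q=\sum 'x_{i}x_{j} \) are introduced only for this sketch, and it uses \( \left( \sum x_{i}\right) ^{2}=P+Q \):

```latex
\begin{eqnarray*}
var\_s & = & \frac{1}{n-1}\,P-\frac{1}{n\left( n-1\right) }\left( P+Q\right) =\frac{1}{n}\,P-\frac{1}{n\left( n-1\right) }\,Q\\
\mathrm{E}\left( var\_s \right) & = & \frac{1}{n}\,n\,\sigma ^{2}-\frac{1}{n\left( n-1\right) }\times 0=\sigma ^{2}
\end{eqnarray*}
```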

The second formula can be obtained by the same method. The quantity \( A=\left( var\_s -\sigma ^{2}\right) ^{2}=var\_s^{2}-2\sigma ^{2}\, var\_s+\sigma ^{4} \) is a symmetric polynomial in the \( x_{j} \)'s. Its degree \( 4 \) part \( var\_s^{2} \) is homogeneous, and therefore a linear combination of \( \sum x_{i}^{4} \), \( \sum 'x_{i}^{3}x_{j} \), \( \sum 'x_{i}^{2}x^{2}_{j} \), \( \sum 'x_{i}^{2}x_{j}x_{k} \) and \( \sum 'x_{i}x_{j}x_{k}x_{l} \), while the remaining part has expectation \( -2\sigma ^{4}+\sigma ^{4}=-\sigma ^{4} \). Since the \( x_{j} \) are centered, only \( \sum x_{i}^{4} \) and \( \sum 'x_{i}^{2}x^{2}_{j} \) have nonzero expectations, namely \( n\, \mathcal{M}_{4} \) and \( n\left( n-1\right) \sigma ^{4} \), the other three having \( 0 \) as expectation. Thus \( \mathrm{var}\left( var\_s \right) =f\left( n\right) \mathcal{M}_{4}+g\left( n\right) \sigma ^{4} \), the coefficients \( f \) and \( g \) being rational functions of \( n \) of the form \( \frac{num\left( n\right) }{n^{2}\left( n-1\right) ^{2}} \) where the degree of \( num\left( n\right) \) is at most \( 4 \).

Here again, these coefficients can be found by identification. A direct computation with \( 3\leq n\leq 15 \) gives

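\begin{eqnarray*}
f\left( n\right)  & = & \left[ \frac{1}{3},\, \frac{1}{4},\, \frac{1}{5},\, \frac{1}{6},\, \frac{1}{7},\, \frac{1}{8},\, \frac{1}{9},\, \frac{1}{10},\, \frac{1}{11},\, \frac{1}{12},\, \frac{1}{13},\, \frac{1}{14},\, \frac{1}{15}\right] \\
g\left( n\right)  & = & \left[ 0,\, -\frac{1}{12},\, -\frac{1}{10},\, -\frac{1}{10},\, -\frac{2}{21},\, -\frac{5}{56},\, -\frac{1}{12},\, -\frac{7}{90},\, -\frac{4}{55},\, -\frac{3}{44},\, -\frac{5}{78},\, -\frac{11}{182},\, -\frac{2}{35}\right]
\end{eqnarray*}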
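The direct computation behind these lists can be reproduced with exact rational arithmetic. The sketch below is an illustration, not the author's code: it enumerates every sample of size \( n \) drawn from two small, hypothetical centered test laws (`LAW_A`, `LAW_B`, chosen only because their \( \left( \mathcal{M}_{4},\, \sigma ^{4}\right) \) pairs differ), computes \( \mathrm{var}\left( var\_s \right) \) exactly, and solves the two equations \( \mathrm{var}\left( var\_s \right) =f\left( n\right) \mathcal{M}_{4}+g\left( n\right) \sigma ^{4} \) for \( f\left( n\right) \) and \( g\left( n\right) \). All function names are hypothetical, and since the enumeration grows like \( k^{n} \) it only covers small \( n \).

```python
from fractions import Fraction
from itertools import product


def var_s(xs):
    """Estimator of the text: sum(x^2)/(n-1) - (sum x)^2/(n(n-1))."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    return Fraction(s2) / (n - 1) - Fraction(s1) ** 2 / (n * (n - 1))


def exact_moments(law, n):
    """Exact E(var_s) and var(var_s) over all n-samples of a discrete law.

    law: list of (value, probability) pairs, probabilities as Fractions.
    """
    mean = Fraction(0)
    second = Fraction(0)
    for draw in product(law, repeat=n):
        prob = Fraction(1)
        for _, p in draw:
            prob *= p
        v = var_s([x for x, _ in draw])
        mean += prob * v
        second += prob * v * v
    return mean, second - mean ** 2


def central_moments(law):
    """Return (sigma^2, M4) of a discrete law."""
    mu = sum(p * x for x, p in law)
    sigma2 = sum(p * (x - mu) ** 2 for x, p in law)
    m4 = sum(p * (x - mu) ** 4 for x, p in law)
    return sigma2, m4


# Hypothetical test laws: (M4, sigma^4) = (1, 1) and (1/2, 1/4).
LAW_A = [(-1, Fraction(1, 2)), (1, Fraction(1, 2))]
LAW_B = [(-1, Fraction(1, 4)), (0, Fraction(1, 2)), (1, Fraction(1, 4))]


def identify_f_g(n):
    """Solve var(var_s) = f*M4 + g*sigma^4 for (f(n), g(n)) by Cramer's rule."""
    rows = []
    for law in (LAW_A, LAW_B):
        sigma2, m4 = central_moments(law)
        _, v = exact_moments(law, n)
        rows.append((m4, sigma2 ** 2, v))
    (a1, b1, c1), (a2, b2, c2) = rows
    det = a1 * b2 - a2 * b1
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det


for n in range(2, 8):
    f, g = identify_f_g(n)
    print(n, f, g)
```

Each printed pair can be compared with the lists above; exact fractions avoid any rounding doubt in the identification.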



The value \( f\left( n\right) =\frac{1}{n} \) can be read off directly, while \( g\left( n\right) =-\frac{n-3}{\left( n-1\right) \, n} \) can be obtained by the "A=B" method [4], which leads to the following recurrence equation, where \( \gamma \left( n\right) \) stands for \( g\left( n+3\right) \):

\begin{displaymath}
\left( 2+3\, n+n^{2}\right) \, \gamma \left( n\right) +\left...
...mma \left( 0\right) =0,\, \gamma \left( 1\right) =\frac{-1}{12}\end{displaymath}

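It should be emphasized that this guessing yields rigorous proofs, since the existence of a closed form of the stated shape has been proven beforehand: once the degree of the numerator is bounded, finitely many computed values determine it completely.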
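As a cross-check of (5), the variance can also be obtained by direct expansion; this is a different route from the identification followed above. With \( P=\sum x_{i}^{2} \) and \( Q=\sum 'x_{i}x_{j} \) (notation introduced for this sketch only), one has \( var\_s =\frac{1}{n}P-\frac{1}{n\left( n-1\right) }Q \), and for centered \( x_{j} \): \( \mathrm{E}\left( P^{2}\right) =n\mathcal{M}_{4}+n\left( n-1\right) \sigma ^{4} \), \( \mathrm{E}\left( PQ\right) =0 \), \( \mathrm{E}\left( Q^{2}\right) =2n\left( n-1\right) \sigma ^{4} \). Hence:

```latex
\begin{eqnarray*}
\mathrm{E}\left( var\_s^{2}\right)  & = & \frac{\mathrm{E}\left( P^{2}\right) }{n^{2}}-\frac{2\,\mathrm{E}\left( PQ\right) }{n^{2}\left( n-1\right) }+\frac{\mathrm{E}\left( Q^{2}\right) }{n^{2}\left( n-1\right) ^{2}}
 =\frac{\mathcal{M}_{4}}{n}+\frac{n-1}{n}\,\sigma ^{4}+\frac{2}{n\left( n-1\right) }\,\sigma ^{4}\\
\mathrm{var}\left( var\_s \right)  & = & \mathrm{E}\left( var\_s^{2}\right) -\sigma ^{4}=\frac{\mathcal{M}_{4}}{n}-\frac{n-3}{n\left( n-1\right) }\,\sigma ^{4}
\end{eqnarray*}
```

The last step uses \( \left( n-1\right) ^{2}+2-n\left( n-1\right) =3-n \), which gives the same \( f\left( n\right) \) and \( g\left( n\right) \) as the identification.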


A.2 About the \( \chi ^{2} \) law

The pdf of \( \zeta =z_{1}^{2}+\cdots +z_{\nu }^{2} \), where the \( z_{j} \) are independent standard Gaussian random variables (zero mean, unit variance), is

\begin{displaymath}
\chi _{\nu }^{2}\left( \zeta \right) =C_{\nu }\, \zeta ^{\frac{1}{2}\nu -1}\exp \left( -\frac{1}{2}\zeta \right) \end{displaymath}

where \( C_{\nu } \) is a normalization constant, defined by \( \int _{0}^{\infty }\chi _{\nu }^{2}\left( t\right) \, \mathrm{d}t=1 \).

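For \( \nu =1 \), this result comes from \( \chi _{1}^{2}\left( \zeta \right) \, \mathrm{d}\zeta =2\times \frac{1}{\sqrt{2\pi }}\exp \left( -\frac{z^{2}}{2}\right) \, \mathrm{d}z \) with \( \zeta =z^{2} \) and \( z\geq 0 \), where the factor \( 2 \) is required since the map \( z\mapsto \zeta =z^{2} \) is two-to-one (\( z \) and \( -z \) give the same \( \zeta \)). Since \( \mathrm{d}z=\mathrm{d}\zeta /\left( 2\sqrt{\zeta }\right) \), this gives \( \chi _{1}^{2}\left( \zeta \right) =\frac{1}{\sqrt{2\pi }}\, \zeta ^{-\frac{1}{2}}\exp \left( -\frac{1}{2}\zeta \right) \). For greater values of \( \nu \), the result follows by induction from the convolution formula \( \chi _{\nu +1}^{2}\left( \zeta \right) =\int _{0}^{\zeta }\, \chi _{\nu }^{2}\left( t\right) \chi _{1}^{2}\left( \zeta -t\right) \, \mathrm{d}t\). We have \( \chi _{\nu +1}^{2}\left( \zeta \right) =C_{\nu }C_{1}\int _{t=0}^{\zeta }\, t^{\frac{1}{2}\nu -1}\exp \left( -\frac{1}{2}t\right) \left( \zeta -t\right) ^{-\frac{1}{2}}\exp \left( -\frac{1}{2}\zeta +\frac{1}{2}t\right) \, \mathrm{d}t\). Changing \( t \) into \( \zeta \, u \) where \( u\in \left[ 0,\, 1\right] \), we obtain \( \chi _{\nu +1}^{2}\left( \zeta \right) =\zeta ^{\frac{1}{2}\nu -\frac{1}{2}}\exp \left( -\frac{1}{2}\zeta \right) \times C_{\nu }C_{1}\int _{0}^{1}\, u^{\frac{1}{2}\nu -1}\, \left( 1-u\right) ^{-\frac{1}{2}}\mathrm{d}u \), which has the required form since \( \frac{1}{2}\nu -\frac{1}{2}=\frac{1}{2}\left( \nu +1\right) -1 \).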

The value of \( C_{\nu } \) comes from the very definition of the Gamma function: substituting \( \zeta =2s \) gives \( \int _{0}^{\infty }\zeta ^{\frac{1}{2}\nu -1}\exp \left( -\frac{1}{2}\zeta \right) \, \mathrm{d}\zeta =2^{\frac{\nu }{2}}\Gamma \left( \frac{\nu }{2}\right) \), hence \( \frac{1}{C_{\nu }}=2^{\frac{\nu }{2}}\Gamma \left( \frac{\nu }{2}\right) \). Therefore \( \mathrm{E}\left( \zeta \right) =\frac{C_{\nu }}{C_{\nu +2}}=\nu \) and \( \mathrm{E}\left( \zeta ^{2} \right) =\frac{C_{\nu }}{C_{\nu +4}}=\nu \left( \nu +2\right) \), leading to \( \mathrm{var}\left( \zeta \right) =2\nu \). Moreover, for \( \nu \geq 3 \), the modal value of a \( \chi ^{2} \) variable is the solution of \( \frac{\mathrm{d}}{\mathrm{d}\zeta }\chi _{\nu }^{2}\left( \zeta \right) =0 \), namely \( \zeta =\nu -2 \); for \( \nu \leq 2 \) the density is decreasing and the mode is at \( \zeta =0 \).
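The normalization constant, the first two moments and the mode can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a plain midpoint rule on the truncated range \( \left[ 0,\, 200\right] \) (the truncation point and the step count are arbitrary choices), relies on Python's standard `math.gamma`, and is restricted to \( \nu \geq 3 \), where the density is bounded and its mode is interior. The function name `chi2_summary` is hypothetical.

```python
from math import exp, gamma


def chi2_summary(nu, upper=200.0, steps=100_000):
    """Midpoint-rule check of the chi^2_nu density C_nu zeta^(nu/2-1) exp(-zeta/2).

    Returns (total mass, mean, variance, grid mode) over [0, upper].
    Assumes nu >= 3 so that the density is bounded with an interior mode.
    """
    c_nu = 1.0 / (2.0 ** (nu / 2) * gamma(nu / 2))  # 1/C_nu = 2^(nu/2) Gamma(nu/2)
    h = upper / steps
    m0 = m1 = m2 = 0.0
    mode, best = 0.0, -1.0
    for k in range(steps):
        zeta = (k + 0.5) * h  # midpoint of the k-th cell
        density = c_nu * zeta ** (nu / 2 - 1) * exp(-zeta / 2)
        m0 += density * h
        m1 += zeta * density * h
        m2 += zeta * zeta * density * h
        if density > best:  # keep the grid point with the largest density
            mode, best = zeta, density
    return m0, m1, m2 - m1 * m1, mode


for nu in (3, 4, 6):
    print(nu, chi2_summary(nu))  # expect roughly (1, nu, 2*nu, nu - 2)
```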


A.3 Pearson's \( \chi ^{2} \) and independent quadratic forms

Pearson's \( \chi ^{2} \) statistic has been defined in the main text (EQ. eq: def_pearson). Taking \( \nu =3 \) as an example, we obtain

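\begin{displaymath}
\mbox{Pearson's }\chi ^{2}=\frac{\left( n_{0}-n\, p_{0}\right) ^{2}}{n\, p_{0}}+\frac{\left( n_{1}-n\, p_{1}\right) ^{2}}{n\, p_{1}}+\frac{\left( n_{2}-n\, p_{2}\right) ^{2}}{n\, p_{2}}+\frac{\left( n_{3}-n\, p_{3}\right) ^{2}}{n\, p_{3}}\end{displaymath}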

where \( n \) is the number of trials, \( n_{j} \) the number of visits received by the \( j \)-th subset and \( n\, p_{j} \) the expectation of \( n_{j} \). Substituting the obvious relations \( p_{0}=1-\sum _{j=1}^{\nu }\, p_{j} \) and \( n_{0}=n-\sum _{j=1}^{\nu }\, n_{j} \) and completing the squares, Pearson's \( \chi ^{2} \) becomes:
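\begin{displaymath}
\frac{1}{n\, \left( 1-p_{1}\right) \, p_{1}}\left( n_{1}-n\, p_{1}\right) ^{2}+\frac{1-p_{1}}{n\, p_{2}\left( 1-p_{1}-p_{2}\right) }\left( n_{2}-\frac{n-n_{1}}{1-p_{1}}p_{2}\right) ^{2}+\frac{1-p_{1}-p_{2}}{n\, p_{3}\left( 1-p_{1}-p_{2}-p_{3}\right) }\left( n_{3}-\frac{n-n_{1}-n_{2}}{1-p_{1}-p_{2}}p_{3}\right) ^{2}
\end{displaymath}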

In this expression, the quantities \( M_{j}=n-\sum _{k=1}^{j-1}\, n_{k} \) and \( q_{j}=p_{j}\div \left( 1-\sum _{k=1}^{j-1}\, p_{k}\right) \) deserve attention, since they have a concrete meaning. Obtaining a multiset \( \left( n_{0},\, n_{1},\, \cdots ,\, n_{\nu }\right) \) according to the multinomial law, i.e. with probability \( n!\, \prod \, \left( p_{j}^{n_{j}}\, /\, n_{j}!\right) \), can be done by successive binomial trials. At the beginning, \( M_{1}\doteq n \) decisions are to be taken, and the probability of going to state \( 1 \) is \( q_{1}\doteq p_{1} \). Therefore, \( n_{1} \) can be sampled according to the binomial law \( \left( M_{1},\, q_{1}\right) \). After that, \( n_{1} \) decisions have been taken and the \( M_{2}=n-n_{1} \) undecided trials are to be replayed. Since state \( 1 \) is now excluded, the probability of state \( 2 \) becomes the conditional probability \( q_{2}=p_{2}/\left( 1-p_{1}\right) \), and \( n_{2} \) can be sampled according to the binomial law \( \left( M_{2},\, q_{2}\right) \). And so on up to state \( \nu \), the trials still undecided at the end going to state \( 0 \). This leads to:

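\begin{displaymath}
\mbox{Pearson's }\chi ^{2}=\sum _{i=1}^{i=\nu }\frac{M_{i}}{\mathrm{E}\left( M_{i} \right) }\, \frac{\left( n_{i}-M_{i}\, q_{i}\right) ^{2}}{M_{i}\, q_{i}\left( 1-q_{i}\right) }\quad \mathrm{where}\quad M_{i}=n-\sum _{k<i}n_{k}\: ;\: q_{i}=\frac{p_{i}}{1-\sum _{k<i}\, p_{k}}
\end{displaymath} (6)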
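The sequential sampling scheme and identity (6) can both be exercised in a few lines. The sketch below is an illustration, not the author's code: the function names are hypothetical, the binomial draw is a naive sum of Bernoulli trials (simple, not fast), all \( p_{i} \) are assumed strictly positive, and each term of EQ. 6 is written as \( \left( n_{i}-M_{i}q_{i}\right) ^{2}/\left( \mathrm{E}\left( M_{i}\right) q_{i}\left( 1-q_{i}\right) \right) \) with \( \mathrm{E}\left( M_{i}\right) =n\left( 1-\sum _{k<i}p_{k}\right) \), which equals \( \frac{M_{i}}{\mathrm{E}\left( M_{i}\right) }z_{i}^{2} \) and is \( 0 \) when \( M_{i}=0 \). Exact fractions let the two forms of Pearson's \( \chi ^{2} \) be compared with `==`.

```python
import random
from fractions import Fraction


def binomial(m, q, rng):
    """One Binomial(m, q) draw, as a sum of m Bernoulli(q) trials."""
    return sum(1 for _ in range(m) if rng.random() < q)


def sequential_multinomial(n, p, rng):
    """Draw (n_0, ..., n_nu) by successive binomial trials.

    Cells 1..nu are decided in turn: n_i ~ Binomial(M_i, q_i) with
    M_i = n - sum_{k<i} n_k and q_i = p_i / (1 - sum_{k<i} p_k);
    cell 0 receives the trials left over at the end.
    """
    counts = [0] * len(p)
    m, mass_left = n, Fraction(1)  # M_i and 1 - sum_{k<i} p_k
    for i in range(1, len(p)):
        counts[i] = binomial(m, p[i] / mass_left, rng)
        m -= counts[i]
        mass_left -= p[i]
    counts[0] = m
    return counts


def pearson_direct(counts, p):
    """Pearson's chi^2 as the (nu+1)-term sum over all cells."""
    n = sum(counts)
    return sum((c - n * pj) ** 2 / (n * pj) for c, pj in zip(counts, p))


def pearson_sequential(counts, p):
    """Pearson's chi^2 through EQ. 6, term i being
    (n_i - M_i q_i)^2 / (E(M_i) q_i (1 - q_i)) with E(M_i) = n (1 - sum_{k<i} p_k)."""
    n = sum(counts)
    total = Fraction(0)
    m, mass_left = n, Fraction(1)
    for i in range(1, len(p)):
        q = p[i] / mass_left
        total += (counts[i] - m * q) ** 2 / (n * mass_left * q * (1 - q))
        m -= counts[i]
        mass_left -= p[i]
    return total


rng = random.Random(2002)
p = [Fraction(1, 6)] * 6  # fair die, nu = 5
counts = sequential_multinomial(60, p, rng)
print(counts, pearson_direct(counts, p) == pearson_sequential(counts, p))
```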

When \( M_{i}\neq 0 \), the quantity \( z_{i} \) defined by:

\begin{displaymath}
z_{i}\doteq \frac{n_{i}-M_{i}\, q_{i}}{\sqrt{M_{i}q_{i}\left( 1-q_{i}\right) }}\end{displaymath}

is nothing but the reduced (standardized) variable associated with \( n_{i} \), since \( M_{i}\, q_{i}=\mathrm{E}\left( n_{i} \right) \) and \( M_{i}q_{i}\left( 1-q_{i}\right) =\mathrm{var}\left( n_{i} \right) \) when, \( M_{i} \) being given, \( n_{i} \) is sampled according to the \( \left( M_{i},\, q_{i}\right) \) binomial law. If a vanishing \( M_{i} \) is encountered, one has \( n=n_{1}+\cdots +n_{i-1} \) and therefore \( n_{i}=\cdots =n_{\nu }=0 \): the corresponding term in EQ. 6 is null, since its first fraction \( M_{i}/\mathrm{E}\left( M_{i}\right) \) vanishes, while \( z_{i} \) remains undefined. In all cases, with \( \mathrm{E}\left( M_{i}\right) =n\left( 1-\sum _{k<i}p_{k}\right) \) and the convention that a term with \( M_{i}=0 \) is zero, this equation can be rewritten as
\begin{displaymath}
\sum _{i=1}^{i=\nu }\frac{M_{i}}{\mathrm{E}\left( M_{i} \right) }z_{i}^{2}
\end{displaymath} (7)


A.4 Expectation and variance

The exact result

\begin{displaymath}
\mathrm{E}\left( \mbox{Pearson's }\chi ^{2} \right) =\nu \end{displaymath}

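(independent of the \( p_{i} \)'s!) can be obtained in many ways. In the \( \left( \nu +1\right) \)-term sum of EQ. 1, the expectation of \( \left( n_{i}-n\, p_{i}\right) ^{2} \) is \( n\, p_{i}\left( 1-p_{i}\right) \), so that each term has expectation \( 1-p_{i} \) and, obviously, \( \sum _{0}^{\nu }\left( 1-p_{i}\right) =\nu +1-\sum p_{i}=\nu \). In the \( \nu \)-term sum of EQ. 6, the expectation of \( z_{i}^{2} \) given \( M_{i} \) is, by definition, \( 1 \) when \( M_{i}\neq 0 \) (the term vanishes otherwise), so that each term \( \frac{M_{i}}{\mathrm{E}\left( M_{i}\right) }z_{i}^{2} \) has expectation \( \frac{\mathrm{E}\left( M_{i}\right) }{\mathrm{E}\left( M_{i}\right) }=1 \), leading to the required result.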

The following formula holds exactly:

\begin{displaymath}
\mathrm{var}\left( \mbox{Pearson's }\chi ^{2} \right) =2\nu +\frac{1}{n}\left( 3-\left( \nu +2\right) ^{2}+\sum _{i=0}^{\nu }\frac{1}{p_{i}}\right)
\end{displaymath} (8)
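Both the expectation and EQ. 8 can be checked exactly for small \( n \) by enumerating every outcome of the multinomial law. The sketch below assumes strictly positive cell probabilities and uses exact rational arithmetic; the function names and the probability vectors in the usage lines are arbitrary examples, and since the number of outcomes grows quickly with \( n \) and \( \nu \), this is a check, not a proof.

```python
from fractions import Fraction
from math import factorial


def compositions(n, k):
    """All tuples of k non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def pearson_moments(n, p):
    """Exact E and var of Pearson's chi^2 under the multinomial law (n, p)."""
    mean = Fraction(0)
    second = Fraction(0)
    for counts in compositions(n, len(p)):
        prob = Fraction(factorial(n))  # n! * prod(p_j^n_j / n_j!)
        for c, pj in zip(counts, p):
            prob *= pj ** c / factorial(c)
        x2 = sum((c - n * pj) ** 2 / (n * pj) for c, pj in zip(counts, p))
        mean += prob * x2
        second += prob * x2 * x2
    return mean, second - mean ** 2


def var_formula(n, p):
    """Right-hand side of EQ. 8, with nu = len(p) - 1."""
    nu = len(p) - 1
    return 2 * nu + (3 - (nu + 2) ** 2 + sum(1 / pj for pj in p)) / n


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
for n in range(1, 6):
    mean, var = pearson_moments(n, p)
    print(n, mean, var, var == var_formula(n, p))
```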




douillet@ensait.fr
2002-10-01