Previous: 1 Introduction Up: Revisiting the test Next: 3 Unfair die rolling   Contents


2 Pearson's \( \chi ^{2}\) and the \( \chi ^{2}\) law

Let us start by recalling that Pearson's \( \chi ^{2}\) statistic is defined as

\begin{displaymath}
\mbox{Pearson's}\, \chi ^{2}\doteq \sum _{i=0}^{\nu }\frac{\left( n_{i}-n\, p_{i}\right) ^{2}}{n\, p_{i}}
\end{displaymath} (1)

in the following experimental context. A random variable \( \xi \) is given; a no-nonsense partition \( I_{0},\, \cdots ,\, I_{i},\, \cdots ,\, I_{\nu } \) of the range of \( \xi \) is chosen, i.e. a partition such that \( \forall i\, :\, p_{i}\doteq Pr\left( \xi \in I_{i} \right) >0 \); and finally a number of trials \( n \) is fixed. Thereafter, \( n \) independent instances \( \xi _{1},\, \cdots ,\, \xi _{j},\, \cdots ,\, \xi _{n} \) of \( \xi \) are sampled. The \( n_{i} \)'s (\( 0\leq i\leq \nu \)) are the (experimental) numbers of visits received by the \( i \)-th subset during the experiment, i.e. \( n_{i}=\mathrm{Card}\left\{ j\, \vert \, \xi _{j}\in I_{i}\right\} \). Obviously, \( n\, p_{i} \) is the expectation of \( n_{i} \) over the set of all \( n \)-sized experiments. Note that the vector \( \left[ n_{0},\, n_{1},\, \cdots ,\, n_{\nu }\right] \) follows a multinomial law, the only influence of the initial law (i.e. the law of the \( \xi \)'s) being to determine the elementary probabilities \( p_{i}=\mathrm{E}\left( n_{i} \right) /n \) of this multinomial law.
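
The experimental context above can be sketched as a small simulation. This is only an illustration under assumptions of our own: \( \xi \) is taken uniform on [0, 1), the partition is given by hypothetical cut points `edges` (half-open cells \( I_{i}=[edges_{i},\, edges_{i+1}) \)), and the helper names `cell_probabilities`, `count_visits` and `pearson_chi2` are ours, not the author's.

```python
import bisect
import random

def cell_probabilities(edges):
    """p_i = Pr(xi in I_i) for xi uniform on [0, 1) and I_i = [edges[i], edges[i+1])."""
    return [b - a for a, b in zip(edges, edges[1:])]

def count_visits(samples, edges):
    """n_i = Card{ j : xi_j in I_i }, located by binary search on the cut points."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        counts[bisect.bisect_right(edges, x) - 1] += 1
    return counts

def pearson_chi2(counts, p):
    """Pearson's chi^2 as in (1): sum of (n_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((c - n * pi) ** 2 / (n * pi) for c, pi in zip(counts, p))

# Hypothetical partition of [0, 1) into nu + 1 = 3 cells, each with p_i > 0.
edges = [0.0, 0.2, 0.5, 1.0]
p = cell_probabilities(edges)
rng = random.Random(1)
n = 1000
samples = [rng.random() for _ in range(n)]  # n independent instances of xi
counts = count_visits(samples, edges)
stat = pearson_chi2(counts, p)
print(counts, round(stat, 3))
```

Any other law for \( \xi \) would only change the list `p`, which is the point made above about the multinomial law.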

The underlying idea in this definition is the following. When the number \( n \) of trials increases, Pearson's \( \chi ^{2}\) statistic does not really move, since the numerators are expected to grow like \( n \) (and not like \( n^{2} \)), just as the denominators \( n\, p_{i} \) do. Instead, this statistic is expected to converge (when \( n\rightarrow \infty \)) towards a limit that would measure how far the experimental law is from its theoretical model.
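
The growth argument can be made explicit with a short worked computation, using only the standard fact that each marginal \( n_{i} \) of the multinomial law is binomial with parameters \( n \) and \( p_{i} \), so that \( \mathrm{E}\left( \left( n_{i}-n\, p_{i}\right) ^{2}\right) =\mathrm{var}\left( n_{i}\right) =n\, p_{i}\left( 1-p_{i}\right) \). Taking expectations term by term in (1):

\begin{displaymath}
\mathrm{E}\left( \mbox{Pearson's}\, \chi ^{2}\right) =\sum _{i=0}^{\nu }\frac{\mathrm{E}\left( \left( n_{i}-n\, p_{i}\right) ^{2}\right) }{n\, p_{i}}=\sum _{i=0}^{\nu }\frac{n\, p_{i}\left( 1-p_{i}\right) }{n\, p_{i}}=\left( \nu +1\right) -\sum _{i=0}^{\nu }p_{i}=\nu
\end{displaymath}

so the factors \( n \) cancel exactly, and the expected value of the statistic does not depend on \( n \) at all.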

FIG.  1: The \( X_{i}\) (grey) and their expectations (light grey).
\resizebox*{7.5cm}{5cm}{\includegraphics{figures/demo_chi2.eps}}

Let us exemplify this behaviour by taking \( \nu =5 \) and \( \displaystyle p=\left[ \frac{1}{5},\, \frac{1}{6},\, \frac{1}{7},\, \frac{1}{8},\, \frac{1}{9},\, \frac{641}{2520}\right] \) (the last value makes the six probabilities sum to 1). For a given \( n \), there are \( {n+\nu \choose \nu } \) possible outcomes \( \left[ n_{0},\, \cdots ,\, n_{\nu }\right] \), each having probability \( n!\, \prod \left( p_{i}^{n_{i}}/n_{i}!\right) \). Choosing \( n=4 \) and tallying the resulting distribution of Pearson's \( \chi ^{2}\) leads to the 9-bar histogram of Fig. 1 (light grey, in the foreground), while \( n=7 \) leads to the other histogram (dark grey, in the background).
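
The tally described above can be reproduced by brute-force enumeration. The sketch below is our own (the helper names `compositions`, `multinomial_prob` and `chi2_distribution` are hypothetical); it uses exact `Fraction` arithmetic so that equal values of the statistic are merged exactly, and it returns the law of Pearson's \( \chi ^{2}\) as a dictionary `{value: probability}`. How those values were binned into the bars of the figure is not specified here.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb, factorial, prod

def compositions(n, parts):
    """All tuples of `parts` non-negative integers summing to n (the possible outcomes)."""
    if parts == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, parts - 1):
            yield (first,) + rest

def multinomial_prob(counts, p):
    """n! * prod(p_i^{n_i} / n_i!), computed exactly."""
    n = sum(counts)
    weight = prod(pi ** c for c, pi in zip(counts, p))
    return factorial(n) * weight / prod(factorial(c) for c in counts)

def pearson_chi2(counts, p):
    n = sum(counts)
    return sum((c - n * pi) ** 2 / (n * pi) for c, pi in zip(counts, p))

def chi2_distribution(p, n):
    """Exact law of Pearson's chi^2 over all n-sized experiments: {value: probability}."""
    law = defaultdict(Fraction)
    for counts in compositions(n, len(p)):
        law[pearson_chi2(counts, p)] += multinomial_prob(counts, p)
    return dict(law)

P = [Fraction(1, d) for d in (5, 6, 7, 8, 9)] + [Fraction(641, 2520)]
assert sum(P) == 1

outcomes = list(compositions(4, len(P)))
assert len(outcomes) == comb(4 + 5, 5)  # (n + nu choose nu) outcomes
law4 = chi2_distribution(P, 4)
law7 = chi2_distribution(P, 7)
for value, prob in sorted(law4.items()):
    print(float(value), float(prob))
```

With exact arithmetic, both tallies have total probability exactly 1 and mean exactly \( \nu =5 \), whatever \( n \).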

On the other hand, the \( \chi ^{2}\) law, better written \( \chi ^{2}_{\nu }\left( \zeta \right) \), is the pdf (probability density function) of the sum of the squares of \( \nu \) independent standard Gaussian variables. In other words:

\begin{displaymath}
\chi ^{2}_{\nu }\left( \zeta \right) \quad \mbox{pdf of}\quad \zeta =z_{1}^{2}+z_{2}^{2}+\cdots +z_{\nu }^{2}
\end{displaymath} (2)

where \( \nu \), the so-called "number of degrees of freedom", is a given positive integer and the \( z_{j} \) are independent normal variables with \( \mathrm{E}\left( z_{j} \right) =0 \) and \( \mathrm{var}\left( z_{j} \right) =1 \). The well-known formulae \( \mathrm{E}\left( \zeta \right) =\nu \) and \( \mathrm{var}\left( \zeta \right) =2\nu \) (cf. Annex A.2) can be used to define \( \zeta _{reduced} \) by:
\begin{displaymath}
\zeta _{reduced}\doteq \frac{\zeta -\nu }{\sqrt{2\, \nu }}
\end{displaymath} (3)
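
Definitions (2) and (3) can be checked by a quick Monte Carlo sketch. The choices \( \nu =4 \), 100000 draws and the seed are arbitrary assumptions of ours, as are the helper names `sample_zeta` and `reduce_zeta`; the printed moments should come out close to \( \nu \), \( 2\nu \), 0 and 1 respectively.

```python
import math
import random
import statistics

def sample_zeta(nu, rng):
    """One draw of zeta = z_1^2 + ... + z_nu^2 with z_j independent N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

def reduce_zeta(zeta, nu):
    """zeta_reduced = (zeta - nu) / sqrt(2 nu), as in (3)."""
    return (zeta - nu) / math.sqrt(2 * nu)

nu = 4
rng = random.Random(2002)
draws = [sample_zeta(nu, rng) for _ in range(100_000)]
reduced = [reduce_zeta(z, nu) for z in draws]
print(statistics.fmean(draws), statistics.variance(draws))      # sample mean and variance of zeta
print(statistics.fmean(reduced), statistics.variance(reduced))  # same for zeta_reduced
```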

For large values of \( \nu \), the variable \( \zeta _{reduced} \) behaves almost like a standard Gaussian variable, while for smaller values the distribution becomes increasingly skewed. Nevertheless, it can be checked that the Gaussian rejection rule \( Pr\left( 2<\left\vert \zeta _{reduced}\right\vert \right) <5\% \) still holds for every \( \nu >1 \).
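
One way to carry out this check numerically is sketched below, under our own choices: the \( \chi ^{2}_{\nu } \) cumulative distribution is evaluated as the regularized lower incomplete gamma function \( P\left( \nu /2,\, x/2\right) \) through its power series (a standard identity, adequate for the moderate arguments used here), and the two tails \( \zeta <\nu -2\sqrt{2\nu } \) and \( \zeta >\nu +2\sqrt{2\nu } \) are added. The function names are hypothetical.

```python
import math

def chi2_cdf(x, nu):
    """CDF of the chi^2_nu law, i.e. P(nu/2, x/2), summed from its power series."""
    if x <= 0:
        return 0.0
    s, t = nu / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    k = 0
    while term > total * 1e-16:  # all terms are positive, stop when negligible
        k += 1
        term *= t / (s + k)
        total += term
    return total * math.exp(-t + s * math.log(t) - math.lgamma(s))

def gaussian_rejection_prob(nu):
    """Pr(|zeta_reduced| > 2) = Pr(zeta < nu - 2 sqrt(2 nu)) + Pr(zeta > nu + 2 sqrt(2 nu))."""
    half_width = 2.0 * math.sqrt(2.0 * nu)
    return chi2_cdf(nu - half_width, nu) + (1.0 - chi2_cdf(nu + half_width, nu))

for nu in (1, 2, 3, 5, 10, 30, 100):
    print(nu, round(gaussian_rejection_prob(nu), 4))
```

For \( \nu =2 \) the probability reduces to \( e^{-3}\approx 4.98\% \), the tightest case; for \( \nu =1 \) it is slightly above 5% (about 5.04%), which is consistent with the restriction \( \nu >1 \).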

These two chi-squares are connected by a convergence property: when \( n\rightarrow \infty \), the partition remaining fixed, the distribution of Pearson's \( \chi ^{2}\) statistic converges towards the \( \chi ^{2}\) law with \( \nu \) degrees of freedom. In other words, the law of Pearson's \( \chi ^{2}\) tends to become independent of the \( p_{i} \)'s, depending only on \( \nu \). This result, as well as formulae for the mean and variance of Pearson's \( \chi ^{2}\) statistic, is revisited in an appendix, where it will be shown that the rather metaphysical "degrees of freedom" are nothing but the rank of a quadratic form.
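
A rough Monte Carlo illustration of this convergence, under assumptions of our own: we reuse the example law \( p \) given earlier (\( \nu =5 \)), draw each experiment directly from the multinomial law (which, as noted, is all that matters), and pick \( n=500 \) and 2000 replications arbitrarily. The empirical mean, variance and upper tail beyond 11.07 (approximately the tabulated 95% quantile of \( \chi ^{2}_{5} \)) should come out near 5, 10 and 5%.

```python
import random
import statistics
from collections import Counter

P = [1/5, 1/6, 1/7, 1/8, 1/9, 641/2520]  # example law with nu = 5
NU = len(P) - 1

def pearson_chi2_sample(n, p, rng):
    """Draw one n-sized experiment from the multinomial law and return Pearson's chi^2."""
    cells = rng.choices(range(len(p)), weights=p, k=n)
    tally = Counter(cells)  # missing cells count as 0
    return sum((tally[i] - n * pi) ** 2 / (n * pi) for i, pi in enumerate(p))

rng = random.Random(1)
values = [pearson_chi2_sample(500, P, rng) for _ in range(2000)]
mean = statistics.fmean(values)
var = statistics.variance(values)
tail = sum(v > 11.07 for v in values) / len(values)  # compare with 0.05
print(round(mean, 2), round(var, 2), round(tail, 3))
```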




douillet@ensait.fr
2002-10-01