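Let us start by recalling that the $\chi^2$ statistic is defined as
$$\chi^2 = \sum_{j=1}^{k} \frac{(n_j - n\,p_j)^2}{n\,p_j}$$
where $n$ is the number of trials, $k$ the number of classes of the partition, $n_j$ the observed count in class $j$ and $p_j$ its theoretical probability.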
The underlying idea in this definition is the following. When the
number $n$ of trials increases, the $\chi^2$ statistic
does not really move, since the numerators (whose expectations are variances of binomial counts) are expected to grow like $n$
(and not like $n^2$). Instead, this statistic
is expected to converge (when $n \to \infty$) towards
a limit that would measure how far the experimental law is from its
theoretical model.
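The stability of the statistic under growing sample size can be checked numerically. The following Python sketch assumes the usual Pearson form of the statistic (sum over classes of squared deviation from the expected count, divided by the expected count). The three-class model `probs` and the sample sizes are hypothetical choices made for illustration, not the values used in this text.

```python
import random

def pearson_chi2(counts, probs):
    """Pearson statistic: sum over classes of (n_j - n p_j)^2 / (n p_j)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

def draw_counts(n, probs, rng):
    """Tally n independent trials drawn according to the class probabilities."""
    counts = [0] * len(probs)
    for j in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[j] += 1
    return counts

def mean_statistic(n, probs, repeats, rng):
    """Average of the statistic over `repeats` simulated experiments of size n."""
    total = sum(pearson_chi2(draw_counts(n, probs, rng), probs)
                for _ in range(repeats))
    return total / repeats

rng = random.Random(0)
probs = [0.5, 0.3, 0.2]  # hypothetical model, chosen only for this sketch
for n in (10, 100, 1000):
    # The average should stay of the same order while n grows a hundredfold.
    print(n, round(mean_statistic(n, probs, 1000, rng), 2))
```

If the numerators grew like the square of the number of trials, the printed averages would grow with it; under the model they should instead stay close to one another.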
Let us exemplify this behaviour by taking
and
.
For a given $n$
, there are
possible
multinomial outcomes, each having a probability equal to
.
Choosing
and tallying the resulting distribution of the $\chi^2$ statistic
leads to the 9-bar histogram of fig: demo_chi2
(light grey, in the foreground), while
leads to the other
histogram (dark grey, in the background).
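The tallying described above can be sketched in Python by enumerating every possible vector of counts and accumulating its multinomial probability on the value taken by the statistic. The model `probs` and the size `n` below are hypothetical and are not meant to reproduce the figure. The sketch relies on two standard facts: there are C(n+k-1, k-1) count vectors for k classes, and each has the usual multinomial probability.

```python
from collections import defaultdict
from math import comb, factorial, prod

def compositions(n, k):
    """Yield every k-tuple of non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial_prob(counts, probs):
    """Probability of observing exactly `counts` under the model `probs`."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    return coeff * prod(p ** c for p, c in zip(probs, counts))

def exact_chi2_law(n, probs, digits=9):
    """Exact law of the statistic: map each attainable value to its probability."""
    law = defaultdict(float)
    for counts in compositions(n, len(probs)):
        stat = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))
        # Round so that values equal up to floating-point noise share one bar.
        law[round(stat, digits)] += multinomial_prob(counts, probs)
    return dict(sorted(law.items()))

probs = [0.5, 0.3, 0.2]  # hypothetical model, chosen only for this sketch
n = 4                    # hypothetical number of trials
print(comb(n + len(probs) - 1, len(probs) - 1), "possible count vectors")
for value, weight in exact_chi2_law(n, probs).items():
    print(f"{value:8.4f}  {weight:.4f}")
```

Each printed line corresponds to one bar of a histogram of the kind described; the number of bars depends on how many count vectors share the same value of the statistic.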
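On the other hand, the $\chi^2$ law, better written $\chi^2_\nu$,
is the pdf (probability density function) of the sum of $\nu$
independent squared standard Gaussian variables. In other words: if $X_1, \dots, X_\nu$ are independent $\mathcal{N}(0,1)$ variables, then $X_1^2 + \dots + X_\nu^2$ has the density
$$f_\nu(x) = \frac{x^{\nu/2-1}\, e^{-x/2}}{2^{\nu/2}\,\Gamma(\nu/2)}, \qquad x > 0.$$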
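As a quick illustration of this second "chi-square", the sketch below draws sums of squared standard Gaussian variables and compares the sample mean and variance with the known values for this law (mean equal to the number of terms, variance equal to twice that number). The names `nu` and `chi2_draw`, as well as the sample size, are choices made for this example only.

```python
import random
from statistics import fmean, pvariance

def chi2_draw(nu, rng):
    """One draw of the sum of nu independent squared N(0, 1) variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

rng = random.Random(0)
nu = 2  # hypothetical number of squared Gaussian terms
sample = [chi2_draw(nu, rng) for _ in range(20000)]

# The law has mean nu and variance 2 * nu; the estimates should be close.
print("mean     ", round(fmean(sample), 3), "expected about", nu)
print("variance ", round(pvariance(sample), 3), "expected about", 2 * nu)
```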
These two "chi-squares" are connected by a convergence
property: when $n \to \infty$, the partition remaining
fixed, the law of the $\chi^2$ statistic is expected to converge
towards the $\chi^2_{k-1}$ law, where $k$ is the number of classes of the partition. In other words: the law of the statistic
tends to become independent of the theoretical probabilities $p_j$, depending only
on $k$. This result, and also the formulae for the mean
and variance of the $\chi^2$ statistic, are revisited in anx: xcs_chi2.
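The convergence property can be illustrated by simulation. The sketch below assumes the Pearson form of the statistic and uses two hypothetical three-class models. With three classes the limiting law has 2 degrees of freedom, and its cumulative distribution function has the closed form 1 - exp(-x/2), so no statistics library is needed. The sample size and the number of repetitions are arbitrary choices.

```python
import math
import random

def chi2_statistic(counts, probs):
    """Pearson statistic: sum over classes of (n_j - n p_j)^2 / (n p_j)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

def simulate(n, probs, repeats, rng):
    """Values of the statistic over `repeats` simulated experiments of size n."""
    classes = range(len(probs))
    stats = []
    for _ in range(repeats):
        counts = [0] * len(probs)
        for j in rng.choices(classes, weights=probs, k=n):
            counts[j] += 1
        stats.append(chi2_statistic(counts, probs))
    return stats

def empirical_cdf(stats, x):
    """Fraction of simulated values not exceeding x."""
    return sum(s <= x for s in stats) / len(stats)

def chi2_2_cdf(x):
    """CDF of the chi-square law with 2 degrees of freedom (closed form)."""
    return 1.0 - math.exp(-x / 2.0)

rng = random.Random(0)
# Two hypothetical models sharing the same number of classes (k = 3).
models = {"balanced": [1 / 3, 1 / 3, 1 / 3], "skewed": [0.6, 0.3, 0.1]}
results = {name: simulate(400, probs, 3000, rng) for name, probs in models.items()}

for x in (1.0, 2.0, 4.0, 6.0):
    row = [round(empirical_cdf(results[name], x), 3) for name in models]
    print(x, row, "limit:", round(chi2_2_cdf(x), 3))
```

Both columns of empirical values should be close to the limiting value on each line, even though the two models have quite different class probabilities.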
It will be restated that the quite metaphysical "degrees
of freedom" are nothing but the rank of a quadratic form.