Previous: 4 EXPERIMENTING
Up: SAMPLING DISTRIBUTION OF THE
Next: 6 USEFUL AND USELESS
5 ESTIMATING THE VARIATIONS OF THE SAMPLE VARIANCE
In many situations, estimating the variance of the population is not
only a step for building confidence intervals around the mean, but
is significant by itself. For example, an industrial product cannot
be used for a task requiring some precision when the appropriate test shows
a large variance. On the other hand, enforcing needless precision
only induces additional costs. In production, a sudden increase
in variability may indicate the appearance of a production fault [crow:stats60].
In such situations, confidence intervals around the variance are to
be discussed. When the moments of the population are not known a
priori, formulas like (4) cannot be used,
and must be replaced by formulas using the moments of the sample.
Therefore, a method is needed to compute the expectation of these
moments and of their products.
Definition 5.1
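The $m$-degree of a monomial $M = \prod_j m_j^{\alpha_j}$ is
$\deg_m(M) = \sum_j \alpha_j$, i.e. the number of factors $m_j$,
distinct or not, occurring in $M$, while the $x$-degree of
the same monomial is $\deg_x(M) = \sum_j j\,\alpha_j$, i.e. the
number of factors $x_i$, distinct or not, occurring in the expansion
of $M$ into a polynomial in the $x_i$. Accordingly, $\deg_\mu$ and
$\deg_x$ are defined for a monomial $\prod_j \mu_j^{\alpha_j}$
depending on the population moments.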
In order to obtain an unbiased statistic for a monomial $M$ in the
population moments, we cannot simply substitute each $\mu_j$
by its unbiased statistic.
This is obvious for powers such as $\mu_2^2$, but in fact this always
occurs, since the moments of the sample are not (fully) independent
when considered as random variates. We have to consider all the monomials
in the sample moments that have the same $x$-degree as $M$. After having
computed the expectation of each of them, we can eliminate the irrelevant
monomials in the $\mu_j$ in order to specifically isolate $M$.
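A small exact check illustrates why substituting unbiased statistics fails for products. The sketch below assumes a hypothetical toy population (a fair 0/1 variable) and i.i.d. samples of size 3; the helper names are illustrative and do not come from the source. It enumerates every sample and compares the expectation of the squared unbiased variance estimator with the squared population variance.

```python
from fractions import Fraction
from itertools import product

def unbiased_variance(sample):
    """Unbiased sample variance (divisor n - 1), computed exactly."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / (n - 1)

def expectation(statistic, values, n):
    """Exact expectation of a statistic over i.i.d. samples of size n
    drawn uniformly from `values` (assumed toy population)."""
    samples = list(product(values, repeat=n))
    return sum(Fraction(statistic(s)) for s in samples) / len(samples)

values = (0, 1)            # assumed population: fair coin, variance 1/4
n = 3
sigma2 = Fraction(1, 4)

e_s2 = expectation(unbiased_variance, values, n)
e_s2_squared = expectation(lambda s: unbiased_variance(s) ** 2, values, n)

print(e_s2 == sigma2)             # True: s^2 is unbiased for sigma^2
print(e_s2_squared, sigma2 ** 2)  # 1/12 1/16: (s^2)^2 is biased for sigma^4
```

In this toy case the squared estimator overshoots the squared variance, so an unbiased statistic for the product needs a correction built from other monomials of the same degree.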
Many research papers have been devoted to the computation of these
expectations, [tracy65:moments] among them. These computations have been
completely transformed by the computer algebra tools that are now widely
available. To quote Knuth's foreword to [AeqB]:
Science is what we understand well enough to explain to a computer.
Art is everything else we do. During the past several years an important
part of mathematics has been transformed from an Art to a Science.
No longer do we need to get a brilliant insight in order to evaluate
sums of binomial coefficients, and many similar formulas that arise
frequently in practice. We can now follow a mechanical procedure and
discover the answers quite systematically.
Proof.
The rule concerning the degrees is obvious. The rationality of the
coefficients comes from the binomial coefficients,
and the specific value of the denominators in closed form comes from
products of powers of $n$ (from the definition of the sample mean
$\bar{x}$) by powers of $n-1$ (from the definition of
the $m_j$).
Algorithm 5.3 (Newton)
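In what follows, $L[j]$ denotes the $j$-th element of a list named
$L$. Let $Y$ contain the values taken by a polynomial $P$ at some
prescribed abscissas $X$. In other words, $Y[j] = P(X[j])$.
In order to determine $P$, $\deg P < \#X$ is assumed. For increasing
$k$, compute the divided differences $D_k$ as defined by:
$$D_0[j] = Y[j], \qquad D_k[j] = \frac{D_{k-1}[j+1] - D_{k-1}[j]}{X[j+k] - X[j]}.$$
Then
$$P(x) = \sum_{k \ge 0} D_k[1] \prod_{j=1}^{k} \left(x - X[j]\right).$$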
Even when automated, these computations are prone to errors and typos.
For example, on page 208 of [fisher:moments29], we should have
in the
formula instead of
. While using his algorithm to compute
cumulants by identification, [good77:kstats] detected a
typo in [ud-din54], while that article was itself reporting
a typo by another author. An efficient test of correctness is the
following:
Proposition 5.4
For a given degree
, the determinant of all the
over the basis of all the
of the same degree splits into
linear factors, namely
and
in the denominator
and
where
in the numerator.
For example, when
, we have :
(9)
Proof.
Clearly, the denominator splits into powers of

and

. To
prove the form of the numerator, let us consider

. When
expressing a given

in the vector space spanned by the

,
the l.c.m. of the involved denominators is necessarily

since (1) the degree of this polynomial has the maximal value and
(2) each

is required since the

cannot
form a basis when

. When computing the determinant
of the

over the

, this denominator
is raised to the power

: no exponent in the numerator can
exceed this value (in fact, the exponents form a decreasing sequence).
Remark 5.5
It appears that no factors

cancel. When

this leads to the power:
At the same time, many powers of

cancel, reducing the
total degree of the denominator from

(

denominators, each
of ninth degree) to only

.
2 Some Results
As a direct application of this algorithm, we have the following results.
Proof.
The value of

is given by (4).
Its total degree is

. It can be seen in [fisher:moments29]
that:
(11)
The result follows by elimination. The

denominator arises because the expression must be indeterminate
when

and

since, for these
values, we have either

or

, reducing the dimension
of the vector space.
Remark
When

is Gaussian, all cumulants are 0 except for

and this formula reduces to

, as it should
be, since

is

distributed in this special
case. In the general case, the distribution of

is necessarily
skewed, and (12) shows that the asymptotic
skewness is always at most

.
Proposition 5.8
The variance of the estimator
is :
Remark
According to [fisher:moments29], a better-looking expression
is obtained by converting moments into cumulants. But the simplification
affects only the rational, exactly known coefficients. The underlying
complexity, due to the large number of nearly canceling terms, remains
the same: in the cumulant formula all signs are positive, but the cumulants
themselves are not necessarily positive (not even the cumulants of even index).
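The claim about cumulants of even index can be checked on a toy case. The sketch below assumes a fair 0/1 variable (a hypothetical example, not taken from the source) and uses the standard relation $\kappa_4 = \mu_4 - 3\mu_2^2$ between the fourth cumulant and the central moments.

```python
from fractions import Fraction

def central_moment(values, probs, order):
    """Exact central moment of a finite discrete distribution."""
    mean = sum(p * v for v, p in zip(values, probs))
    return sum(p * (v - mean) ** order for v, p in zip(values, probs))

values = (0, 1)                           # assumed toy distribution
probs = (Fraction(1, 2), Fraction(1, 2))

mu2 = central_moment(values, probs, 2)
mu4 = central_moment(values, probs, 4)
kappa4 = mu4 - 3 * mu2 ** 2               # fourth cumulant from central moments

print(mu2, mu4, kappa4)                   # 1/4 1/16 -1/8
```

Here the fourth cumulant is negative, so a formula with only positive signs in the cumulants can still contain terms that partly cancel.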
douillet@ensait.fr
2009-09-09