When
is uniform over
, then
.
An unbiased statistic for this quantity is
where
is given in xtwnr 5.6. To assess the
quality of this statistic, we simulated four sets of
samples, using respectively
, and plotted the
results in Figure 7. In all cases the average of
is as expected (the dashed line), but only the largest value of
gives a nicely shaped curve. For smaller values of
the distribution
is strongly skewed, and for very small
, a noticeable fraction of the
experimental values of
are negative (
in Figure 7(a)
where
).
The situation described in Subsection 6.1 shows that an "unbiased statistic" can be absolutely useless when dealing with small samples. To explore this question, we have to specify a border value beyond which noise is considered louder than signal.
To provide a similar criterion when the
is not easy
to obtain, we have to select a threshold value for the coefficient
of variation. Our choice of
is based on the following reasoning.
Probability distributions can be built such that nearly all of the
population lies inside the "one sigma" range.
But, outside the classroom, such distributions describe situations
where rare events are a dominant feature, so that mean values no
longer have a clear factual meaning.
In the other situations, a large part of the population lies outside
of the "one sigma" range. With our choice of factor
, this means that a substantial part of the population lies
outside of
when
the nominal value is
. Our feeling is that a statistic
known no better than this should be discarded in any situation. Obviously, another
choice of the factor, or a non-symmetric interval (to take into account
the unavoidable skewness of a positive variable), would be possible,
but this would not change the main lines of the argument.
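The coefficient-of-variation criterion can be illustrated numerically. The following Python sketch estimates the coefficient of variation of a statistic by simulation and compares it with a threshold. The function names, the choice of the sample variance as the statistic, the uniform population on [0, 1], the sample sizes, and the threshold value `CV_MAX = 0.5` are all assumptions made for illustration; none of them is taken from the text.

```python
import numpy as np

def coefficient_of_variation(statistic, sampler, n, trials=5000, seed=0):
    """Monte Carlo estimate of std/|mean| of `statistic` over samples of size n."""
    rng = np.random.default_rng(seed)
    values = np.array([statistic(sampler(rng, n)) for _ in range(trials)])
    return values.std(ddof=1) / abs(values.mean())

def sample_variance(x):
    # Unbiased sample variance (divisor n - 1).
    return x.var(ddof=1)

def uniform_sampler(rng, n):
    # Hypothetical population: uniform on [0, 1].
    return rng.uniform(0.0, 1.0, size=n)

# Hypothetical threshold, chosen only for this illustration.
CV_MAX = 0.5

cv_small = coefficient_of_variation(sample_variance, uniform_sampler, n=5)
cv_large = coefficient_of_variation(sample_variance, uniform_sampler, n=200)

for n, cv in ((5, cv_small), (200, cv_large)):
    verdict = "useful" if cv <= CV_MAX else "useless"
    print(f"n = {n:3d}: coefficient of variation = {cv:.3f} ({verdict})")
```

Under these assumptions the coefficient of variation shrinks as the sample size grows, so a statistic that fails the criterion for a small sample can pass it for a larger one.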
Note that
can be "useless"
even if
is "useful". This is partly related
to
and partly
related to the statistical nature of the quantities involved. Moreover,
explicitly using the fact that a variable is Gaussian results in
so that a useful statistic for
is obtained as soon as
instead of
obtained by ignoring this relation.
For a chi-square distribution, a similar situation occurs. The only significant change is that the border values increase.
We will now examine in detail what happens when
is
known to be exponentially distributed. This is a very strong hypothesis,
since it asserts that only one parameter is required to specify the
population. If we are really sure of the validity of this hypothesis,
we can lower the border of usability by a huge factor.
The statistic
is "useful" for estimating
as soon as
. Estimating
by
would be foolish
since a better statistic can be obtained via
. It can be
seen that:
Among
samples of size
drawn at random from an exponential
population, with
, the following values have been obtained.
When using
as statistic for
, then
values fall outside
, i.e. a proportion
of
(cf.
: this is a two-sigma interval
for a variate not far from normality). When using
as statistic for
, then
values fall outside of
, i.e. a proportion of
,
and
values outside
, i.e. around
. When using
as statistic for
, then
values fall outside of
,
i.e. a proportion of
, and
values outside
,
i.e. around
.
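A simulation of this kind can be sketched as follows in Python. Since the symbols are not legible here, the sketch rests on an assumed reading: the three statistics are taken to be the sample mean (for the population mean), the unbiased sample variance (for the population variance), and a rescaled squared sample mean (also for the population variance, using the fact that the standard deviation of an exponential variate equals its mean). The sample size, number of samples, population mean, and relative half-width of the interval are illustrative values, not those of the text.

```python
import numpy as np

def outside_fraction(estimates, nominal, rel_halfwidth):
    """Fraction of estimates outside nominal * (1 +/- rel_halfwidth)."""
    lo = nominal * (1.0 - rel_halfwidth)
    hi = nominal * (1.0 + rel_halfwidth)
    return float(np.mean((estimates < lo) | (estimates > hi)))

# Illustrative settings (assumptions, not taken from the text).
mu = 1.0             # population mean; for an exponential variate sigma == mu
n = 20               # size of each sample
trials = 10000       # number of samples drawn
rel_halfwidth = 0.5  # hypothetical interval: nominal value +/- 50 %

rng = np.random.default_rng(1)
samples = rng.exponential(scale=mu, size=(trials, n))

means = samples.mean(axis=1)
var_direct = samples.var(axis=1, ddof=1)    # unbiased sample variance
# E[mean^2] = mu^2 * (n + 1) / n, so this rescaling is unbiased for mu^2.
var_via_mean = means**2 * n / (n + 1)

f_mean = outside_fraction(means, mu, rel_halfwidth)
f_direct = outside_fraction(var_direct, mu**2, rel_halfwidth)
f_via_mean = outside_fraction(var_via_mean, mu**2, rel_halfwidth)

print(f"mean as statistic for the mean:          {f_mean:.3f} outside")
print(f"sample variance for the variance:        {f_direct:.3f} outside")
print(f"rescaled squared mean for the variance:  {f_via_mean:.3f} outside")
```

Under the exponential hypothesis, the estimate built from the mean is less dispersed than the direct sample variance, which is the effect discussed above. The exact proportions depend on the illustrative settings.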