Modeling consists in translating reality into formulas, then working on the formulas, and finally translating the results back to reality. Obviously, a model has to be tractable in order to be useful. But too often, the extra hypotheses that are introduced to ensure tractability are then treated as rock-solid properties of the real world. It must be recalled that "everyday life" is not made only of "everyday events": rare events occur rarely, but they do occur.
For example, it is usual to model a bell-shaped histogram of experimental
frequencies by a Gaussian pdf (probability density function) or by a Fisher's
pdf with four parameters. Transforming this pdf into an mgf (moment
generating function) is then a powerful tool to obtain (and prove) the
properties of the model. But this does not imply that a specific moment
is effectively an accessible experimental reality.
This includes, but is not limited to, situations where these moments
are infinite or undefined. For example, it is well known [brown07]
that the ratio of two standardized Gaussian variables is distributed
according to a Cauchy distribution, so that the first moment exists only
in principal value and the second moment is infinite. In fact, the
real difficulty arises when these moments do exist (this will be our
hypothesis throughout the paper).
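The Cauchy example can be checked numerically. The following is a minimal sketch, not part of the paper's own experiments: the function names, the sample size of 20000 and the seeds are illustrative assumptions. It draws ratios of two independent standard Gaussian variables and compares the sample quartiles, which settle near the standard Cauchy values -1, 0 and 1, with the sample mean, which does not settle.

```python
import random
import statistics


def gaussian_ratios(n, seed):
    """Return n ratios X/Y of independent standard Gaussian variables."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0) for _ in range(n)]


def summarize(xs):
    """Return the three quartiles and the sample mean of xs."""
    q1, q2, q3 = statistics.quantiles(xs, n=4)
    return q1, q2, q3, statistics.fmean(xs)


if __name__ == "__main__":
    # Hypothetical demo: same sample size, different seeds.
    for seed in range(5):
        q1, q2, q3, mean = summarize(gaussian_ratios(20_000, seed))
        print(f"seed={seed}  quartiles=({q1:+.3f}, {q2:+.3f}, {q3:+.3f})  mean={mean:+.3f}")
```

The mean of n independent standard Cauchy variables is itself standard Cauchy, so the last column keeps fluctuating from seed to seed however large the sample is, whereas the quartiles are reproducible.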
Moments of increasing index depend increasingly on the tails of the probability distribution, i.e., on increasingly rare events, and are therefore less and less accessible to experiment. Moreover, the formulas used to evaluate these moments are increasingly complex and contain an increasing number of nearly cancelling terms, so that the computation is unstable and amplifies uncertainties. This is true even for the simple "sample variance", which is our best guess of the "true" variance of the whole population.
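The effect of nearly cancelling terms can already be seen on the sample variance. The sketch below is only an illustration under stated assumptions: the two function names and the four data values are hypothetical and not taken from the paper. It compares the textbook one-pass formula, which subtracts two large and nearly equal quantities, with the two-pass formula based on deviations from the mean.

```python
def variance_one_pass(xs):
    """Textbook formula: (sum of squares - n * mean**2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean * mean) / (n - 1)


def variance_two_pass(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Illustrative data (an assumption, not from the paper):
# a small spread on top of a large offset.
data = [1.0e9 + d for d in (4.0, 7.0, 13.0, 16.0)]

if __name__ == "__main__":
    print("one pass:", variance_one_pass(data))
    print("two pass:", variance_two_pass(data))
```

The exact sample variance of these four values is 30, and the two-pass formula recovers it. In double precision, the squares near 10^18 are rounded to multiples of 128, so the difference computed by the one-pass formula cannot equal 90 and its result is far from 30.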
The aim of this paper is to collect and illustrate some facts concerning
this problem. Well-known results will be stated as "Well Known Results",
while Theorem/Proposition will be reserved for new results or,
at least, for results that are not usually emphasized. In Section 3,
closed-form results will be obtained for the very special situations
when the sample size is either
or
. It will be seen that
even in this seemingly simple situation, general results are not easy
to obtain.
In the remaining Sections, it will always be assumed that samples contain
at least four elements. Section 4 gives some
experimental evidence, obtained using batches of
independent
samples. This value has been chosen in order to ensure "well
shaped" curves... when such curves exist. It will be seen
that these curves are often far from the models generally used.
In Section 5, an algorithm is given that
uses symbolic computation to rederive the formulas giving the best statistics
for the moments of small index, and to obtain these formulas and their
Jacobian for
(new result). In Section 6
these formulas are used to determine the minimal size that a sample
must have in order that a given statistic can be obtained from that
sample. The paper ends with a concluding Section and some References.