Previous: ABSTRACT Up: SAMPLING DISTRIBUTION OF THE Next: 2 NOTATIONS

1 INTRODUCTION

Modeling means translating reality into formulas, then acting on the formulas, and finally translating the results back to reality. Obviously, the model has to be tractable in order to be useful. But too often, the extra hypotheses that are assumed to ensure tractability are treated as rock-solid properties of the real world. It must be recalled that "everyday life" is not made only of "everyday events": rare events occur rarely, but they do occur.

For example, it is usual to model a bell-shaped histogram of experimental frequencies by a Gaussian $ pd\! f$ (probability density function) or by a Fisher's $ pd\! f$ with four parameters. Transforming this $ pd\! f$ into a $ mg\! f$ (moment generating function) by $ mg\! f\left(z\right)=E_{t}\left(\exp\left(z\, t\right)\right)$ is then a powerful tool to obtain (and prove) the properties of the modeling $ pd\! f$. But this does not imply that a specific moment (e.g. $ \mu_{4}$) is actually an experimentally accessible quantity.

This includes, but is not limited to, situations where these moments are infinite or undefined. For example, it is well known [brown07] that the ratio of two independent standardized Gaussian variables is distributed according to a Cauchy $ pd\! f$, so that the first moment exists only as a principal value and the second moment is infinite. In fact, the main difficulty arises when these moments do exist (this will be our hypothesis throughout the paper).
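
The Cauchy example can be checked numerically. The following is only a minimal sketch, not part of the paper's own experiments: the function names, the batch sizes and the seed are illustrative assumptions. It draws independent batches of ratios of two standard Gaussian variables and compares how much the batch means and the batch medians scatter from one batch to the next.

```python
import random
import statistics


def cauchy_ratio_sample(size, rng):
    """Draw `size` ratios X/Y with X, Y independent standard Gaussians."""
    return [rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0) for _ in range(size)]


def batch_summaries(n_batches=50, batch_size=2000, seed=12345):
    """Return (means, medians) of independent batches of ratios.

    n_batches, batch_size and seed are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(n_batches):
        batch = cauchy_ratio_sample(batch_size, rng)
        means.append(statistics.fmean(batch))
        medians.append(statistics.median(batch))
    return means, medians


def interquartile_range(values):
    """Distance between the first and third quartiles of `values`."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


if __name__ == "__main__":
    means, medians = batch_summaries()
    print("IQR of batch means  :", interquartile_range(means))
    print("IQR of batch medians:", interquartile_range(medians))
```

Under these assumptions, the batch medians should cluster tightly around zero, while the batch means should remain widely scattered however large the batches are, since the mean of a Cauchy sample is itself Cauchy distributed with the same scale.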

Moments of increasing index are increasingly dependent on the tails of the probability distribution, i.e. they depend on increasingly rare events and are therefore less and less accessible to experiment. Moreover, the formulas that have to be used to evaluate these moments are increasingly complex and contain an increasing number of nearly canceling terms, so that the computation is unstable and propagates amplified uncertainties. This is true even for the simple "sample variance", which is our best guess of the "true" variance of the whole population.
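
A rough numerical illustration of this loss of accessibility is sketched below. It is an assumption-laden toy experiment, not the procedure used later in the paper: the helper names, the sample size of 20, the number of batches and the seed are all hypothetical, and the plain (uncorrected) sample central moments are used rather than the best statistics. For Gaussian samples, it estimates the relative spread (standard deviation divided by mean) of the second and fourth sample central moments over many samples.

```python
import random
import statistics


def central_moment(sample, k):
    """Plain k-th sample central moment: (1/n) * sum((x - mean)**k)."""
    mean = statistics.fmean(sample)
    return statistics.fmean([(x - mean) ** k for x in sample])


def relative_spread(k, sample_size=20, n_batches=2000, seed=2009):
    """Relative spread (stdev / mean) of the k-th sample central moment.

    Each of the n_batches samples holds sample_size standard Gaussian draws.
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_batches):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        values.append(central_moment(sample, k))
    return statistics.stdev(values) / statistics.fmean(values)


if __name__ == "__main__":
    for k in (2, 4):
        print("relative spread of sample moment of index", k, ":",
              relative_spread(k))
```

With these settings the fourth moment should come out markedly more dispersed than the second, even though the underlying population is as well behaved as a Gaussian.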

The aim of this paper is to collect and illustrate some facts concerning this problem. The "Well Known Results" will be stated as such, while Theorem/Proposition will be reserved for new results or, at least, for results that are not usually emphasized. In Section 3, closed form results will be obtained for the very special situations where the sample size is either $ 2$ or $ 3$. It will be seen that even in this seemingly simple situation, general results are not easy to obtain.

In the remaining Sections, it will always be assumed that samples contain at least four elements. Section 4 gives some experimental evidence, obtained using batches of $ N=200000$ independent samples. This value has been chosen in order to ensure "well shaped" curves... when such curves exist. It will be seen that these curves are often far from the models generally used.

In Section 5, an algorithm is given that uses symbolic computation to rederive the formulas giving the best statistics for the moments of small index, and to obtain these formulas and their Jacobian for $ n=11$ (new result). In Section 6, these formulas are used to determine the minimal size that a sample must have in order that a given statistic can be obtained from that sample. The paper ends with a concluding Section and some References.




douillet@ensait.fr
2009-09-09