previous up next
Previous: 3 RESULTS IN CLOSED Up: SAMPLING DISTRIBUTION OF THE Next: 5 ESTIMATING THE VARIATIONS


4 EXPERIMENTING

This section presents some experimental results. To allow comparisons, we start with a Gaussian example.

1 Normal Distribution

We have simulated $ N=200000$ samples from a Gaussian distribution, with sample size $ n=8$ and parameters $ \mu =0$, $ \sigma =7$. In Figure 2, we have plotted the experimental histogram of the sample variance (circles) together with the theoretical (scaled) $ \chi_{7}^{2}$ distribution (solid line). The goodness of fit, as measured by $ \chi_{Pearson}^{2}=25.10$, i.e. $ \chi_{std}^{2}=-1.28$, is excellent. A Gaussian curve (dotted line), even with the required parameters, would not be the right model since $ n=8$ is far from infinity. Additionally, the experimental skewness of $ s^{2}$ is $ \gamma_{1}\approx1.07$, i.e. very close to the theoretical value $ \sqrt{8/7}$.

Figure 2: Normal law, $ n=8$
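The Gaussian experiment can be reproduced along the following lines. This is only a minimal sketch using the Python standard library, not the program used for Figure 2; the function names and the seed are our own assumptions.

```python
import math
import random


def sample_variance(xs):
    """Unbiased sample variance s^2 (divisor n-1)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def skewness(values):
    """Moment coefficient of skewness gamma_1 = m3 / m2^(3/2)."""
    k = len(values)
    mean = sum(values) / k
    m2 = sum((v - mean) ** 2 for v in values) / k
    m3 = sum((v - mean) ** 3 for v in values) / k
    return m3 / m2 ** 1.5


def simulate_normal_variances(N, n, mu, sigma, seed=0):
    """Draw N Gaussian samples of size n and return their sample variances."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    return [sample_variance([rng.gauss(mu, sigma) for _ in range(n)])
            for _ in range(N)]


if __name__ == "__main__":
    s2 = simulate_normal_variances(N=200000, n=8, mu=0.0, sigma=7.0)
    print("mean of s^2        :", sum(s2) / len(s2))  # theory: sigma^2 = 49
    print("skewness of s^2    :", skewness(s2))
    print("theoretical sqrt(8/7):", math.sqrt(8 / 7))
```

The printed skewness should be compared with $ \sqrt{8/7}\approx1.07$, the skewness of a $ \chi_{7}^{2}$ variable.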

2 Uniform Distribution

When $ \xi $ is the discrete uniform distribution over the integer range $ -a\leq\xi\leq+a$, the distribution of $ m_{2}$ remains coarse, whatever the size $ N$ of the simulation. From $ m_{2}\leq n\, a^{2}/\left(n-1\right)$ together with $ \left(n^{2}-n\right)m_{2}\in\mathbb{Z}$, no more than $ \left(a\,n\right)^{2}$ different values of $ m_{2}$ can occur. Moreover, this upper bound is not tight: when $ a=10$ and $ n=5$, the actual number of occurring values is $ 617$, not $ 2500$. The fact that not every integer is a square modulo $ n^{2}-n$ is one of the reasons for this drastic reduction. As a result, a batch of $ N=200000$ samples leads to a very coarse distribution, as shown in Figure 3(a).
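The number of attainable values can be checked by brute force. The sketch below (our own illustration, with a hypothetical function name) enumerates the multisets of sample values and collects the integer $ \left(n^{2}-n\right)m_{2}=n\sum x_{i}^{2}-\left(\sum x_{i}\right)^{2}$; its output for $ a=10$, $ n=5$ can be compared with the count quoted above.

```python
from itertools import combinations_with_replacement


def distinct_m2_count(a, n):
    """Number of distinct values of m2 for integer samples in [-a, a]."""
    seen = set()
    # m2 depends only on the multiset of sample values, not on their order.
    for xs in combinations_with_replacement(range(-a, a + 1), n):
        total = sum(xs)
        squares = sum(x * x for x in xs)
        # n*(n-1)*m2 = n*sum(x^2) - (sum x)^2 is always an integer.
        seen.add(n * squares - total * total)
    return len(seen)


if __name__ == "__main__":
    a, n = 10, 5
    print("distinct values of m2:", distinct_m2_count(a, n))
    print("upper bound (a*n)^2  :", (a * n) ** 2)
```

Enumerating multisets rather than ordered samples keeps the loop short (53130 cases instead of $ 21^{5}$) without changing the set of values.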

Figure 3: Discrete uniform distribution in $ \left[-10,\,+10\right]$ (sample size $ n=5$): (a) showing coarseness; (b) hiding coarseness

The corresponding histogram remains "rugged" as shown in Figure 3(b). Moreover, it appears that neither the normalized $ \chi_{n-1}^{2}$ (dotted line) nor the adapted normal curve (solid line) provides even a rough approximation of the distribution.

Using a continuous uniform distribution leads to better-looking experimental curves, as shown in Figure 4 (here again, $ a=10$). The departure from $ \chi^{2}$ nevertheless remains in Figure 4(a), where $ n=5$, whereas a nearly normal curve is obtained in Figure 4(b), where $ n=8$.

Figure 4: Continuous uniform distribution in $ \left[-10,\,+10\right]$: (a) $ n=5$; (b) $ n=8$

Proposition 4.1   When $ \xi $ is a (continuous) uniform random variable in $ \left[-a,\,+a\right]$, then $ \mu_{2}=a^{2}/3$ and $ \mu_{4}=a^{4}/5$. The resulting scaled squared coefficient of variation, to be compared with (6), is:

$\displaystyle sscv_{unif}=\frac{var_{\Phi }\left(m_{2}\right)}{\mu_{2}}\times\frac{n-1}{\mu_{2}}=\frac{4\, n+6}{5\, n}$ (7)
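For the reader's convenience, (7) can be recovered as follows. The derivation assumes the standard expression of the variance of the unbiased sample variance of $ n$ independent observations, $ var\left(m_{2}\right)=\left(\mu_{4}-\frac{n-3}{n-1}\mu_{2}^{2}\right)/n$, which is not restated in this section.

```latex
\begin{align*}
\mu_{2} &= \frac{1}{2a}\int_{-a}^{+a}x^{2}\,dx=\frac{a^{2}}{3},\qquad
\mu_{4} = \frac{1}{2a}\int_{-a}^{+a}x^{4}\,dx=\frac{a^{4}}{5},\qquad
\frac{\mu_{4}}{\mu_{2}^{2}}=\frac{9}{5},\\
var_{\Phi}\left(m_{2}\right) &= \frac{1}{n}\left(\mu_{4}-\frac{n-3}{n-1}\,\mu_{2}^{2}\right),\\
sscv_{unif} &= \frac{n-1}{n}\left(\frac{9}{5}-\frac{n-3}{n-1}\right)
=\frac{9\left(n-1\right)-5\left(n-3\right)}{5\,n}=\frac{4\,n+6}{5\,n}.
\end{align*}
```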

The observed skewnesses are $ \gamma_{1}\approx0.40$ for Figure 4(a), where $ n=5$, and $ \gamma_{1}\approx0.27$ for Figure 4(b), where $ n=8$. These are far less than the corresponding values for $ \chi_{n-1}^{2}$, which are respectively $ \gamma_{1}\approx1.41$ and $ \gamma_{1}\approx1.07$. More results concerning skewness are given in Proposition 5.7.
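Both the ratio of Proposition 4.1 and the skewness of $ m_{2}$ can be estimated by simulation. The following sketch is an illustration under our own assumptions (function name, seed, standard-library generator), not the code behind Figure 4.

```python
import random


def uniform_m2_statistics(N, n, a, seed=0):
    """Simulate m2 for a continuous uniform population on [-a, a].

    Returns (empirical sscv, empirical skewness of m2), where
    sscv = var(m2) * (n - 1) / mu2^2 and mu2 = a^2 / 3.
    """
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    values = []
    for _ in range(N):
        xs = [rng.uniform(-a, a) for _ in range(n)]
        m = sum(xs) / n
        values.append(sum((x - m) ** 2 for x in xs) / (n - 1))
    mean = sum(values) / N
    c2 = sum((v - mean) ** 2 for v in values) / N
    c3 = sum((v - mean) ** 3 for v in values) / N
    mu2 = a * a / 3.0
    return c2 * (n - 1) / mu2 ** 2, c3 / c2 ** 1.5


if __name__ == "__main__":
    for n in (5, 8):
        sscv, gamma1 = uniform_m2_statistics(N=200000, n=n, a=10.0)
        print(f"n={n}: sscv={sscv:.3f} "
              f"(formula {(4 * n + 6) / (5 * n):.3f}), skewness={gamma1:.2f}")
```

The empirical ratio should be close to $ \left(4n+6\right)/\left(5n\right)$, i.e. $ 1.04$ for $ n=5$ and $ 0.95$ for $ n=8$.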

Many statistics tend to be normally distributed as the data from which they are calculated are increased indefinitely; and this I suggest is the genuine reason for the importance which is universally attached to the normal curve [fisher:toronto24].

3 Lognormal Distribution

Proposition 4.2   When $ \xi $ is lognormal, let us define the parameters $ M,\,K$ by $ \ln M=E\left(\ln\xi\right)$ and $ \ln K=var\left(\ln\xi\right)$. Then $ \mu =M\sqrt{K}$ and $ \mu_{2}=M^{2}K\left(K-1\right)$. Moreover, $ \mu_{4}=M^{4}K^{2}\left(K-1\right)^{2}\left(K^{4}+2K^{3}+3K^{2}-3\right)$. The resulting scaled squared coefficient of variation, to be compared with (6), is:

$\displaystyle sscv_{logn}=\frac{var_{\Phi }\left(m_{2}\right)}{\mu_{2}}\times\frac{n-1}{\mu_{2}}=2+\frac{n-1}{n}\left(K-1\right)\left(K^{3}+3K^{2}+6K+6\right)$ (8)

Figure 5(a) has been drawn with $ M=7$, $ K=2$ and $ n=8$. Since ratio (8) is about $ 35$, the observed skewness is huge ($ \gamma_{1}\approx39$), leading to a curve that differs totally from both the Gaussian and the $ \chi^{2}$ curves. By contrast, as shown in Figure 5(b), a $ \log$ scale gives a curve with $ \gamma_{1}\approx0.05$ that fits a Gaussian really well.
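Ratio (8) can be evaluated exactly for the parameters of Figure 5(a). The sketch below uses rational arithmetic and computes the ratio twice: from the closed form (8), and from the moments of Proposition 4.2 combined with the standard formula $ var\left(m_{2}\right)=\left(\mu_{4}-\frac{n-3}{n-1}\mu_{2}^{2}\right)/n$, which is an assumption here since it is not restated in this section. The function names are ours.

```python
from fractions import Fraction


def lognormal_moments(M, K):
    """Central moments mu2 and mu4 of Proposition 4.2."""
    mu2 = M**2 * K * (K - 1)
    mu4 = M**4 * K**2 * (K - 1)**2 * (K**4 + 2 * K**3 + 3 * K**2 - 3)
    return Fraction(mu2), Fraction(mu4)


def sscv_from_moments(n, mu2, mu4):
    """(n-1) * var(m2) / mu2^2 with var(m2) = (mu4 - (n-3)/(n-1) * mu2^2) / n."""
    var_m2 = (mu4 - Fraction(n - 3, n - 1) * mu2**2) / n
    return var_m2 * (n - 1) / mu2**2


def sscv_lognormal(n, K):
    """Closed form (8)."""
    return 2 + Fraction(n - 1, n) * (K - 1) * (K**3 + 3 * K**2 + 6 * K + 6)


if __name__ == "__main__":
    mu2, mu4 = lognormal_moments(M=7, K=2)
    print(sscv_from_moments(8, mu2, mu4))  # 141/4
    print(sscv_lognormal(8, 2))            # 141/4, i.e. 35.25
```

Both routes give $ 141/4=35.25$ for $ K=2$ and $ n=8$; the value does not depend on $ M$.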

Figure 5: Lognormal distribution: (a) flat; (b) log

4 Student-Like Distributions

Well Known Result 4.3 [student:error-mean]   For a sample drawn at random from a normal population, the statistic $ t$ defined by:

$\displaystyle t=\frac{m-\mu }{s}$

is distributed according to the Student-Fisher $ pd\! f$:

$\displaystyle \left(1+\frac{t^{2}}{\nu}\right)^{-\left(\nu+1\right)/2}\,\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\,\nu}\,\Gamma\left(\frac{\nu}{2}\right)}$
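As a sanity check, this density can be coded directly and integrated numerically. The sketch below is our own illustration (the function names and the midpoint rule are assumptions, not part of the original work).

```python
import math


def student_pdf(t, nu):
    """Student-Fisher density with nu degrees of freedom."""
    norm = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2))
    return norm * (1 + t * t / nu) ** (-(nu + 1) / 2)


def integrate(f, lo, hi, steps=200000):
    """Plain midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    nu = 4
    print("total mass       :", integrate(lambda t: student_pdf(t, nu), -200.0, 200.0))
    print("P(|t| <= 2.776)  :", integrate(lambda t: student_pdf(t, nu), -2.776, 2.776))
```

For $ \nu=4$ the total mass should be close to $ 1$ and the central interval $ \left[-2.776,\,+2.776\right]$ should carry about $ 95\%$ of it; for $ \nu=1$ the density reduces to the Cauchy density $ 1/\left(\pi\left(1+t^{2}\right)\right)$.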

In order to see what happens when $ \varphi $ is not Gaussian, we have drawn the histograms of the statistic $ t$ corresponding to Figure 4(a) (uniform, $ n=5$) and Figure 5(a) (lognormal, $ n=8$). In Figure 6(a), it can be seen that the tail of the experimental curve is not very different from the corresponding Student-Fisher curve with $ \nu=4$ degrees of freedom. By contrast, Figure 6(b) shows a strongly skewed distribution, far different from the two tentative models.
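The tail behaviour can be explored numerically. The sketch below is an illustration only: it assumes the usual normalization $ t=\left(m-\mu\right)\sqrt{n}/s$, under which a Gaussian population gives the Student-Fisher density with $ \nu=n-1$, and it estimates the fraction of samples with $ \left\vert t\right\vert $ above a threshold. The function names and the sampler interface are hypothetical.

```python
import math
import random


def t_statistic(xs, mu):
    """t = (m - mu) / (s / sqrt(n)), with s the unbiased standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return (m - mu) / (s / math.sqrt(n))


def tail_fraction(draw, mu, n, N, threshold, seed=0):
    """Fraction of N samples of size n whose |t| exceeds the threshold.

    `draw` is a function taking a random.Random and returning one observation.
    """
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    hits = 0
    for _ in range(N):
        if abs(t_statistic([draw(rng) for _ in range(n)], mu)) > threshold:
            hits += 1
    return hits / N


if __name__ == "__main__":
    # 2.776 is the two-sided 5% point of Student's law with nu = 4.
    print("normal :", tail_fraction(lambda r: r.gauss(0.0, 7.0), 0.0, 5, 200000, 2.776))
    print("uniform:", tail_fraction(lambda r: r.uniform(-10.0, 10.0), 0.0, 5, 200000, 2.776))
```

For the Gaussian population the fraction should be close to $ 0.05$; the uniform line shows how far the tail of the $ t$-like statistic departs from that reference.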

In fact, the most surprising curve is the nearly Gaussian curve associated with the uniform distribution. This can be related to the following fact. The intersection of the hyperplane $ m=constant$ with the hypercube $ \Phi $ is a hyper-polygon. The farther $ m$ is from $ \mu =0$, the smaller this hyper-polygon becomes, leading to small values of $ s$. Conversely, $ E\left(s\mid m=0\right)$ is as large as possible.
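This geometric argument can be illustrated by estimating $ E\left(s\mid m\right)$ over bins of $ \left\vert m\right\vert $ for a uniform population. The sketch below is our own; the bin edges, function name and seed are arbitrary assumptions.

```python
import math
import random


def mean_s_by_abs_m(N, n, a, edges, seed=0):
    """Average of s over samples whose |m| falls in each bin [edges[k], edges[k+1])."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for _ in range(N):
        xs = [rng.uniform(-a, a) for _ in range(n)]
        m = sum(xs) / n
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
        for k in range(len(edges) - 1):
            if edges[k] <= abs(m) < edges[k + 1]:
                sums[k] += s
                counts[k] += 1
                break
    # Empty bins are reported as NaN rather than raising a division error.
    return [sums[k] / counts[k] if counts[k] else float("nan")
            for k in range(len(counts))]


if __name__ == "__main__":
    edges = [0.0, 2.0, 4.0, 6.0, 10.0]
    means = mean_s_by_abs_m(N=200000, n=5, a=10.0, edges=edges)
    for lo, hi, value in zip(edges, edges[1:], means):
        print(f"{lo:4.1f} <= |m| < {hi:4.1f}: mean s = {value:.2f}")
```

The average of $ s$ is expected to be largest in the bin containing $ m=0$ and smallest in the outermost bin.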

Figure 6: Distribution of t-like statistics: (a) from a uniform population; (b) from a lognormal population




douillet@ensait.fr
2009-09-09