3.2 Computing uncertainties

3.2.1 Statement about uncertainties

Computing an affine regression line is equivalent to playing the following game. Alice knows the values of two secret coefficients $ \alpha, \beta$ and Bob does not. Alice discloses some pairs of numbers $ \left(x_{i},  y_{i}\right)$. The $ x_{i}$ are exactly defined and the $ y_{i}$ are (secretly) computed as $ y_{i}=\alpha  x_{i}+\beta+\nu_{i}$, where the $ \nu_{i}$ are independent, identically distributed (iid) random variables with mean 0 and variance $ \sigma_{\nu}^{2}$.

Bob must find an estimate of $ \alpha, \beta$ in the form of a confidence interval around a "most probable value". The well-known solution is to determine the pair $ \left(a,  b\right)$ that minimizes:

$\displaystyle \chi^{2}\doteq\frac{1}{n} \sum_{i=1}^{n}\left(y_{i}-a  x_{i}-b\right)^{2}$

Using the notation of Theorem 1.3.19, we have:

$\displaystyle {}^{t}A=\left(\begin{array}{cccc}
1 & 1 & \cdots & 1\\
x_{1} & x_{2} & \cdots & x_{n}\end{array}\right)\quad;\quad f=\left(\begin{array}{c}
b\\
a\end{array}\right)$

and therefore :

$\displaystyle S\doteq{}^{t}A  .  A=\left(\begin{array}{cc}
n & \sum x_{i}\\
\sum x_{i} & \sum x_{i}^{2}\end{array}\right)\quad;\quad a=\frac{cov_{xy}}{var_{x}}\quad;\quad b=\bar{y}-a \bar{x}$
$\displaystyle \chi_{min}^{2}=var_{y}\left(1-\frac{cov_{xy}^{2}}{var_{x}  var_{y}}\right)$
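
The closed-form expressions above can be checked numerically. The following is a minimal sketch, assuming that $ var_{x}$, $ var_{y}$ and $ cov_{xy}$ are the moments normalized by $ 1/n$ (the convention that agrees with the $ 1/n$ in the definition of $ \chi^{2}$). The function names and the sample data are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares line y = a*x + b, using moments normalized by 1/n."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    a = cov_xy / var_x                 # slope: cov_xy / var_x
    b = y_bar - a * x_bar              # intercept: y_bar - a * x_bar
    chi2_min = var_y * (1.0 - cov_xy ** 2 / (var_x * var_y))
    return a, b, chi2_min


def mean_square_residual(xs, ys, a, b):
    """Direct evaluation of chi^2 = (1/n) * sum (y_i - a*x_i - b)^2."""
    return sum((y - a * x - b) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

a, b, chi2_min = fit_line(xs, ys)
print(a, b, chi2_min)
print(mean_square_residual(xs, ys, a, b))
```

Both printed values of the residual variance should agree up to rounding, since the closed form for $ \chi_{min}^{2}$ is the mean squared residual evaluated at the optimal pair $ \left(a,  b\right)$.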

The key point here is the assumption that reality itself (not only the model) is affine, and that the errors are iid. In everything that follows, the same assumptions are always made: the measurements are the sum of an iid random term and a linear combination of exactly known terms (however complicated and nonlinear these terms may be as functions of the exactly known $ x_{i}$ variables).

3.2.2 Uncertainties

Theorem 3.2.1   For a design of $ n$ measurement points addressing a model with $ p$ coefficients ($ n\geq p$ is obviously assumed), the residual variance $ \chi_{min}^{2}$ is a biased estimate of $ \sigma_{\nu}^{2}$: it underestimates it. An unbiased estimator is obtained by taking into account the actual number of degrees of freedom, and we have:

$\displaystyle est\left(\sigma_{\nu}^{2}\right)=\frac{n}{n-p} \chi_{min}^{2}$

Proof. $ \chi_{min}^{2}$ is a quadratic form in the $ \nu_{i}$ and it reduces, by definition, to a quadratic form with $ n-p$ terms, so that its expectation is $ \frac{n-p}{n} \sigma_{\nu}^{2}$. In the special case where $ n=p$, this formula is "more true than ever": this $ 0/0$ indeterminate result describes exactly what is known about the uncertainties, namely nothing. $ \qedsymbol$
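
The bias can be illustrated by simulation. The sketch below repeats the Alice and Bob game many times for a regression line ($ p=2$) and averages the residual variance. The values of `alpha`, `beta`, `sigma`, the abscissas and the number of trials are arbitrary assumptions made for the illustration, and Gaussian noise is used only for convenience.

```python
import random


def chi2_min(xs, ys):
    """Mean squared residual of the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    a = cov_xy / var_x
    b = y_bar - a * x_bar
    return sum((y - a * x - b) ** 2 for x, y in zip(xs, ys)) / n


random.seed(0)
alpha, beta, sigma = 2.0, 1.0, 1.0    # hypothetical "secret" values
xs = [0.0, 1.0, 2.0, 3.0, 4.0]        # exactly known abscissas
n, p = len(xs), 2                     # 5 points, 2 coefficients
trials = 20000

raw_sum = 0.0
for _ in range(trials):
    ys = [alpha * x + beta + random.gauss(0.0, sigma) for x in xs]
    raw_sum += chi2_min(xs, ys)

mean_raw = raw_sum / trials           # average of chi2_min (biased)
mean_est = mean_raw * n / (n - p)     # average of the corrected estimator
print(mean_raw, mean_est)
```

With these settings the raw average should come out close to $ \frac{n-p}{n} \sigma_{\nu}^{2}=0.6$, while the corrected average should be close to $ \sigma_{\nu}^{2}=1$.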

Proposition 3.2.2   When dealing with a regression line, the coefficient $ a$ is an affine function of the $ \nu_{i}$. Similarly, for a fixed $ X\in\mathbb{R}$, the quantity $ Y\doteq a  X+b$ is an affine function of the $ \nu_{i}$ and we have:

$\displaystyle \sigma_{a}^{2}=\sigma_{\nu}^{2}\times\frac{1}{n} \frac{1}{var_{x}}\approx\chi_{min}^{2}\times\frac{1}{n-2} \frac{1}{var_{x}}$

$\displaystyle \sigma_{Y}^{2}=\sigma_{\nu}^{2}\times\frac{1}{n}\left(1+\frac{\left(X-\bar{x}\right)^{2}}{var_{x}}\right)\approx\chi_{min}^{2}\times\frac{1}{n-2} \left(1+\frac{\left(X-\bar{x}\right)^{2}}{var_{x}}\right)$

In these formulas, $ \sigma_{a}^{2}$ and $ \sigma_{Y}^{2}$ are the variances of $ a$ and $ Y$ over an "almost infinite number of repetitions" of the procedure (and the existence of a finite fourth moment is assumed for $ \nu$ in order to ensure convergence).
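
The "repetitions of the procedure" can be imitated by a small Monte Carlo experiment. The sketch below compares the empirical variances of $ a$ and $ Y$ with the expressions of Proposition 3.2.2 written in terms of $ \sigma_{\nu}^{2}$. All numerical values are arbitrary assumptions, and the noise is deliberately taken uniform rather than normal, since only the mean and the variance of the $ \nu_{i}$ enter these formulas.

```python
import random


def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    a = cov_xy / var_x
    return a, y_bar - a * x_bar


def variance(values):
    """Variance of a list of numbers, normalized by 1/n."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


random.seed(1)
alpha, beta, sigma = 2.0, 1.0, 1.0    # hypothetical "secret" values
xs = [0.0, 1.0, 2.0, 3.0, 4.0]        # exactly known abscissas
X = 6.0                               # point where Y = a*X + b is evaluated
half_width = sigma * 3 ** 0.5         # uniform on [-h, h] has variance h^2 / 3
trials = 20000

a_values, y_values = [], []
for _ in range(trials):
    ys = [alpha * x + beta + random.uniform(-half_width, half_width)
          for x in xs]
    a, b = fit_line(xs, ys)
    a_values.append(a)
    y_values.append(a * X + b)

n = len(xs)
x_bar = sum(xs) / n
var_x = variance(xs)
predicted_var_a = sigma ** 2 / (n * var_x)
predicted_var_y = sigma ** 2 / n * (1 + (X - x_bar) ** 2 / var_x)
print(variance(a_values), predicted_var_a)
print(variance(y_values), predicted_var_y)
```

Each printed pair should show an empirical variance close to the predicted one. Note how $ \sigma_{Y}^{2}$ grows as $ X$ moves away from $ \bar{x}$: extrapolation is less certain than interpolation.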

Theorem 3.2.3   In the general case, the variance of the estimate $ Y=X  .  f^{*}$ computed at point $ X$ using the least squares formula is :

$\displaystyle \sigma_{Y}^{2}=\sigma_{\nu}^{2}\times X  .  S^{-1}  .  {}^{t}X$

Proof. Under the assumptions made, the coefficients $ f^{*}$ computed according to (1.1), and $ Y$ itself, are affine functions of the iid variables $ \nu_{i}$. Therefore:

$\displaystyle \sigma_{Y}^{2}=\sum_{i=1}^{n}\left(\frac{\partial  Y}{\partial  \nu_{i}}\right)^{2}\sigma_{\nu}^{2}$

and it is immediate that the row vector of the $ \partial Y/\partial\nu_{i}$ equals $ X  .  S^{-1}  .  {}^{t}A$, whose sum of squares is $ X  .  S^{-1}  .  {}^{t}A  .  A  .  S^{-1}  .  {}^{t}X=X  .  S^{-1}  .  {}^{t}X$. $ \qedsymbol$
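
The quadratic form $ X  .  S^{-1}  .  {}^{t}X$ appearing in this proof can be evaluated without forming $ S^{-1}$ explicitly, by solving a small linear system. The following sketch is one possible way to do it in plain Python. The helper names `solve` and `variance_factor` are hypothetical, and the design used in the example (rows $ \left(1,  x_{i}\right)$, matching $ f=\left(b,  a\right)$) is an assumption that corresponds to the regression line. The returned factor is what multiplies $ \sigma_{\nu}^{2}$ in the sum written in the proof.

```python
def solve(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        # Bring the largest remaining entry of this column onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper triangular system.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = s / aug[r][r]
    return z


def variance_factor(A, X):
    """Return X . S^{-1} . tX, with S = tA . A (A given as a list of rows)."""
    p = len(X)
    S = [[sum(row[i] * row[j] for row in A) for j in range(p)]
         for i in range(p)]
    z = solve(S, X)                   # z = S^{-1} . tX
    return sum(xi * zi for xi, zi in zip(X, z))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
A = [[1.0, x] for x in xs]            # rows (1, x_i), matching f = (b, a)
X = [1.0, 6.0]                        # evaluation point X = 6
factor = variance_factor(A, X)

# Closed form of Proposition 3.2.2 for the same design and point.
n = len(xs)
x_bar = sum(xs) / n
var_x = sum((x - x_bar) ** 2 for x in xs) / n
line_factor = (1 + (X[1] - x_bar) ** 2 / var_x) / n
print(factor, line_factor)
```

For the regression line the two factors coincide, which shows that Proposition 3.2.2 is the special case $ p=2$ of the general result. For a model with more coefficients, only the rows of `A` and the row `X` need to change.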

Remark 3.2.4   The preceding results were obtained without any hypothesis on the distribution of the discrepancies, apart from being centered iid random variables. However, it is essential that the $ X$ are exactly known.

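Remark 3.2.5   Assuming further that the noise is normal, it can be shown that the quantity $ \left(n-p\right) est\left(\sigma_{\nu}^{2}\right)/\sigma_{\nu}^{2}$ follows a chi-squared law with $ n-p$ degrees of freedom, and that the estimated coefficients, once centered and divided by their estimated standard deviations, follow a Student law with $ n-p$ degrees of freedom. This allows the computation of confidence intervals when $ n-p$ is small.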
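
As a worked formula for the regression line ($ p=2$), and assuming normal noise as in the preceding remark, a two-sided confidence interval at level $ 1-\gamma$ for the slope can be sketched as follows. The symbols $ \gamma$ and $ t_{n-2, 1-\gamma/2}$ (the quantile of order $ 1-\gamma/2$ of the Student law with $ n-2$ degrees of freedom) are notations introduced here for illustration and do not appear in the original text:

$\displaystyle \alpha\in\left[a-t_{n-2, 1-\gamma/2} \sqrt{\frac{\chi_{min}^{2}}{\left(n-2\right) var_{x}}}\;,\; a+t_{n-2, 1-\gamma/2} \sqrt{\frac{\chi_{min}^{2}}{\left(n-2\right) var_{x}}}\right]$

The quantity under the square root is the estimate of $ \sigma_{a}^{2}$ given in Proposition 3.2.2. When $ n-2$ is large, the Student quantile approaches the corresponding quantile of the normal law.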




douillet@ensait.fr
2008-03-14