previous up next contents
Previous: 1.2 Starting with an Up: 1. Asserting Next: 1.4 Effects of a   Contents

Subsections

1.3 Least Squares Method

1.3.1 Exact resolution of an affine equation

Definition 1.3.1   A linear equation is an equation whose set of solutions is stable by linear combinations (i.e. form a vector space).

Example 1.3.2   Equation $ 3x+2y=0$ is linear. Indeed, the set of its solutions is $ \Delta\doteq\left\{ \left(-2t, 3t\right)\mid t\in\mathbb{C}\right\} $ i.e. a vector line.

Definition 1.3.3   An affine equation is an equation whose set of solutions is stable by barycentric combinations (i.e. form a barycentric space).

Example 1.3.4   Equation $ 3x+2x=1$ is affine. Indeed, the set of its solutions is $ \Delta\doteq\left\{ \left(1-2t, -1+3t\right)\mid t\in\mathbb{C}\right\} $ i.e. an affine line. It can be checked that $ m,  n\in\Delta$ and $ \mu\in\mathbb{R}$ implies $ \mu m+\left(1-\mu\right)n\in\Delta$.

Remark 1.3.5   Speaking about "equations without right hand member" is exactly as stupid as speaking about equations without an equal sign.

Scilab 1.3.6   Command a = b is used to encode an assignment (expression b is evaluated and the result is put into a box named a). On the other hand, command a == b is used to encode a test (like a<b or a~=b). Scilab not an algebraic computing tool and there is no way to encode an equation as an object by itself.

Maple 1.3.7   Command a := b is used to encode an assignment, command a=b is used to encode a substitution (subs, changevar, etc.) while the proper way to encode an equation $ a\left(x\right)=b\left(x\right)$ is eq:= b-a.

Proposition 1.3.8   An affine system of equations like :

$\displaystyle \left\{ \begin{array}{ccc}
3y+2z & = & 1\\
2y+z & = & 2\end{array}\right.$

may be written using matrices, i.e. :

$\displaystyle A  .  x=b\qquad\mathrm{where}\quad A=\left(\begin{array}{cc}
3 ...
...}
y\\
z\end{array}\right),  b=\left(\begin{array}{c}
1\\
2\end{array}\right)$

Requiring existence and unicity for solution $ f$ is equivalent to requiring inversibility for matrix $ A$ (Cramer's rule).

Remark 1.3.9   The unusual notation $ f$ for the unknown vector comes from the fact that, later, an unknown function will be described as an unknown linear form $ f$.

Scilab 1.3.10   Scilab makes no distinction between vectors and matrices : anything is a matrix and $ x\left(1,1\right)=3$ when $ x=3$. Nevertheless any matrix can be accessed as the column of its column vectors. Thus mm=[11,12,13;21,22,23], [li,co]=size(mm) leads to :

$\displaystyle mm\:\triangleleft \begin{array}{ccc}
11 & 12 & 13\\
21 & 22 & 23\end{array},\quad li\:\triangleleft 2,\quad co\:\triangleleft 3$

Exercise 1.3.11   Give the formulas relating $ i,  j,  k$ such that mm(j,k) and mm(i) describe the same element of matrix mm.

Maple 1.3.12   In this lecture, the "new release" matrices are used :

With (LinearAlgebra);

Ma:= Matrix([[3.2] [2.1]]); mf:= Vector([y, z]); ...

Mf = (1/ma).Mb;

1.3.2 Overdetermined systems

Definition 1.3.13   An overdetermined system is a system where there are more equations than unknowns.

Remark 1.3.14   In such a case, it can be expected that equation $ A  .  f=b$ has no solution. When the system is obtained by a set of experimental measurements, it is essential to take into account the uncertainties and the semantics of the problem turns into obtaining the best fit for $ A  .  f\approx b$.

Example 1.3.15   System $ 3y+2z=1, 2y+z=2,  y+z=1$ has no solutions : the first two equations imply $ y=3$ and $ z=-4$ and a substitution into the last equation will lead to $ -1=+1$. Most of the time, however, an "exact equality" is not what is required, but only a rough one. What is wanted is :

$\displaystyle \left\{ \begin{array}{ccc}
3y+2z & \approx & 1\\
2y+z & \approx ...
...& 1\end{array}\right),\; B=\left(\begin{array}{c}
1\\
2\\
1\end{array}\right)$

Definition 1.3.16   The least squares method replaces $ \left(A  .  f^{*}-b\right)\approx\overrightarrow{0}$ by

$\displaystyle f^{*}=\arg\min\left(A  .  f-b\right)^{2}$

where "square" is the Pythagorean one, i.e. the sum of squares of the coordinates.

Scilab 1.3.17   The sum of squares of a column vector V is achieved by a scalar product, a line by a column. In other words :
deff('c=Pythagoras(a)', 'c= a''*a')

Maple 1.3.18   In the same vein, the Maple sum of squares is :
Pythagoras: = v -> Transpose (v). v;

Theorem 1.3.19 (Least Squares)   The least squares equation related to $ A  .  x\approx b$ is : $ \raisebox{0.5 em}{\normalfont\textsf{t}}{A}  .  A  .  f^{*}=\raisebox{0.5 em}{\normalfont\textsf{t}}{A}  .  b$ whose solution is :

$\displaystyle f^{*}=S^{-1}  .  \left(\raisebox{0.5 em}{\normalfont\textsf{t}}...
...d\mathrm{where}\quad S\doteq\raisebox{0.5 em}{\normalfont\textsf{t}}{A}  .  A$ (1.1)

Proof. The goal is to find $ f^{*}$ that minimizes the Pythagorean square of column $ A  .  f-b$. We set :
$\displaystyle \chi^{2}\left(f\right)$ $\displaystyle \doteq$ $\displaystyle \frac{1}{n} \raisebox{0.5 em}{\normalfont\textsf{t}}{\},\!\left(A  .  f-b\right)  .  \left(A  .  f-b\right)$ (1.2)
$\displaystyle f^{*}$ $\displaystyle =$ $\displaystyle \arg \min \chi^{2}\left(f\right)$  

Expanding and using the fact that scalar $ \raisebox{0.5 em}{\normalfont\textsf{t}}{b}  .  A  .  f$ equals its transpose, we obtain :

$\displaystyle n \chi^{2}\left(f\right)$ $\displaystyle \doteq\raisebox{0.5 em}{\normalfont\textsf{t}}{f}  .  \raisebox...
...x{0.5 em}{\normalfont\textsf{t}}{b}  .  A  .  f+\left\vert b\right\vert^{2}$    
$\displaystyle n\left(\chi^{2}\left(f^{*}+\delta f\right)-\chi^{2}\left(f^{*}\right)\right)$ $\displaystyle =2 \raisebox{0.5 em}{\normalfont\textsf{t}}{\delta} f  .  \rai...
...m}{\normalfont\textsf{t}}{A}  .  b+\left\vert A  .  \delta f\right\vert^{2}$    
$\displaystyle n \delta\chi^{2}$ $\displaystyle =2 \raisebox{0.5 em}{\normalfont\textsf{t}}{\delta} f  .  \lef...
...malfont\textsf{t}}{A}  .  b\right)+\left\vert A  .  \delta f\right\vert^{2}$    

The last quantity should be positive for all $ \delta f$ and therefore shouldn't change its sign when replacing $ \delta f$ by $ -\delta f$. For quite-vanishing values of $ \delta f$, this implies that $ \left(\cdots\right)$ has to be orthogonal to all directions. But only the null vector can do that. $ \qedsymbol$

Remark 1.3.20   For a simply determined system, equation $ A  .  f=b$ implies obviously $ \raisebox{0.5 em}{\normalfont\textsf{t}}{A}  .  A  .  f=\raisebox{0.5 em}{\normalfont\textsf{t}}{A}  .  b$. The meaning of the former theorem is different : it translates the approximation ( $ A  .  f\approx b$) into an equation.

1.3.3 Measuring the quality of a model

Definition 1.3.21   The minimal value of $ \chi^{2}$ is called the residual variance and is denoted as $ s_{res}^{2}$. It is the average value of the square discrepancy between the actually measured values and the values predicted by the model for the same tests. Calling $ s^{2}$ the variance of the experimental values, the experimental variance reduction factor is defined as the quotient :

$\displaystyle \mathrm{VRF}_{exp}=\frac{s^{2}}{s_{res}^{2}}$

Proposition 1.3.22   For any system, the following relation holds :

$\displaystyle \mathrm{VRF}_{exp}\geq1$

Example 1.3.23   Here, matrix $ S=\raisebox{0.5 em}{\normalfont\textsf{t}}{A}  .  A$ is a $ 12\times12$ invertible matrix and :

$\displaystyle \raisebox{0.5 em}{\normalfont\textsf{t}}{x}^{*}=\left(55.52,\quad...
...96,\quad-1.42, +1.33,\quad4.31, -3.69, -4.69,\quad2.81, -0.94, 0.56\right)$

The target value (temperature) varies between $ 42°C$ and $ 66°C$, with standard deviation $ \sigma\approx6°C$ ( $ \sigma^{2}\approx36.12$). The differences between the experimental values and the values recalculated from the model are ranging from $ -3°C$ to $ +3°C$ with standard deviation $ \sigma_{res}\approx1.6°C$ ( $ \sigma_{res}^{2}\approx2.64$). The $ \mathrm{VRF}$ reaches therefore the interesting value of $ 13.68$, i.e. roughly dividing the amplitude of the target by factor $ 4$.

The thickness of the model can also be seen on FIG. 1.3 which plots discrepancies versus experimental values.

FIG. 1.3: Remaining discrepancies.
\includegraphics[height=50mm,keepaspectratio]{figures/usxper_residus-sav}

Scilab 1.3.24   FIG. 1.3 has been drawn using command draw_residus, LISTING [*].

Proposition 1.3.25   For a least square model, the best fit hyperplane ever goes through the average point, so that the average discrepancy ever vanishes.

Scilab 1.3.26   Command size gives the sizes of a two-dimensional matrix. Modifiers 'c', 'r' and '*' can be used to obtain the number of columns, the number of lines or the total number of elements of the array. Therefore, the variance of a centered vector V (i.e. a zero mean vector) is obtained by V'*V/size(V,'*')

Scilab 1.3.27   The Scilab command variance doesn't give the variance, but the best estimation of the overall variance (aka the variance of the whole population $ \Omega$) based on what has been recorded for the given sample.

Remark 1.3.28   The least squares method is sensitive to "outliers" where experiment gives a result significantly deviating from the model. It is worth repeating this experiment in order to either eliminate an experimental error, or confirm a strongly significant result.

Remark 1.3.29   It should be kept in mind that the semantic of the undertaken computations is not to minimize the discrepancy over $ \omega$ but over $ \Omega\setminus\omega$. Therefore, it is useless to arrange a model so that it exactly goes through all the experimental points. Proceeding that way leads almost always to a highly unstable (oscillating) model, that will provide meaningless values over the interesting places, i.e. over $ \Omega\setminus\omega$.

Definition 1.3.30   The expected $ \mathrm{VRF}$ is the ratio of our best estimates $ S^{2}$ and $ S_{res}^{2}$ of quantities $ s^{2}$ and $ s_{res}^{2}$ that should occur when dealing with "all" the replications of the given DOE.

$\displaystyle \mathrm{VRF}_{th}=\frac{S^{2}}{S_{res}^{2}}$

Proposition 1.3.31   If discrepancies $ T_{th}-T$ are random and identically distributed, this quotient can be estimated by :

$\displaystyle est\left(\mathrm{VRF}_{th}\right)=\frac{s^{2}\times n/\left(n-1\r...
...}{s_{res}^{2}\times n/\left(n-p\right)}=\mathrm{VRF}_{exp}\times\frac{n-p}{n-1}$

Where $ n,  p$ are the dimensions of matrix $ A$ (number of tests and number of coefficients).

Proof. Factor $ n-1$ is established in many places, http://www.douillet.info/~douillet/cours/decis/decis.html among them. The other factor can be established in the same way, by expressing $ \chi^{2}$ as a sum of $ n-p$ independent squares. $ \qedsymbol$

Remark 1.3.32   The expectation of a quotient is not the quotient of expectations, and the above formula can lead to values below $ 1$ for the estimated $ \mathrm{VRF}$. What nevertheless remains is the following fact : if we use all the $ n$ elements of a sample to obtain a model with $ n$ parameters, this model is only modeling the error term and its predictive value concerning all the other members of the population is nothing but zero.


previous up next contents
Previous: 1.2 Starting with an Up: 1. Asserting Next: 1.4 Effects of a   Contents


douillet@ensait.fr
2008-03-14