
Ensait - Design of Experiment

2007-11-13 evaluation
duration 2h00

All documents are allowed

Some Global Remarks

  1. The first quality expected from an engineer is the ability to present his/her findings in clear, scientific language. Listings, computations, charts and other "printer outputs" can in no way replace a statement of conclusions written in accurate, scientific language.
  2. The examination covered three different models for a single round of tests. It was therefore necessary to organize different kinds of material into a structured answer. Various methods were used. Taking screenshots and inserting them into a word processor is fine.
  3. Cutting and pasting with scissors and glue (and thumbprints) is EQUALLY fine... and is probably the fastest way. You can also insert comments into the execution listing. You can also ... All methods are acceptable. What is wrong is the lack of any method of communication.
  4. Some comments are poorly written, with a huge number of spelling and grammatical mistakes. However, avoiding mistakes by writing no comments is not what is required.

Some Other Guidelines

  1. Check the printers at the beginning of the evaluation, and print each piece as soon as possible. At exactly the specified time, printers will be disconnected.
  2. On any printed document, the FAMILY_NAME/Given_name of the student must appear (especially in the title of the figures).
  3. Students are advised that the network traffic of their computer is likely to be recorded during the evaluation.

1 Encoding a Design

Download the file located at: http://www.douillet.info/~douillet/cours/planx_ds07a/dat_planx_ds07a.txt. This file describes the design of an experiment. The last column contains the results, the remaining columns describe the settings (adjustments) of the parameters.
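The reading and coding steps of questions 1 and 2 can also be prototyped outside Scilab. The Python sketch below is a minimal illustration under stated assumptions: the sample text, its field names and its single header line are hypothetical, and the " ; " separator is taken from the description in question 2. Splitting directly on the semicolon makes the underscore substitution unnecessary. The two-column coding of each factor (level 0 -> (-1,-1), level 1 -> (1,0), level 2 -> (0,1)) is also an assumption, inferred from the first row of Table 1.

```python
# Sketch: read a " ; "-separated design file and build the reduced Boolean matrix ma.
# The file layout (one header line, levels 0/1/2, result in the last column)
# and the sample text below are hypothetical.

# Assumed two-column coding of a 3-level factor.
LEVEL_CODE = {0: (-1, -1), 1: (1, 0), 2: (0, 1)}


def read_design(text):
    """Return (header, levels, results) from ' ; '-separated text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [field.strip() for field in lines[0].split(";")]
    levels, results = [], []
    for line in lines[1:]:
        fields = [field.strip() for field in line.split(";")]
        levels.append([int(f) for f in fields[:-1]])  # factor settings
        results.append(float(fields[-1]))             # measured result
    return header, levels, results


def encode(levels):
    """Build ma: a leading 1 for the affine constant, then 2 columns per factor."""
    ma = []
    for trial in levels:
        row = [1]
        for level in trial:
            row.extend(LEVEL_CODE[level])
        ma.append(row)
    return ma


# Hypothetical sample: field names contain spaces, as in the original file.
sample = """factor A ; factor B ; factor C ; factor D ; result
0 ; 0 ; 2 ; 1 ; 7.5
1 ; 2 ; 0 ; 0 ; 9.1
"""
header, levels, results = read_design(sample)
ma = encode(levels)
print(len(header), len(ma), len(ma[0]))  # 5 2 9
print(ma[0])  # [1, -1, -1, -1, -1, 0, 1, 1, 0]
```

With the real file, `len(ma)` would be $jx=27$ and each row would have $lx=9$ entries.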
  1. How many trials ($jx$) are there? How many factors ($ix$)? What is the number of levels of each factor? What is the size ($lx$) of the code suitable for an affine model without interaction?
    1. The given design includes $jx=27$ trials. There are $ix=4$ factors, each with three levels labeled $0,1,2$. The whole product space to be explored has $\left|\Omega\right|=3^{4}=81$ points.
    2. If we consider these levels as mere identifiers, the corresponding reduced Boolean coding requires $lx=1+2+2+2+2=9$ unknowns: one for the affine constant of the model, and two for each factor, since only two of its three levels are independent once the constant is included.
  2. Read the file into Scilab (by modifying the procedure for reading ... or by editing the file by hand). Give the matrix $ma$ coding the design.
    1. It was necessary to replace the spaces by something else, such as underscores, and then to convert the text separators, formerly "space / semicolon / space", into Scilab tokens, such as a space.
    2. In case of failure, it was suggested to edit the file by hand, using a text editor such as edit, notepad, scipad or ...
    3. The result is a string matrix datas of size $\left(jx+1\right)\times\left(ix+1\right)=28\times5$, then a numerical matrix ma of size $jx\times lx=27\times9$ (cf. Table 1, left).

    Table 1: Matrices ma and Ma
    \begin{table}\begin{displaymath}\left(\begin{array}{rrrrrrrrr}
1&-1&-1&-1&-1&0&1...
...1&1&0&0&0\cr 1&1&0&1&1
\end{array}\right)\end{displaymath}\par\par
\end{table}


  3. Give some Cartesian representations of the design under study. What can be observed? What do you think of this allocation?
    1. The Cartesian map $\left[1,2\right],\left[3,4\right]$, which uses the first two factors for the first axis and the last two for the second axis, does not reveal any defect.

      \begin{displaymath}\begin{array}{ccccccccc}
1&0&0&0&0&1&0&1&0\cr 0&0&1&0&1&0&1&0...
...1&0&1&0&0\cr 0&1&0&1&0&0&0&0&1\cr 1&0&0&0&0&1&0&1&0
\end{array}\end{displaymath}

    2. On the other hand, Cartesian maps using three of the four factors illustrate the qualities of the design. There are four ways to choose three factors out of four, and each of these restrictions yields a complete three-factor design ($3^{3}=27$ trials, each combination appearing exactly once).

      \begin{displaymath}\begin{array}{ccccccccc}
1&1&1&1&1&1&1&1&1\cr 1&1&1&1&1&1&1&1&1\cr 1&1&1&1&1&1&1&1&1
\end{array}\end{displaymath}

  4. Recall the measure typically used to assess the quality of a design of experiment. Draw an alternative design at random. How good is it? Compare it with the design described by the given file.
    1. The measure usually used to assess the quality of a design is the smallest eigenvalue of the matrix ms=ma'*ma. The greater this value, the smaller the propagation of errors.
    2. The command [Rdatas,Rma,Rms]=randplan() draws a design at random among all the $C_{81}^{27}$ possibilities.
    3. When comparing the eigenvalues of the two matrices ms (left) and Rms (right), we see that the proposed design is far better than the design obtained at random.

      \begin{displaymath}\begin{array}{rr}
9&3.194381\cr 9&4.286482\cr 9&7.414665\cr 9...
...150\cr 27&30.746163\cr 27&37.171821\cr 27&50.191043
\end{array}\end{displaymath}
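Questions 3 and 4 can be checked numerically with the Python sketch below. Since the data file is not reproduced here, it uses a hypothetical 27-trial fraction of the $3^4$ design generated by D = (A+B+C) mod 3 (an assumption, not necessarily the design of the file), with the assumed two-column coding level 0 -> (-1,-1), level 1 -> (1,0), level 2 -> (0,1). It verifies that every choice of three factors gives a complete $3^3$ design, and computes the spectrum of ms = ma'*ma with a cyclic Jacobi rotation method (a swapped-in technique, used only to avoid external libraries). For any design in which every pair of factors is balanced, this coding gives the eigenvalues 9 (four times) and 27 (five times), as in the left column of the listing above. The random competitor is drawn with Python's `random.sample`, not with randplan().

```python
import itertools
import math
import random

LEVEL_CODE = {0: (-1, -1), 1: (1, 0), 2: (0, 1)}  # assumed two-column coding


def encode(levels):
    """ma: a leading 1, then two columns per factor."""
    return [[1] + [c for lv in trial for c in LEVEL_CODE[lv]] for trial in levels]


def gram(ma):
    """ms = ma' * ma."""
    n = len(ma[0])
    return [[sum(row[i] * row[j] for row in ma) for j in range(n)]
            for i in range(n)]


def jacobi_eigenvalues(a, tol=1e-20, max_sweeps=100):
    """Sorted eigenvalues of a symmetric matrix (cyclic Jacobi rotations)."""
    a = [list(map(float, row)) for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def projections_complete(levels):
    """True if each choice of 3 factors shows all 27 level combinations."""
    return all(len({tuple(t[c] for c in cols) for t in levels}) == 27
               for cols in itertools.combinations(range(len(levels[0])), 3))


# Hypothetical 27-trial design: D generated as (A + B + C) mod 3.
design = [(a, b, c, (a + b + c) % 3)
          for a, b, c in itertools.product(range(3), repeat=3)]
# Random competitor: 27 distinct points among the 81 of the full 3^4 grid.
rng = random.Random(2007)
random_design = rng.sample(list(itertools.product(range(3), repeat=4)), 27)

eig = jacobi_eigenvalues(gram(encode(design)))
eig_rand = jacobi_eigenvalues(gram(encode(random_design)))
print(projections_complete(design))  # True
print([round(v, 6) for v in eig])    # [9.0, 9.0, 9.0, 9.0, 27.0, 27.0, 27.0, 27.0, 27.0]
print("smallest eigenvalue, random design:", round(eig_rand[0], 3))
```

The smallest eigenvalue of the random pick depends on the seed; comparing it with 9 is the numerical counterpart of the comparison between ms and Rms.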

2 Best Boolean Model Without Interaction

  1. Use the least squares method to determine the affine model mx without interactions that provides the best fit with the experimental results.
    1. The least squares method gives
      ms=ma'*ma ; mx=(1/ms)*ma'*mb 
      that is, the coefficients mx that minimize the sum of the squared discrepancies between the model ma*mx and the experimental results mb.
    2. One obtains:

      \begin{displaymath}\begin{array}{r}
9.5803648\cr -0.0147956\cr 3.2608300\cr 0.04...
... -0.0992214\cr 4.0853027\cr -0.0675738\cr 4.9267311
\end{array}\end{displaymath}

  2. Assess the confidence that can be given to this model. Give the details of computations, draw and print any useful graphic.
    1. The confidence that can be given to a model is related to its ability to predict the results of measurements that have not been made. This ability is mainly assessed through the model's ability to reproduce, retrospectively, the results already measured.
    2. A first graph is obtained by plotting the experimental results on the x-axis and the discrepancies between model and reality on the y-axis. This yields Figure 1.
      Figure 1: Discrepancies versus results
      \includegraphics[width=0.95\columnwidth,keepaspectratio]{ds_residus-sav}

    3. The goodness of fit is estimated by the ratio of the variances:
      VRF=variance(mb)/variance(mdelta)$\approx8.51$
    4. A visual estimate can be obtained from the ratio of the amplitudes:
      (max(mb)-min(mb))/(max(mdelta)-min(mdelta))$\approx2.84$
      Roughly speaking, this ratio is close to the square root of the VRF ($\sqrt{8.51}\approx2.92$).
    5. The value VRF$\approx8.51$ quantifies the ability to back-cast past events. The forecasting ability is obtained by applying a correcting factor that accounts for the degrees of freedom burnt to compute the coefficients of the model ($jx=27$ trials, $lx=9$ coefficients). This yields:

      \begin{displaymath}
est\left(VRF\right)=8.51\times\frac{27-9}{27-1}\approx5.89\end{displaymath}

    6. Even after this reduction, the model remains good (VRF clearly above 1).
  3. Evaluate the (isolated) influence of each factor (plot, print and comment any useful graphic).
    1. The influence of each factor is estimated from the amplitude of the variations obtained while averaging over all other factors (cf. Figure 2).
      Figure 2: Influences of the four factors
      \includegraphics[width=0.95\columnwidth]{/home/douillet/docs/Ensait/planx/planx_ds07a/ds_influ-sav}

    2. Factor D is the most influential, while factor B is about six times less influential.
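The computations of this section can be sketched in plain Python, with no external library. The sketch solves the normal equations ms*mx = ma'*mb by Gaussian elimination with partial pivoting (one way of obtaining what (1/ms)*ma'*mb gives in Scilab), computes VRF = variance(mb)/variance(mdelta), and applies the correction factor (jx-lx)/(jx-1). The helper `amplitude()` is hypothetical: assuming a balanced design and the two-column coding level 0 -> (-1,-1), level 1 -> (1,0), level 2 -> (0,1), the averages of the results at the three levels of a factor are the constant plus -(a1+a2), a1 and a2, so the amplitude of Figure 2 is the spread of these three numbers. The toy data are made up.

```python
import statistics


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def least_squares(ma, mb):
    """Coefficients mx minimizing |ma mx - mb|^2, via ms = ma'ma."""
    n = len(ma[0])
    ms = [[sum(row[i] * row[j] for row in ma) for j in range(n)] for i in range(n)]
    rhs = [sum(row[i] * y for row, y in zip(ma, mb)) for i in range(n)]
    return solve(ms, rhs)


def fit_quality(ma, mb, mx):
    """Return (VRF, forecast-corrected VRF) of a fitted model."""
    jx, lx = len(ma), len(ma[0])
    mdelta = [y - sum(c * v for c, v in zip(mx, row)) for row, y in zip(ma, mb)]
    vrf = statistics.variance(mb) / statistics.variance(mdelta)
    return vrf, vrf * (jx - lx) / (jx - 1)


def amplitude(a1, a2):
    """Spread of the level effects (-(a1+a2), a1, a2) of one 3-level factor."""
    effects = (-(a1 + a2), a1, a2)
    return max(effects) - min(effects)


# Made-up toy data: one centered factor, four trials.
ma_toy = [[1, -1], [1, 0], [1, 1], [1, 0]]
mb_toy = [1.0, 2.0, 3.0, 2.5]
mx_toy = least_squares(ma_toy, mb_toy)
print([round(v, 6) for v in mx_toy])                               # [2.125, 1.0]
print([round(v, 4) for v in fit_quality(ma_toy, mb_toy, mx_toy)])  # [11.6667, 7.7778]

# Correction applied in the text: 8.51 * (27 - 9) / (27 - 1).
print(round(8.51 * (27 - 9) / (27 - 1), 2))  # 5.89
# Amplitudes from the (a1, a2) pairs of the mx listing (factor B is elided there).
for name, a1, a2 in [("A", -0.0147956, 3.2608300),
                     ("C", -0.0992214, 4.0853027),
                     ("D", -0.0675738, 4.9267311)]:
    print(name, round(amplitude(a1, a2), 4))  # A 6.5069, C 8.0714, D 9.7859
```

Solving the normal equations is adequate here because ms is well conditioned for a balanced design; a QR-based solver would be the more robust choice in general.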

3 Best Continuous Affine Model

The parameters are now considered as continuous quantities, and the numbers $0, 1, 2$ are now perceived as the values taken by a continuous variable, rather than only labels for a stepwise parameter.
  1. Trial $\left[A,  B,  C,  D\right]$ is now coded by $\left[1,  A-1,  B-1,  C-1,  D-1\right]$ in order to use centered variables. Obtain the new coding matrix mA and the corresponding new model mX.
    1. Matrix mA is given in Table 1, right.
    2. With mS=mA'*mA, the model mX=(1/mS)*mA'*mb is:

      \begin{displaymath}\begin{array}{r}
9.5803648\cr 3.2534322\cr 0.8432093\cr 4.0356921\cr 4.8929442
\end{array}\end{displaymath}

  2. Compare the quality of this model with the quality of the previous model.
    1. Regarding the ability to predict the past, this new model is slightly worse than the previous one: var(mdelta)$\approx4.750$ while var(mDelta)$\approx4.759$.
    2. The measure based on the smallest eigenvalue clearly favors the new coding, since mS is now diagonal with eigenvalues $18,18,18,18,27$ (smallest eigenvalue 18 instead of 9).
    3. Moreover, this new model requires fewer coefficients, and its forecasting ability is much better:

      \begin{displaymath}
est\left(VRF\right)=\frac{40.429}{4.759}\times\frac{27-5}{27-1}\approx7.19\end{displaymath}
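The properties used in this section can be checked with a short Python sketch. With the centered coding [1, A-1, B-1, C-1, D-1], any design in which every level and every pair of levels appears equally often gives a diagonal mS with entries 27, 18, 18, 18, 18; the sketch checks this on a hypothetical design generated by D = (A+B+C) mod 3 (an assumption, not necessarily the design of the file). The helper `slope_from_boolean()` (a hypothetical name) illustrates why mX and mx agree: under the assumed Boolean coding level 0 -> (-1,-1), level 1 -> (1,0), level 2 -> (0,1), the continuous slope of a factor equals (a1 + 2*a2)/2, which reproduces mX from the coefficients of mx to within the rounding of the printed values.

```python
import itertools

# Hypothetical 27-trial design (assumption): D = (A + B + C) mod 3.
design = [(a, b, c, (a + b + c) % 3)
          for a, b, c in itertools.product(range(3), repeat=3)]

# Continuous centered coding of question 1: [1, A-1, B-1, C-1, D-1].
mA = [[1] + [lv - 1 for lv in trial] for trial in design]
mS = [[sum(row[i] * row[j] for row in mA) for j in range(5)] for i in range(5)]
print([mS[i][i] for i in range(5)])                                      # [27, 18, 18, 18, 18]
print(all(mS[i][j] == 0 for i in range(5) for j in range(5) if i != j))  # True


def slope_from_boolean(a1, a2):
    """Continuous slope of one factor from its two Boolean coefficients.

    Assumes a balanced design and the coding level 0 -> (-1, -1),
    level 1 -> (1, 0), level 2 -> (0, 1): the level averages are then
    const - a1 - a2, const + a1, const + a2, and the least-squares slope
    on x = -1, 0, 1 is (average at x = 1 - average at x = -1) / 2.
    """
    return (a2 - (-(a1 + a2))) / 2


# (a1, a2) pairs from the mx listing of section 2 (factor B is elided there).
for name, a1, a2 in [("A", -0.0147956, 3.2608300),
                     ("C", -0.0992214, 4.0853027),
                     ("D", -0.0675738, 4.9267311)]:
    print(name, round(slope_from_boolean(a1, a2), 6))  # A 3.253432, C 4.035692, D 4.892944

# Forecast-corrected VRF of the continuous model (5 coefficients).
print(round(40.429 / 4.759 * (27 - 5) / (27 - 1), 2))  # 7.19
```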

4 Best Second Degree Model

A quadratic model is now considered.
  1. Show that the design already studied can be used to evaluate the pairwise interactions of the parameters.
    1. A quadratic model requires $n\left(n-1\right)/2=6$ interaction coefficients and $n=4$ curvature coefficients, together with the $n+1=5$ coefficients already used by the affine model. This results in $15$ degrees of freedom burnt to compute the coefficients, leaving $27-15=12$ of them for averaging the errors.
    2. We may also anticipate that the curvature coefficients will be very small, and keep only the $6$ interaction coefficients. This leads to an $11$-coefficient model.
  2. Conduct the necessary calculations and obtain the coefficients of this new model.
    1. The procedure auh(j,k) can be used to build, step by step, the matrices Zma (without square terms) and maa (with square terms).
    2. It is better to work with centered variables. The $x_{i}$ are centered by design, and so are the products $x_{i}x_{j}$, since each pair of levels appears equally often. On the other hand, the square terms are not centered (the mean of $x_{i}^{2}$ over the levels $-1,0,1$ is $2/3$). We therefore use $y_{i}=x_{i}^{2}-2/3$ instead, and correct matrix maa accordingly into matrix Yma.
    3. The spectra of matrices Yms (first row) and Zms (second row) are:

      \begin{displaymath}\begin{array}{ccccccccccccccc}
6&6&6&6&9&9&9&15&15&15&18&18&18&18&27\cr  & & & &9&9&9&15&15&15&18&18&18&18&27
\end{array}\end{displaymath}

    4. The corresponding coefficients are given in Table 2. It can be seen that the affine, cross-products-only and full quadratic models are "nested": the coefficients they share have the same values. This happens because the corresponding groups of columns are mutually orthogonal (Yms is a block diagonal matrix).

    Table 2: Coefficients mX, Zmx and Ymx of the three models
    \begin{table}\begin{displaymath}\begin{array}{rrr}
9.5803648&9.5803648&9.5803648...
...&0&0.1488321\cr 0&0&0.1013607
\end{array}\end{displaymath}\par\par
\end{table}


  3. Some concluding words would be welcome.
    1. The full quadratic model, with four more coefficients (the square terms), yields a slightly smaller residual variance. But what matters is not to predict the past better.
    2. In this respect, the model without square terms has a better ability to predict the future, because its VRF is higher (446 instead of 395).
    3. In any case, a second-degree model is far better than a first-degree one, since the interaction terms 12 and 34 have an influence comparable to that of the first-degree terms, and are therefore essential for efficient modeling.
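The construction of the quadratic columns and the degree-of-freedom count can be sketched in Python. The sketch assumes a hypothetical design (D = (A+B+C) mod 3) in which every three-factor projection is complete, as for the design of the file; it is not meant to reproduce the spectra listed above. It therefore only checks properties that hold for any design with complete three-factor projections: all non-constant columns are centered once the squares are replaced by $y_i = x_i^2 - 2/3$, and the affine, cross-product and centered-square groups are mutually orthogonal, which is what makes the three models nested. The course procedure auh(j,k) is not reproduced; the variable names are illustrative.

```python
import itertools

# Hypothetical design (assumption): all (A, B, C) with D = (A + B + C) mod 3.
design = [(a, b, c, (a + b + c) % 3)
          for a, b, c in itertools.product(range(3), repeat=3)]
x = [[lv - 1 for lv in trial] for trial in design]  # centered levels -1, 0, 1
pairs = list(itertools.combinations(range(4), 2))

# Column groups: affine (constant + x_i), cross-products x_i*x_j, centered squares y_i.
affine = [[1] * 27] + [[r[i] for r in x] for i in range(4)]
cross = [[r[i] * r[j] for r in x] for i, j in pairs]
squares = [[r[i] ** 2 - 2 / 3 for r in x] for i in range(4)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonal(group1, group2):
    """True if every column of group1 is orthogonal to every column of group2."""
    return all(abs(dot(u, v)) < 1e-9 for u in group1 for v in group2)


centered = all(abs(sum(col)) < 1e-9 for col in affine[1:] + cross + squares)
blocks = (orthogonal(affine, cross) and orthogonal(affine, squares)
          and orthogonal(cross, squares)
          and all(abs(dot(u, v)) < 1e-9
                  for u, v in itertools.combinations(squares, 2)))
n_zma = len(affine) + len(cross)  # model without square terms
n_yma = n_zma + len(squares)      # full quadratic model
print(centered, blocks)                      # True True
print(n_zma, 27 - n_zma, n_yma, 27 - n_yma)  # 11 16 15 12
```

The cross-product columns are not checked against each other: they need not be orthogonal, which is why Zms is not diagonal.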

5 Listing


/usr/doc/.TeX/resultats-us.txt



douillet@ensait.fr
2008-04-11