
Determination of the best fit interval

The problem to solve

The difficulty is that, in practice, the experimental points are never aligned, for two kinds of reasons. On the one hand, the coordinates of each point are subject to measurement uncertainties. On the other hand, the linear model is only relevant over a limited interval. When the stress is too low, the phenomenon is not yet established, and when the stress is too high, i.e. approaches its maximum value, the process that will lead to rupture becomes prevalent.

Our opinion is therefore as follows: the measurement process must provide the range of relevance as well as the slope itself. Four pieces of information are thus to be extracted from the data, and not only one: the slope itself, obviously, but also the end points $ \alpha $, $ \beta $ of the interval of best validity of the linear model and a quality factor $ q\! f$ measuring the goodness of fit obtained.

FIG.  2: The "rectilinear" part of the raw data.

To facilitate comparisons between different materials, this interval of best fit will be expressed as a percentage of the maximum force, i.e. as $ \alpha \leq y/y_{max}\leq \beta $. This interval results from a compromise between two contradictory requirements. Increasing the size of the interval reduces the influence of measurement uncertainties, but also brings in zones where the non-linearity becomes noticeable, which increases the uncertainties due to the model.

A quality factor $ q\! f$ has to be defined that summarizes this compromise, and the best $ \left[ \alpha ,  \beta \right] $ will be chosen by maximizing $ q\! f$. The $ q\! f$'s selected by our research team, as well as the $ q\! f$ selected by [3], are not smooth everywhere and may have many local extrema. A robust maximization algorithm is therefore required.

We have chosen to subdivide the interval $ \left[ 0\%,  75\%\right] $ into 25 parts of $ 3\% $ each. The 26 resulting grid points give 325 pairs with $ \alpha <\beta $, and eliminating the 25 pairs such that $ \beta -\alpha \leq 3\% $ leaves 300 ordered pairs $ \left( \alpha ,  \beta \right) $. Reducing the search space in this way yields a first guess for $ \left[ \alpha ,  \beta \right] $ by exhaustive search. A second, local pass can then be carried out ($ q\! f$ being smooth near the absolute maximum).

This search algorithm has a very important by-product. Considering the best values obtained for the slope $ a$ (for example, the top $ 25\% $ when sorted according to the quality factor), we obtain a cloud of values around the best fit value. In our opinion, this cloud gathers values that cannot really be distinguished from each other, and therefore provides a confidence interval around the best fit value.
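
The count of 300 candidate intervals can be checked with a few lines of Python. This is only a sketch of the enumeration: the function name and its parameters are illustrative and do not come from the original work.

```python
from itertools import combinations

def candidate_intervals(upper=75, parts=25, min_width=3):
    """Enumerate the pairs (alpha, beta), in percent of the maximum force."""
    step = upper // parts                 # 3 % with the values of the text
    grid = range(0, upper + 1, step)      # 0, 3, ..., 75
    # combinations() yields alpha < beta; drop pairs with beta - alpha <= min_width
    return [(a, b) for a, b in combinations(grid, 2) if b - a > min_width]

pairs = candidate_intervals()
print(len(pairs))
```

The narrowest intervals kept by this filter are $ 6\% $ wide, since neighbouring grid points are exactly $ 3\% $ apart.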
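
The selection of the top $ 25\% $ can be sketched as follows, assuming the exhaustive search has produced one (quality factor, slope) pair per candidate interval. The function name and the data layout are hypothetical.

```python
def slope_confidence_range(results, fraction=0.25):
    """Return (best slope, lowest slope, highest slope) among the top-ranked fits.

    results: iterable of (qf, a) pairs, one per candidate interval.
    """
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    keep = max(1, int(len(ranked) * fraction))   # 75 fits out of 300
    top_slopes = [a for _, a in ranked[:keep]]
    return ranked[0][1], min(top_slopes), max(top_slopes)

# Made-up values, for illustration only.
demo = [(10, 2.0), (9, 2.1), (1, 5.0), (8, 1.9),
        (0.5, 0.1), (7, 2.05), (0.2, 9.0), (6, 1.95)]
print(slope_confidence_range(demo, fraction=0.5))
```

With 300 candidate intervals and the default fraction, the 75 best-ranked slopes are retained.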

FIG. 3: The general linear approximation: $ V\! RF $ level lines.

Some well-known facts about regression

In any regression problem, the reference model consists in approximating the variable $ y $ by a constant $ c $. The mean squared error of this model is given by $ \delta ^{2}=\frac{1}{N}\sum \left( y-c\right) ^{2} $. It is well known that:

$\displaystyle \delta ^{2}\doteq {\mathrm{E}}\left( \left( y_{j}-c\right) ^{2} \right) =\left( \overline{y}-c\right) ^{2}+\delta ^{2}_{min}$ (1)

showing that the best constant $ c $ is the arithmetic mean $ \overline{y} $ of the $ y $'s. The corresponding value of $ \delta ^{2}_{min} $ is $ {\mathrm{var}}\left( y \right) $, the variance of $ y $. This quantity constitutes the natural point of comparison for any other approximation of $ y $.

Turning now to an approximation by a linear function, we are led to minimize $ \delta ^{2}\doteq \frac{1}{N}\sum \left( y-A  x-B\right) ^{2} $ by a suitable choice of $ A,  B $. An elementary calculation leads to:

$\displaystyle \delta ^{2}\doteq {\mathrm{E}}\left( \left( y_{j}-A  x_{j}-B\right) ^{2} \right) =\left( \overline{y}-A  \overline{x}-B\right) ^{2}+\left( A-a\right) ^{2}  {\mathrm{var}}\left( x \right) +\delta ^{2}_{min}$ (2)

showing that the best general linear approximation goes through the central point $ \left( \overline{x},  \overline{y}\right) $, with slope $ a\doteq cov_{xy}\div {\mathrm{var}}\left( x \right) $. The corresponding value of $ \delta ^{2}_{min} $ is the reduced variance of $ y $, and will be denoted from now on by $ \sigma _{red}^{2}$.
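
For readers who want the intermediate steps of the elementary calculation behind EQ. 2, one possible route is sketched below. It only uses the decomposition of a mean square into a squared mean plus a variance, and completes the square in $ A $.

```latex
\begin{align*}
\mathrm{E}\left( (y_j - A x_j - B)^2 \right)
  &= \left( \overline{y} - A\,\overline{x} - B \right)^2 + \mathrm{var}(y - A x) \\
\mathrm{var}(y - A x)
  &= \mathrm{var}(y) - 2A\,cov_{xy} + A^2\,\mathrm{var}(x) \\
  &= (A - a)^2\,\mathrm{var}(x) + \mathrm{var}(y) - \frac{cov_{xy}^2}{\mathrm{var}(x)},
  \qquad a \doteq \frac{cov_{xy}}{\mathrm{var}(x)} \\
\delta^2_{min} &= \sigma_{red}^2 = \mathrm{var}(y) - \frac{cov_{xy}^2}{\mathrm{var}(x)}
\end{align*}
```

Both squared terms vanish for $ A=a $ and $ B=\overline{y}-a  \overline{x} $, which leaves only the reduced variance.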

Since the key point of this process is the decrease of the remaining unexplained variance, it is convenient to define the variance reduction factor as:

$\displaystyle V\! RF\doteq \frac{{\mathrm{var}}\left( y \right) }{\sigma _{red}^{2}}=\frac{{\mathrm{var}}\left( x \right)   {\mathrm{var}}\left( y \right) }{{\mathrm{var}}\left( x \right)   {\mathrm{var}}\left( y \right) -cov^{2}}$ (3)
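
The quantities of this subsection can be computed directly from their definitions. The sketch below assumes NumPy and uses population moments (division by $ N $), as in the formulas above. The function name and the sample data are illustrative only.

```python
import numpy as np

def linear_fit_stats(x, y):
    """Return (a, b, sigma_red2, vrf) for the least-squares line y = a*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    var_x, var_y = x.var(), y.var()              # population variances (1/N)
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    a = cov_xy / var_x                           # best slope
    b = y.mean() - a * x.mean()                  # line goes through the central point
    sigma_red2 = var_y - cov_xy**2 / var_x       # reduced variance
    vrf = var_y / sigma_red2                     # infinite if the points are exactly aligned
    return a, b, sigma_red2, vrf

# Small made-up data set, for illustration only.
a, b, sigma_red2, vrf = linear_fit_stats([0, 1, 2, 3], [1, 3, 5, 8])
print(a, b, sigma_red2, vrf)
```

The reduced variance returned here equals the mean squared residual of the fitted line, which gives a simple way to cross-check the result.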

Choosing the quality factor

When applying the process described in the previous subsection to the tensile test, we obviously have to determine the slope $ a$ from the raw data, i.e. the lengthening/force pairs. However, to facilitate comparisons with the values familiar to end users, all numerical results given in the present paper have been normalized after computation and expressed in terms of Young's modulus $ E $ (and therefore relative to the deformation/stress pairs).
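
The normalization itself is not detailed here. A minimal sketch, assuming the usual engineering definitions (stress = force / initial cross-section, deformation = lengthening / initial gauge length) and a slope expressed as force per unit lengthening, could read as follows. The function name, its parameters and the specimen dimensions are hypothetical.

```python
def young_modulus(slope_n_per_mm, gauge_length_mm, cross_section_mm2):
    """Convert a force/lengthening slope (N/mm) into a modulus in MPa (N/mm^2).

    stress = F / S0 and deformation = dL / L0, hence E = (F / dL) * L0 / S0.
    """
    return slope_n_per_mm * gauge_length_mm / cross_section_mm2

# Hypothetical specimen: L0 = 50 mm, S0 = 20 mm^2, slope = 26000 N/mm.
print(young_modulus(26000.0, 50.0, 20.0))  # 65000.0
```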

As previously said, we have chosen to compute $ 300 $ regressions, corresponding to $ 300 $ choices of the interval $ \left[ \alpha ,  \beta \right] $, and have therefore obtained $ 300 $ estimates for $ a$. For the first test piece, these estimates lie in an interval such that $ E\in \left[ 63051,  72454\right]   M\! Pa$, i.e. a relative amplitude close to $ 14\% $.

Of course, this amplitude must be broken down into an uncertainty resulting from the model (caused by inappropriate choices of the interval $ \left[ \alpha ,  \beta \right] $) and an uncertainty from the measuring instruments (caused by the propagation of the uncertainties carried by the primary measurements). To this end, we can select a given proportion of all the values obtained, for example one in four, retaining the top $ 25\% $ of the values, ranked according to the chosen quality factor.

Using $ V\! RF $ of EQ. 3 as the quality factor to determine the bounds $ \left[ \alpha ,  \beta \right] $ of the interval of relevance, one obtains the level lines of FIG. 3, each graph being relative to one test piece, namely $ DSC122\_L1 $ (left), $ DSC122\_L2 $ (center) and $ DSC122\_L3 $ (right). It can be seen that $ V\! RF $ presents a more or less marked maximum, located in the neighborhood of (respectively) $ \alpha ,  \beta =9\%,  72\% $, of $ \alpha ,  \beta =9\%,  75\% $ and of $ \alpha ,  \beta =6\%,  69\% $. The corresponding values of $ V\! RF $ are $ 17265 $, $ 12936 $ and $ 13932 $.

It has been shown in [1] that $ q\! f=n\times V\! RF$ (where $ n $ is the number of points in the interval) is a more efficient choice for the quality factor. In fact, the two criteria do not lead to very different central values for $ a$. But they differ markedly when used to determine the $ 25\% $ best values that define a confidence interval around this central value.
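
The exhaustive search with either quality factor can be sketched as below. This is not the authors' implementation: the function names, the `weight_by_n` switch, the rule skipping windows with fewer than three points and the synthetic curve are all assumptions made for illustration. The curve is also assumed to stop before the force decreases, so that a window on $ y/y_{max} $ selects a contiguous part of it.

```python
import numpy as np

def fit_window(x, y):
    """Least-squares slope and VRF (EQ. 3) for one window of points."""
    var_x, var_y = x.var(), y.var()
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / var_x, var_x * var_y / (var_x * var_y - cov**2)

def scan_intervals(x, y, step=3, upper=75, weight_by_n=True):
    """Return (qf, alpha, beta, a) for every admissible window, best first."""
    y_rel = 100.0 * y / y.max()          # force as a percentage of its maximum
    results = []
    for alpha in range(0, upper + 1, step):
        # beta starts two steps above alpha, so that beta - alpha > step
        for beta in range(alpha + 2 * step, upper + 1, step):
            mask = (y_rel >= alpha) & (y_rel <= beta)
            n = int(mask.sum())
            if n < 3:                    # assumption: skip near-empty windows
                continue
            a, vrf = fit_window(x[mask], y[mask])
            qf = n * vrf if weight_by_n else vrf
            results.append((float(qf), alpha, beta, float(a)))
    return sorted(results, reverse=True)

# Synthetic curve (not data from the paper): linear start, then saturation.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 500)
y = 100.0 * np.tanh(2.0 * x) + rng.normal(0.0, 0.5, x.size)

results = scan_intervals(x, y)
qf_best, alpha_best, beta_best, a_best = results[0]
print(f"best window: [{alpha_best}%, {beta_best}%], slope = {a_best:.1f}")
```

Calling `scan_intervals(x, y, weight_by_n=False)` ranks the same windows by $ V\! RF $ alone, which makes it possible to compare the two rankings on the same data.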

FIG. 4: Equipotential surfaces of $ V\! RF $ (top) and of $ n\times V\! RF $ (bottom).

This behavior can be observed in FIG. 4, which shows the equipotential surfaces of $ V\! RF $ (top) and of $ n\times V\! RF $ (bottom) as functions of $ \left[ \alpha ,  \beta \right] $. It can be seen that the surfaces of the second type are much more regular, and that the "parasitic peaks" corresponding to small intervals have disappeared.

Collecting the results

A visualization of these results is given by FIG. 5. The first part (upper left) shows a plot of the pairs $ \left( a,  V\! RF\right) $ obtained for the first test piece. The vertical lines mark the limits of the range of the $ 25\% $ best values for $ a$, while the horizontal line is drawn at the 75th best $ V\! RF $. The second part (upper right) gathers the three graphs corresponding to the three test pieces of the set, while the third part (bottom) uses $ n\times V\! RF $ as the quality factor.

FIG. 5: Using $ V\! RF $ (top) and $ n\times V\! RF $ (bottom) to rank the obtained slopes.

The use of the quality factor $ n\times V\! RF $ leads to the global range $ E\in \left[ 64600,  66100\right]   M\! Pa$, i.e. a relative amplitude around $ 3\% $, the relative amplitudes for a single test piece being around $ 2\% $. For example, the best formula for the first test piece of the set is:

$\displaystyle \sigma _{prev}=65717  \varepsilon +9.33$ (4)




douillet@ensait.fr
2003-06-13