Previous: Uncertainty in Young's modulus Up: Data Mining in Tensile Next: Discussion



Further data mining

Description of the data

When our study began, our informal opinion of the raw reports was something like "a few mechanical constants hidden by a lot of noise". But once a careful study had been undertaken, it appeared that the so-called noise was carrying unexpected knowledge.

What follows is based upon five sets of raw reports. In order to examine the repeatability, three test pieces were machined for each set, leading to fifteen raw reports. Three materials were involved, coded as $ DAL093 $, $ DSC122 $ and $ JF\_049 $. For the last two, test pieces were cut not only in the longitudinal direction (code $ \_L $), but also in the transverse direction (code $ \_LT $).

One of us received these reports by electronic mail as a list of pairs $ \left( \varepsilon ,  \sigma \right) $, without any other information (the rest was embedded in another report, traveling by surface mail). He subjected these files to a critical study in order to see whether they could reveal something about the missing data. In fact, many things were found after the initial success.

The outline of this study is summarized in FIG. 9. The data were recorded as fixed-point numbers with 6 decimal digits. The number of pairs is about the same in the three files of a given set, which reflects the fact that three similar test pieces are expected to break at about the same time. The column $ jx $ gives the minimal number of pairs for a given set.

The fact that approximately a thousand points were recorded for each test piece should not create illusions about the precision obtained. Indeed, only a small part of them takes part in the actual calculations. A first indication of this is the number $ jx_{80} $ of pairs satisfying $ y\leq 0.8  y_{max} $. A complete discussion will show that, for series 2 to 5, only about sixty points are actually involved in the computations.

FIG. 9: The data files. [Table not recoverable: only truncated fragments survive in the source.]

The fact that $ \varepsilon $ and $ \sigma $ do not have the same useful range (respectively $ \simeq 1 $ and $ \simeq 500 $) introduces a difference of treatment between the two quantities, since $ \sigma $ has been recorded with $ 9 $ significant digits, and $ \varepsilon $ with only $ 7 $. Moreover, the quantities $ \varepsilon $ and $ \sigma $ are not directly given by measurements, but are calculated from measurements of $ \Delta L_{0} $ and of $ F $.

Recording $ \Delta L_{0}/L_{0} $ instead of $ \Delta L_{0} $ is unfortunate, because it muddles the assessment of uncertainties. Moreover, recording $ F/A $ instead of $ F $ is conceptually harmful because the cross section of the test piece, uniformly defined at the beginning of the experiment, is expected to vary in space and time as the test goes on. Estimating $ \sigma $ from $ F $ is thus a matter of modeling, and must be carefully separated from the experiment itself, i.e. from an objective statement of what actually took place.

We thus tried to return to the "true raw data" starting from the "pre-conditioned" data that were in our possession. The remarkable fact is that the data were so marked by the "pre-conditioning" that this reconstruction was completely possible.


Regularities related to X-coordinates

We started from the idea that the values provided by a sensor are integer multiples of a certain constant, and we thus sought, for each test piece, a fraction $ divx $ such that the recorded $ x $ were very close to integer multiples of this fraction. To formulate this criterion in a precise way, let us define the fractional part $ {\mathrm{frac}}\left( x \right) $ of a number $ x $ as its difference with the nearest integer, that is to say:

$\displaystyle {\mathrm{frac}}\left( x \right) =x-\mathrm{round}\left( x\right) $

Our objective is to find a factor $ divx$ such that:

$\displaystyle \forall x  :  -0.005\leq {\mathrm{frac}}\left( x\div divx \right) \leq +0.005$ (8)
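As an illustration, the definition of $ {\mathrm{frac}} $ and the criterion EQ. 8 can be sketched in Python. The function names and the synthetic strains below are assumptions made for this example; they are not taken from the raw reports.

```python
import math

def frac(x: float) -> float:
    """Signed distance from x to the nearest integer (between -0.5 and 0.5)."""
    return x - math.floor(x + 0.5)

def satisfies_eq8(xs, divx, tol=0.005):
    """True if every x / divx lies within tol of an integer (criterion EQ. 8)."""
    return all(abs(frac(x / divx)) <= tol for x in xs)

# Synthetic strains (assumption): integer multiples of 1/5700,
# stored with 6 decimal digits like the recorded data.
xs = [round(k / 5700, 6) for k in (3, 19, 57, 180, 1000, 5700)]

print(satisfies_eq8(xs, 1 / 5700))  # the factor used to build the data
print(satisfies_eq8(xs, 1 / 5000))  # an unrelated factor
```

With 6 recorded decimals, the rounding error on $ x\div divx $ is at most $ 0.5\times 10^{-6}\times 5700\approx 0.003 $, which is why a tolerance of $ 0.005 $ is wide enough for the correct factor.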

To obtain this factor $ divx$, we calculated the iterated differences of the recorded values, sorting the results after each pass. In other words, the first pass provides numbers $ \left( \delta x\right) _{j}=x_{k+1}-x_{k} $, where the same $ \left( \delta x\right) _{j} $ can be obtained for several indices $ k $. Then a second pass provides $ \left( \delta ^{2}x\right) _{j}=\left( \delta x\right) _{k+1}-\left( \delta x\right) _{k} $, and so on.

For series $ DSC\_122 $, the $ jx=1364 $ values lead to $ 21 $ values for $ \delta x $ (identifying values that differ by only one unit in the last digit), then to $ 15 $ values for $ \delta ^{2}x $ and finally to $ 3 $ values for $ \delta ^{3}x $. Multiplying all these numbers by $ 10^{6} $ for readability, we obtain:

$\displaystyle \left( \delta x\right) =-2632,  0,  3334,  5263,  6667,  7895,  \cdots ,  26316,  26667,  28948,  31579$
$\displaystyle \left( \delta ^{2}x\right) =175,  352,  527,  \cdots ,  2457,  2632,  3334$
$\displaystyle \left( \delta ^{3}x\right) =177,  350,  702$
$\displaystyle \left( \delta ^{4}x\right) =173,  352$
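The passes described above can be sketched as follows. This is a minimal illustration on synthetic data, working in integer units of $ 10^{-6} $; the helper names and the way near-equal values are merged (chaining values that differ by at most one unit) are assumptions, not necessarily the exact procedure used for the reports.

```python
def distinct_sorted(values, tol=1):
    """Sort integers, merging each value lying within tol of its predecessor."""
    out = []
    previous = None
    for v in sorted(values):
        if previous is None or v - previous > tol:
            out.append(v)
        previous = v
    return out

def difference_pass(values):
    """One pass: distinct, sorted differences between consecutive entries."""
    return distinct_sorted(b - a for a, b in zip(values, values[1:]))

# Synthetic record (assumption): multiples n/5700, in units of 1e-6, rounded.
xs = [int(n * 1e6 / 5700 + 0.5) for n in (0, 1, 3, 6, 7)]

d1 = difference_pass(xs)   # first pass, on the recorded order
d2 = difference_pass(d1)   # second pass, on the sorted distinct differences
print(xs, d1, d2)
```

On this small example the second pass already collapses to a single value close to $ 10^{6}/5700\approx 175.4 $, which plays the role of the first approximation of $ divx $.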

A first approximation of $ divx$ is thus $ divx\approx 175\times 10^{-6} $. But the precision obtained is insufficient to check EQ. 8: it is necessary to gain several more digits. To that end, it can be noticed that the last value of $ \delta ^{2}x $ (i.e. $ .003334 $) is known with a greater relative accuracy than the first one ($ .000175 $). We use the fact that $ 3334/175=19.05\approx 19 $ to obtain a better estimate of $ divx$, i.e. $ divx\approx .003334/19\approx .000175474 $.

Applying this idea to the last value of $ \delta x $, we can use $ 31579/175.474=179.96\cdots \approx 180 $ to obtain the improvement $ divx\times 10^{6}\approx 31579/180=175.439 $. Using the last five $ \delta x $, instead of only the last one, we obtain five approximations of equal standing, all sharing the same five leftmost digits. It remains to guess what the original fraction could be. As is well known, the continued fraction algorithm is designed exactly for that purpose. We therefore calculate the convergents associated with these numbers and apply the criterion EQ. 8.

It turns out that the convergent $ divx=\frac{1}{5700} $ is appropriate (leading to $ \forall x  :  -0.003\leq {\mathrm{frac}}\left( x\div divx \right) \leq +0.003 $), while the others are not. For the six test pieces $ JF\_049 $, a similar situation occurs, but with the fraction $ divx=1/5250 $ (cf. § V-D for a comparison with the complete reports).
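The continued fraction step can be illustrated by the following sketch, which expands the estimate $ 175.439\times 10^{-6} $ into its convergents and keeps those that pass the criterion EQ. 8. The strains are synthetic and the function names are assumptions introduced for this example.

```python
import math
from fractions import Fraction

def convergents(value, max_terms=10):
    """Yield the successive continued-fraction convergents of a rational value."""
    x = Fraction(value)
    h0, h1 = 0, 1  # previous and current numerators
    k0, k1 = 1, 0  # previous and current denominators
    for _ in range(max_terms):
        a = x.numerator // x.denominator  # next partial quotient
        h0, h1 = h1, a * h1 + h0
        k0, k1 = k1, a * k1 + k0
        yield Fraction(h1, k1)
        rest = x - a
        if rest == 0:
            break
        x = 1 / rest

def fits(xs, divx, tol=0.005):
    """Criterion EQ. 8: every x / divx within tol of an integer."""
    return all(abs(x / divx - math.floor(x / divx + 0.5)) <= tol for x in xs)

# Synthetic strains (assumption): multiples of 1/5700 stored with 6 decimals.
xs = [round(k / 5700, 6) for k in (3, 19, 57, 180, 1000, 5700)]

candidates = list(convergents(Fraction("0.000175439")))
good = [c for c in candidates if c > 0 and fits(xs, float(c))]
print(candidates[:3], good)
```

The first convergents are $ 0 $, $ 1/5699 $ and $ 1/5700 $; on this synthetic data only $ 1/5700 $ passes the test, the later convergents being too faithful to the rounded estimate itself.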

For the three test pieces $ DAL093 $, the situation is more complicated. The first $ 300 $ values of $ x $ in each series are exact multiples of $ divx=1/750=7/5250 $. Thereafter, it can be seen that the variations of $ x $ are either multiples of $ 1/750 $ or multiples of $ 2/605 $. More precisely, one of the two numbers $ 750  x $ or $ 605  x/2 $ is within $ 10^{-3} $ of an integer, i.e. within the rounding errors, since the $ x $'s are given with $ 6 $ digits after the decimal point (cf. § V-D).
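A minimal sketch of this two-factor test is given below. The function names and the sample values are assumptions chosen for illustration; they are not values from the $ DAL093 $ files.

```python
import math

def near_integer(v, tol=1e-3):
    """True if v is within tol of an integer."""
    return abs(v - math.floor(v + 0.5)) < tol

def matching_factors(x):
    """Return which of the two candidate factors is compatible with x."""
    labels = []
    if near_integer(750 * x):
        labels.append("1/750")
    if near_integer(605 * x / 2):
        labels.append("2/605")
    return labels

# Synthetic values stored with 6 decimals (assumption).
print(matching_factors(round(7 / 750, 6)))
print(matching_factors(round(22 / 605, 6)))
print(matching_factors(0.0005))
```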


Regularities related to ordinates

We undertook a similar study of the ordinates, and obtained the factor $ divy=125/738 $ for the three test pieces $ DAL093 $. Such a fact suggests a rectangular cross section for the corresponding test pieces. The fact that no rational factor $ divy$ appears for the other series suggests, on the contrary, that these series were related to cylindrical test pieces.

We thus sought to obtain, for each test piece, an approximate value of this factor $ divy$ to 8 decimal places. To obtain such a result, we started from two values $ divy_{1} $ and $ divy_{2} $ obtained by iterated differences. Then, we chose a "goodness of fit" criterion and applied an iterative method of the regula falsi type to obtain an optimal value.

The criterion used is as follows. For a given value of $ divy$, we determine:

$\displaystyle m_{g}\doteq \inf _{x}{\mathrm{frac}}\left( y\div divy \right)$
$\displaystyle m_{d}\doteq \sup _{x}{\mathrm{frac}}\left( y\div divy \right)$

According to the definitions, we have $ -0.5\leq m_{g}<0<m_{d}\leq 0.5 $, and our criterion $ \phi $ is given by:

$\displaystyle \phi \left( divy\right) =\frac{m_{d}+m_{g}}{m_{d}-m_{g}}$ (9)

For the exact value of $ divy$ (if there is one...) the values of $ {\mathrm{frac}}\left( y\div divy \right) $ are produced by truncation errors. One can expect these values to follow a uniform random distribution, leading to $ \left\vert m_{g}\right\vert \approx \left\vert m_{d}\right\vert \ll 0.5 $ and therefore to $ \phi \left( divy\right) \approx 0 $. For an approximate value of $ divy$, the values of $ {\mathrm{frac}}\left( y\div divy \right) $ will range over an off-center interval, leading to a value of $ \phi $ ranging between $ -1 $ and $ +1 $. The iterative process is thus the following:

$\displaystyle divy_{n+1}=\lambda   divy_{n}+\left( 1-\lambda \right) divy_{n-1}$
where $ \lambda $ is determined by
$\displaystyle 0=\lambda   \phi \left( divy_{n}\right) +\left( 1-\lambda \right) \phi \left( divy_{n-1}\right)$

The process thus described is unstable, because a bad value of $ divy$ leads to $ \left\vert m_{g}\right\vert \approx \left\vert m_{d}\right\vert \approx 0.5 $ and thus also to $ \phi \approx 0 $. Therefore, it is advisable to limit the variations of $ divy$ by a condition like $ \lambda \in \left[ -1.5,  +2.5\right] $. Such a condition slows down the process, but prevents the iteration from leaving the search window. One obtains the $ divy$'s listed in FIG. 9. For all of them, we have $ \left\vert m_{g}\right\vert \approx m_{d}\approx 2  10^{-5} $.
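The criterion EQ. 9 and a root search on $ \phi $ can be sketched as follows. Note that this sketch deliberately swaps in a bracketing false-position scheme (with the so-called Illinois modification) instead of the bounded-$ \lambda $ iteration described above: keeping a sign-changing bracket amounts to forcing $ \lambda \in \left[ 0,  1\right] $, which is a stricter way of staying inside the search window. The data are synthetic and all names are assumptions.

```python
import math

def frac(v):
    """Signed distance from v to the nearest integer."""
    return v - math.floor(v + 0.5)

def phi(ys, divy):
    """Centering criterion of EQ. 9: close to 0 when residuals are balanced."""
    residuals = [frac(y / divy) for y in ys]
    m_g, m_d = min(residuals), max(residuals)
    return (m_d + m_g) / (m_d - m_g)

def refine_divy(ys, lo, hi, iterations=100, width_tol=1e-13):
    """False position on phi, keeping a sign-changing bracket [lo, hi]."""
    f_lo, f_hi = phi(ys, lo), phi(ys, hi)
    if f_lo * f_hi >= 0:
        raise ValueError("phi must change sign between lo and hi")
    last_side = 0
    mid = lo
    for _ in range(iterations):
        mid = (lo * f_hi - hi * f_lo) / (f_hi - f_lo)
        f_mid = phi(ys, mid)
        if f_mid == 0 or abs(hi - lo) < width_tol:
            break
        if f_mid * f_lo > 0:
            lo, f_lo = mid, f_mid
            if last_side == -1:
                f_hi *= 0.5  # Illinois step: damp the stale endpoint
            last_side = -1
        else:
            hi, f_hi = mid, f_mid
            if last_side == 1:
                f_lo *= 0.5
            last_side = 1
    return mid

# Synthetic ordinates (assumption): multiples of 125/738 stored with 6 decimals.
true_divy = 125 / 738
ys = [round(k * true_divy, 6) for k in range(1, 3001, 7)]

estimate = refine_divy(ys, 0.16937, 0.16938)
print(estimate, true_divy)
```

The two starting values play the role of $ divy_{1} $ and $ divy_{2} $; they must lie on either side of the sought factor and close enough to it that $ {\mathrm{frac}} $ does not wrap around.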

Considering the interpretation $ y=F/A $ with $ A=\pi   D^{2}/4 $ (where $ D $ denotes the diameter of the presumably cylindrical test piece), it is natural to consider the numbers $ 1/\sqrt{\pi   divy  1.8} $. One finds:


$\displaystyle 1.0016666667,  1.0000000001,  .99833333341, $      
$\displaystyle .99333333340,  .99166666673,  .99000000007$      

i.e. numbers that differ from one another, to within 10 digits of precision, by integer multiples of $ 1/600 $. Such a regularity could result from a nominal diameter of $ 6  mm $ measured with an accuracy of $ 0.01  mm $.

In any event, these factors $ divy$ depend on the granularity of the measurement process for forces and sections. It would be much more effective to keep the records of the force and dimension measurements directly, since they are the primary measurements.
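The role of the constant $ 1.8 $ can be given a tentative reading. Assume, for illustration only, that $ divy=f/A $, where $ f $ is the increment of the force sensor in newtons and $ A=\pi   D^{2}/4 $ is expressed in $ mm^{2} $. Then:

$\displaystyle \pi   divy  1.8=\frac{7.2  f}{D^{2}}\qquad \Longrightarrow \qquad \frac{1}{\sqrt{\pi   divy  1.8}}=\frac{D}{\sqrt{7.2  f}}$

Since the numbers found cluster around $ 1 $, a nominal diameter $ D=6 $ would require $ \sqrt{7.2  f}=6 $, i.e. $ f=5 $. Under that assumption the six numbers are simply $ D/6 $ for $ D=6.01,  6.00,  5.99,  5.96,  5.95,  5.94 $, and a step of $ 0.01 $ in $ D $ corresponds to a step of $ 0.01/6=1/600 $.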


Interpretation of the $ divy$ factor

In §V-B and §V-C, we have set out at length the conclusions we were able to draw from incomplete data, because this proves clearly that a complete record does not only contain the requested mechanical data mixed with some noise resulting from measurement uncertainties, but also contains much other knowledge, and that it is possible to extract this knowledge by suitable techniques.

Therefore, it is advisable not to disturb the recording of these additional data by any uncontrolled kind of pre-cooking.

After having obtained the results described in the two previous paragraphs, we gained access to the missing data from the final reports issued by the laboratory. As regards the factors $ divy$, the $ DAL093\_L1 $ report describes the test piece as being rectangular with a section $ A=2.46\times 12.00=29.52  mm^{2} $, so that $ divy=5/A $.

The other test pieces were described as being cylindrical, with a nominal diameter of $ 6  mm $, measured to within $ 0.01  mm $. In all 12 cases, one has $ divy=5/A $ with $ A=\pi   D^{2}/4 $.

The presence of this factor 5 in all the series seems to indicate that the values recorded by the force sensor were integer multiples of 5 newtons. The sensitivity of the sensors does not appear in the final report, but can be inferred from the calibration of the testing machine [6].
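These relations are easy to check numerically. In the sketch below, the rectangular section is the one quoted from the $ DAL093\_L1 $ report, whereas the list of diameters is an inference made here so as to reproduce the six numbers $ 1/\sqrt{\pi   divy  1.8} $ found earlier; it is not quoted from the reports. The names are hypothetical.

```python
import math
from fractions import Fraction

FORCE_STEP = 5  # newtons: force increment suggested by the factor 5

# Rectangular test piece: exact arithmetic on the reported dimensions (mm).
area_rect = Fraction("2.46") * Fraction("12.00")
divy_rect = FORCE_STEP / area_rect
print(area_rect, divy_rect)

def normalized_diameter(d_mm):
    """Compute 1/sqrt(pi * divy * 1.8) for a cylinder of diameter d_mm."""
    area = math.pi * d_mm ** 2 / 4
    divy = FORCE_STEP / area
    return 1 / math.sqrt(math.pi * divy * 1.8)

# Diameters in mm (assumption: inferred, not read from the reports).
for d in (6.01, 6.00, 5.99, 5.96, 5.95, 5.94):
    print(d, normalized_diameter(d))
```

The exact fraction obtained for the rectangular piece is the factor $ 125/738 $ found from the ordinates alone, and each cylindrical value reduces to $ D/6 $.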

Attempt at interpretation of the factor $ divx$

As regards the six test pieces $ DSC122 $, the same factor $ divx=1/5700 $ has been found, leading to $ \left\vert m_{g}\right\vert \approx m_{d}\approx 0.003 $. It is clear that such a phenomenon requires some explanation, because this situation would be completely implausible for uniformly distributed decimal numbers. Moreover, it can be seen that this phenomenon does not take place for the extreme divisors of $ 5700 $. This can thus be interpreted as being caused by a quotient of lengths measured in hundredths of a mm by a length of exactly $ 57  mm $, or by a quotient of lengths measured in fiftieths of a mm by a length of exactly $ 114  mm $ (since $ 5700=100\times 57=50\times 114 $). A similar situation occurs for the test pieces $ JF\_049 $.

On the other hand, the regularities noticed for series $ DAL093 $ are more difficult to explain. They indeed suggest the appearance of a second phenomenon in the vicinity of $ \sigma =0.95  \sigma _{m} $, that is to say $ \varepsilon =0.05  \varepsilon _{Tb} $. This phenomenon, having $ divx=2/605 $ as associated factor, interferes with the ordinary phenomenon of elongation, whose factor would be $ divx=1/750=7/5250 $.

A possible cause of this strange situation is the fact that two sensors are used successively. Indeed, the measurement process starts by recording the $ \Delta L_{0} $ given by a sensitive but fragile sensor, clamped at two points of the central zone of the test piece, initially a distance $ L_{0} $ apart. In the first phase of the process, one thus evaluates the deformation by $ x=\Delta L_{0}/L_{0} $. Before rupture, this sensor is detached from the test piece so that it is not damaged when the test piece breaks. During this second part of the test, the $ x $ used comes from $ x=\Delta L/L $, where $ L $ and $ \Delta L $ are provided by a sensor attached to the jaws of the tensile machine.

If this interpretation is the correct one, the remarkable phenomenon would not be so much the situation of $ DAL093 $ (which carries traces of this changeover), but rather the four others, which do not carry such traces.

Another possible cause is the use of an "averaging sensor". Indeed, a usual extensometer is U-shaped. One of the internal faces is smooth, and slips on the test piece. The other internal face has two anchoring points, connected to the gauge. It is possible to obtain a better symmetry by using two anchoring points on each face, each pair being connected to a gauge.

A reasonable device should allow a separate acquisition of the two measurements, making it possible to check each gauge against the other. But on the device used, only an average is transmitted by the actual acquisition chain. Installing a two-way channel to transmit each measurement separately is not so easy, due to the certification procedures for extensometers [7].




douillet@ensait.fr
2003-06-13