Magnetic Rubber Inspection

Last Updated 8 October, 2003

Fundamental & Applied Research

Probability of detection for magnetic rubber inspections of F-111 steel components

C A Harding and G R Hugo
Defence Science and Technology Organisation
506 Lorimer Street, Fishermans Bend, Victoria, 3207, Australia.

Abstract


    Magnetic rubber inspection is used extensively on F-111 aircraft to detect cracks in critical steel components. Scheduled inspection intervals are based on a durability and damage tolerance analysis which requires as input data an assessment of the reliability of the technique. DSTO and the Royal Australian Air Force (RAAF) recently conducted an experimental program to determine the probability of detection for magnetic rubber inspection of F-111 components. This program involved a series of inspections performed on coupon specimens by RAAF technicians under conditions simulating those experienced during in-service inspections. A program of Monte Carlo simulations was used to demonstrate the validity of different statistical methods for analysis of the relatively small experimental data set obtained from the field trial.

    Keywords: Magnetic rubber inspection, reliability, probability of detection

Introduction


    Magnetic rubber inspection (MRI) is a nondestructive evaluation (NDE) technique, which is used extensively on F-111 aircraft to detect cracks in D6ac steel components, including the wing-pivot fitting, the wing carry-through box and several other critical structures within the airframe. The magnetic rubber technique is a variation on magnetic particle methods, in which a liquid rubber containing suspended magnetic particles is poured into a dam surrounding the area to be inspected on a magnetised component. After the rubber sets, the cast is removed and examined for evidence of cracks or other discontinuities, which appear as dark lines on the surface of the cast. Inspections can be performed using either an applied magnetic field, which is maintained whilst the rubber sets (active field), or the residual field from magnetisation of the component prior to pouring the rubber. A key feature of the inspections conducted on the F-111 D6ac steel structure is the need to reliably detect very small fatigue cracks down to 0.010 inch (0.25 mm) in length [Imperial units are conventionally used in connection with the F-111]. MRI is labour intensive but capable of detecting such small defects.

    Scheduled inspection intervals for nondestructive inspection of F-111 are based on a durability and damage tolerance analysis (DADTA), which incorporates as input data an assessment of the reliability of the NDE technique. The DADTA assumes that the airframe contains pre-existing flaws at critical locations, and then models the growth of these flaws given a representative flight loading spectrum. The initial flaw size is assumed to be the smallest that can be reliably detected by the NDE technique. It is taken to be the minimum crack size (a9095) for which a 90% probability of detection (POD) has been demonstrated experimentally with 95% statistical confidence. Here, the 95% statistical confidence level accounts for the uncertainty inherent in determining the POD from a finite statistical sample. The DADTA modelling predicts the number of flight hours for the assumed defect(s) to grow to a size that could cause failure of the component. The inspection interval for periodic NDE is then taken to be a fraction (typically one-half) of that number of flight hours.

    The withdrawal of all USAF F-111 from service circa 1998 left Australia as the only nation operating this aircraft type, with the expectation that the F-111 would remain in Royal Australian Air Force (RAAF) service for a further 20 years until its planned withdrawal date in 2020. As part of a coordinated package of research to support the RAAF as sole-operator of the F-111, DSTO conducted a review of all available information concerning the reliability of magnetic rubber inspections. In this review, insufficient documentary evidence could be located to satisfactorily demonstrate the required POD (Hugo & Scala 2001). Consequently, DSTO commenced an experimental program to determine the POD (including a9095) for RAAF magnetic rubber inspection of F-111. It was anticipated that this information could allow inspection intervals for magnetic rubber inspections to be increased, thereby achieving significant savings on maintenance costs and reducing aircraft unavailability due to scheduled inspections.

    During the initial review, it was noted that the published methodologies for analysis of POD data were generally developed and demonstrated for the analysis of relatively large data sets. Since the experimental program for RAAF MRI would necessarily be a relatively small trial, a number of possible analysis algorithms were examined to assess their applicability for the modest quantity of data to be obtained.

Experimental Program

    The experimental program involved simulated field inspections of a series of coupon specimens by RAAF technicians. Two specimen types were used (Figure 1): a 'bolthole' specimen representative of typical cracks occurring in boltholes, and a 'mousehole' specimen representing cracks in more general structure, including radii. The latter specimen was designed to be similar to the fuel-flow vent holes (mouseholes) within the F-111 wing-pivot fitting. Field inspections were conducted with the specimens inserted inside a scrap wing-pivot fitting in order to realistically simulate the effects on reliability of the restricted access typically encountered by RAAF technicians.

    Fig 1: Specimen types used for field trial, shown mounted inside wing-pivot fitting.

    The coupon specimens were fabricated from D6ac steel, heat-treated to the same condition as components in the F-111 airframe. Fatigue cracks were generated in the specimens at DSTO using a 'DADTA2b' spectrum loading, representative of flight loading at a typical location in the lower plate of the wing pivot fitting. Small corrosion pits (up to 50 µm in size) were electrochemically generated in the specimens prior to fatiguing to act as fatigue crack initiators. The use of corrosion pits as crack initiators was necessary in order to reduce the scatter in crack initiation times sufficiently to be able to successfully generate the very small fatigue cracks (down to 0.004" in length) required for the trial. Both bore and quadrant (corner) cracks with lengths ranging from 0.002" (0.05 mm) to 0.090" (2.3 mm) were successfully generated using this procedure. For the mousehole specimens, the mousehole shape was produced by electric discharge machining from an initial keyhole notch shape after generating the fatigue cracks, taking care not to remove the cracks during the machining process.

    From a total of 103 specimens prepared, a set of 21 bolthole and 28 mousehole specimens were selected to achieve as uniform a distribution of crack sizes as possible. Uncracked specimens were used as placebos, some of which had been fatigued but were uncracked, whilst others were as machined. The trial included a total of 360 inspections on cracked holes and placebos, in roughly equal proportion, according to a randomised schedule of inspections. Six RAAF technicians of different experience levels participated in the trial and were drawn from the Base NDT section at 501 Wing RAAF Amberley. Trials were conducted at RAAF Amberley under similar levels of pressure due to workload as those encountered by technicians for on-aircraft inspections. To prevent collusion between technicians, they reported their results by session number and by the station number of each of four coupons located inside the wing-pivot fitting for each session.

    Two different methods were used to magnetise the specimens in the field trial. The mousehole specimens were inspected using an active field from a horseshoe magnet spanning the mousehole, whilst the bolthole specimens were inspected using the residual field following magnetisation using a central conductor inserted through the hole (applied current of 500A for 5 sec).

    Technician results were reported using defect codes adapted from those used in service. Technician reports were compared to the results of 'master' magnetic rubber inspections in order to determine a 'hit' or 'miss' result for each inspection of each confirmed crack. The master inspections were performed at DSTO with the specimens under an applied tensile load in a mechanical testing machine, which was found to give significantly clearer indications because the crack mouths were opened by the applied load. Master inspections were performed both before and after the field trials. A subset of specimens were broken open for fractographic examination in order to confirm

    1. the reliability of the master magnetic rubber inspection results and
    2. the accuracy of the crack lengths measured from the master inspection casts.
    The final form of the trials data comprised hit/miss results for each inspection of each crack in the trial, along with the length of the crack measured from the master inspection casts.

Statistical Analysis Methods

    Methodologies for determining POD fall into two basic categories,

    1. verification of required POD at a single specified crack size and
    2. determination of POD as a function of crack size.
    The former approach is rarely employed because it provides no information about the POD at other larger or smaller crack sizes and, if the experimental trial fails to demonstrate the required POD, then the whole experiment needs to be repeated for a larger crack size. In the present work, only the second approach has been considered.

    Determination of POD as a function of crack size requires a series of inspections on specimens containing a range of crack lengths, a, to infer a curve, POD(a), which plots the POD as a function of crack size. Statistical methods may be further differentiated into

    1. those which assume a particular functional form for POD(a) (curve fitting methods), and
    2. those which make no assumption about the relationship between POD and crack size.
    The curve fitting methods have been generally favoured in the literature for the analysis of POD data. These methods assume that POD(a) may be described by a mathematical function containing a number of unknown parameters (usually two). A regression analysis or maximum likelihood estimation (MLE) is then applied to fit these unknown parameters to the experimental data (Petrin, Annis & Vukelich 1993; MIL-HDBK-1823 1999). The curve fitting methods have the advantage that, by fitting only two (occasionally more) unknown parameters of an assumed function, they make very efficient use of the available experimental data and typically give less conservative lower confidence bounds on the POD curve when compared to other methods. However, the curve fitting methods have two potential drawbacks:

    1. The magnitude of the POD at large crack sizes is dependent upon the form of the mathematical function assumed for the POD curve. In particular, for the functional forms most favoured in the literature (log-normal and log-logistic), the POD asymptotes to 1 at large crack sizes. This may be inconsistent with reality, for which human factors might cause the POD to be less than 1 even for very large cracks. It is not possible to test the appropriateness of the assumed mathematical function using a small data set, as it requires a much larger experimental data set to test the shape of a curve than to fit unknown parameters in an assumed function.
    2. The methodology conventionally used to infer the 95% confidence limit POD curve is derived from the properties of statistics in the limit that the sample size tends to infinity (Cheng & Iles, 1983). It is therefore appropriate for large data sets, but may be less applicable for smaller data sets.
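
    As an illustration of the curve fitting approach described above, the following is a minimal sketch of a maximum likelihood fit of a two-parameter cumulative log-normal POD(a) to hit/miss data. It is not the procedure of MIL-HDBK-1823 or of the authors: the brute-force grid search, the function names and the hit/miss records are all assumptions made for illustration, and no confidence limit is computed.

```python
import math

Z90 = 1.2815515655446004  # 90th percentile of the standard normal distribution


def pod_lognormal(a, mu, sigma):
    """Cumulative log-normal POD(a) = Phi((ln a - mu) / sigma)."""
    z = (math.log(a) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def neg_log_likelihood(data, mu, sigma):
    """Negative log-likelihood of hit/miss data under the assumed POD curve."""
    eps = 1e-12  # keeps log() finite when the POD is numerically 0 or 1
    total = 0.0
    for a, hit in data:
        p = min(max(pod_lognormal(a, mu, sigma), eps), 1.0 - eps)
        total -= math.log(p) if hit else math.log(1.0 - p)
    return total


def fit_mle(data, mu_grid, sigma_grid):
    """Brute-force search for the (mu, sigma) pair that minimises the NLL."""
    best = min(
        (neg_log_likelihood(data, m, s), m, s)
        for m in mu_grid
        for s in sigma_grid
    )
    return best[1], best[2]


# Hypothetical hit/miss records: (crack length in inches, detected?)
data = [
    (0.003, False), (0.004, False), (0.004, True), (0.005, True),
    (0.006, False), (0.007, False), (0.008, True), (0.009, True),
    (0.010, True), (0.012, True), (0.015, True), (0.020, True),
]

# Search grids (assumed ranges): a50 from 0.002" to about 0.032", sigma up to 2.0
mu_grid = [math.log(0.002) + i * 0.02 for i in range(140)]
sigma_grid = [0.05 + j * 0.05 for j in range(40)]

mu_hat, sigma_hat = fit_mle(data, mu_grid, sigma_grid)
a50_hat = math.exp(mu_hat)                      # crack length at 50% POD
a90_hat = math.exp(mu_hat + Z90 * sigma_hat)    # crack length at 90% POD
print(f"a50 = {a50_hat:.4f} in, a90 = {a90_hat:.4f} in")
```

    Only two numbers (mu and sigma) are estimated from the whole data set, which is why such methods use the data efficiently. The fitted a90 depends entirely on the assumed log-normal form, which is the first drawback listed above.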

    Other methods, which make no assumptions about the functional relationship between POD and crack size, include the range interval method (RIM) and optimised probability method (OPM) (Berens & Hovey 1981, Bruce 1998). These methods can be applied to data sets of any size and have the advantage that the POD curves inferred by them cannot be compromised by an inappropriate choice of functional form for POD(a). However, the confidence limits derived from these methods are almost always more conservative than those obtained from curve fitting methods and may be very conservative when applied to small data sets.
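
    Methods of this kind group inspections by crack size and place a lower confidence bound on the proportion detected within a group. The sketch below shows only that building block, an exact (Clopper-Pearson) one-sided binomial lower bound found by bisection. It does not reproduce the RIM or OPM procedures of Berens & Hovey or Bruce, and the function names are assumptions.

```python
import math


def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def lower_conf_bound(hits, n, confidence=0.95):
    """One-sided lower confidence bound on the detection probability.

    Finds the smallest p for which observing at least `hits` detections
    in `n` inspections has probability (1 - confidence).
    """
    if hits == 0:
        return 0.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if binom_tail_ge(hits, n, mid) < alpha:
            lo = mid  # p = mid makes the observed result too unlikely; bound is higher
        else:
            hi = mid
    return lo


# 29 hits from 29 inspections demonstrates 90% POD at 95% confidence; 28 from 28 does not.
print(lower_conf_bound(29, 29) > 0.90)
print(lower_conf_bound(28, 28) > 0.90)
```

    Because each group is bounded using only its own inspections, a group needs at least 29 hits without a miss to demonstrate 90% POD at 95% confidence. This is one way to see why such methods can be very conservative for small data sets.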

Validation of Analysis Methods

    The applicability of various statistical methods for relatively small data sets was assessed using a program of Monte Carlo simulations. Synthetic hit / miss results were generated for a random set of crack lengths according to an assumed true POD(a). The functional form used for the true POD(a) curve was a cumulative log-normal distribution, which is commonly used for analysis of POD data.
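
    The generation of one synthetic data set can be sketched as follows. The true curve uses the a50 and a90 values quoted in the Results and Discussion section. The log-uniform sampling of crack lengths, the 0.002" to 0.090" range (borrowed from the field specimens), the random seed and all function names are assumptions, since the paper does not state how the random crack lengths were drawn.

```python
import math
import random

Z90 = 1.2815515655446004  # 90th percentile of the standard normal distribution


def lognormal_params(a50, a90):
    """Log-normal parameters (mu, sigma) from the 50% and 90% POD crack lengths."""
    mu = math.log(a50)
    sigma = (math.log(a90) - mu) / Z90
    return mu, sigma


def pod(a, mu, sigma):
    """Cumulative log-normal POD(a)."""
    z = (math.log(a) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_trial(n, mu, sigma, a_min, a_max, rng):
    """Draw n crack lengths and a hit/miss outcome for each from the true POD."""
    data = []
    for _ in range(n):
        # Assumed sampling scheme: crack lengths log-uniform on [a_min, a_max]
        a = math.exp(rng.uniform(math.log(a_min), math.log(a_max)))
        hit = rng.random() < pod(a, mu, sigma)
        data.append((a, hit))
    return data


mu_true, sigma_true = lognormal_params(a50=0.005, a90=0.011)  # inches
rng = random.Random(12345)  # fixed seed so the sketch is repeatable
trial = simulate_trial(100, mu_true, sigma_true, 0.002, 0.090, rng)
hits = sum(1 for _, hit in trial if hit)
print(f"{hits} hits in {len(trial)} simulated inspections")
```

    Repeating `simulate_trial` many times and analysing each data set with a candidate method gives the distribution of fitted parameters that the validation relies on.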

    Several analysis algorithms were applied to this synthetic data set. Curve fitting methods provide a 'best fit' (MLE) POD curve and a lower 95% confidence limit curve, as well as key parameters such as the a9095 crack length (minimum crack length which gives 90% POD with 95% statistical confidence). The simulation procedure was repeated 1000 times to determine the distributions of the fitted POD curves and parameters and to assess the conservatism or non-conservatism of the fitted curves and parameters with respect to the assumed true POD curve.

Results and Discussion

    Table 1 presents the results for 1000 simulations, each comprising 100 inspections, with the true POD curve chosen to give a50,true = 0.005" (0.13mm) and a90,true = 0.011" (0.28mm), where a50 and a90 denote the crack lengths corresponding to 50% and 90% POD. 'MLE method 1' and 'MLE method 2' denote two different formulations for determining the 95% confidence limit on the maximum likelihood estimate of POD. Method 1 was based on a general procedure described by Cheng & Iles (1983), using their parameter Q2 to define the confidence region. Method 2 implemented a much simpler closed-form solution proposed by Bullock, Forsyth & Fahr (1994). For a 95% confidence limit, there is only a 5% chance of obtaining a data set which gives a confidence limit that is non-conservative with respect to the true value. Thus, we would expect the 95% confidence limit to be non-conservative (a9095 < a90,true) at most 5% of the time, on average [The 95% confidence limits derived by these methods are for the whole POD curve as a function of crack length; i.e. there is a 95% confidence that the confidence limit curve will be conservative with respect to the true POD at all points. The chance that any single point on the curve (e.g. a9095) will be non-conservative should be significantly less than 5%]. From Table 1, this expectation is easily satisfied for OPM and MLE method 1. However, for MLE method 2, the a9095 value was non-conservative for 15.9% of the simulations. This result could not have occurred by chance (probability < 10^-10) and indicates a serious problem with this analysis method. Indeed, a review of the derivation of the key formulae in the paper from which the method was taken revealed an incorrect assumption which, when corrected, renders the method invalid for determining confidence limits on POD curves. Thus MLE method 2 should not be used for the analysis of POD data.
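
    The claim that a 15.9% non-conservative rate could not arise by chance can be checked with a binomial tail probability. The sketch below assumes the 1000 simulations are independent and that a valid method would be non-conservative with probability exactly 5% (the upper limit allowed). The paper does not say how its own probability figure was computed, so this is one plausible check and the function name is an assumption.

```python
import math


def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log(1.0 - p)
        )
        total += math.exp(log_term)
    return total


n_sims = 1000
observed = 159  # 15.9% of 1000 simulations were non-conservative
p_value = binom_tail_ge(observed, n_sims, 0.05)
print(f"P(at least {observed} of {n_sims}) = {p_value:.3g}")
```

    With an expected count of 50 and a standard deviation of about 7, a count of 159 lies roughly 16 standard deviations above expectation, consistent with the bound quoted in the text.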


    Method          a90               a9095               % Non-conservative
                    inches (mm)       inches (mm)         (% for which a9095 < a90,true)

    OPM             Not determined    0.025 (0.64) [a]
    MLE method 1    0.011 (0.28)      0.019 (0.48)
    MLE method 2    0.011 (0.28)      0.014 (0.36)        15.9

    [a] For 30% of simulations, OPM failed to reach 90% at 95% confidence level for any crack length. The mean value for a9095 is computed from the 70% for which valid a9095 values were obtained.

    Table 1: Simulation results comparing the a90 and a9095 values from different analysis methods to the true value a90,true = 0.011" (0.28mm). Results are for 1000 simulations, each comprising 100 inspections.

    The other results in Table 1 are consistent with expectations and show that OPM and MLE method 1 are both acceptable for determining POD curves from data sets as small as 100 inspections. The results are consistent with the curve fitting (MLE) method being much more efficient than OPM. For MLE method 1, the confidence limit a9095 was on average 70% greater than the true value a90,true, whilst for OPM, a9095 was on average 2.3 times the true value. OPM was unable to determine an a9095 confidence limit for 30% of simulations as the curve did not reach a POD of 90% within the range of crack sizes considered.

    The results of the experimental trial, analysed using MLE method 1 and OPM, are shown in Figures 2 and 3 for inspections on mousehole and bolthole specimens respectively. The bolthole inspections have a somewhat higher POD at small crack lengths but a significantly poorer POD at large crack lengths. This is consistent with the proportion of cracks detected (hits) at each crack length in the field data. The bolthole data includes two very significant misses at 0.021" (0.53 mm) and 0.018" (0.46 mm). For the mousehole specimens, the 'best fit' or maximum likelihood estimate of the crack length at which the POD reaches 90% is a90 = 0.009" (0.23 mm), whilst the 95% confidence limit crack length for 90% POD is a9095 = 0.012" (0.30 mm). By comparison, the more conservative OPM gives a9095 = 0.028" (0.71 mm). For the bolthole specimens, the maximum likelihood estimate is a90 = 0.015" (0.38 mm). However, due to the relatively poorer POD and the limited quantity of inspection data, the 95% confidence limit does not reach a POD of 90% within the range of the field inspection data and an a9095 value cannot be reported.

    The bolthole and mousehole inspections differ significantly in the magnetisation method (central conductor vs active field) and in the surface condition of the areas inspected. The mouseholes were highly polished, consistent with the surface condition during RAAF inspections of mouseholes and stiffener runouts in the F-111 wing pivot fitting. The surface in boltholes was less well polished. The POD results obtained for the mousehole specimens are thus considered to be applicable only to inspections conducted on highly polished surfaces using an active field.

    It is possible that the significant misses and the poorer POD for bolthole inspections are related to the fact that, for a central conductor inspection with no defects present, there is normally no sign of the magnetic field on the cast, as the field lines run circumferentially with no leakage field in the absence of a defect. By comparison, for the active field technique, the cast always shows a 'halo' at the edges which provides post-inspection confirmation that the field was correctly applied. Thus, human error in applying the central-conductor procedure resulting in a lack of magnetisation could easily pass undetected, whereas for the active field technique inadequate magnetisation would easily be detected when inspecting the casts and the inspection would be repeated. This could explain the poorer reliability (lower POD) for the bolthole inspections at larger crack lengths, since inadequate magnetisation could cause a failure to detect cracks of any size. In spite of this, the best guess (MLE) a90 value is 0.38 mm (0.015"). This is still significantly better than could be achieved using other NDE techniques. It is likely that a usable lower confidence limit a9095 value could be obtained by extending the field trial to obtain more data for the bolthole inspections.

    The open circles plot the proportion of hits obtained for the field inspection data at each crack size, with their area being proportional to the total number of inspections performed at each crack size.

    Fig 2: Results of Experimental Trial for Mousehole Specimens.

    The open circles plot the proportion of hits obtained for the field inspection data at each crack size, with their area being proportional to the total number of inspections performed at each crack size.

    Fig 3: Results of Experimental Trial for Bolthole Specimens.

Conclusions


    The POD for magnetic rubber inspections of F-111 D6ac steel components has been determined as a function of crack length based on field trials completed by RAAF technicians at RAAF Amberley. It was found that inspections of boltholes using central conductor magnetisation were less reliable (lower POD) than the inspection of a mousehole geometry using an active field technique. The applicability of different statistical methods for analysis of relatively small data sets was examined using Monte Carlo simulations. A significant error was identified in one previously published analysis method. The simulations demonstrated that a variant of the standard MLE method (MIL-HDBK-1823 1999), utilising parameter Q2 of Cheng & Iles (1983) to define the confidence limit, is acceptable for determining POD curves from data sets as small as 100 inspections.

Acknowledgements


    The authors gratefully acknowledge the valuable contributions made by staff in the RAAF Non-Destructive Testing Standard Laboratory, who contributed to the planning and supervision of the field trial, and the technicians and supervising officers from 501 Wing Base NDT section whose willing cooperation was vital to the trial's success.

References


  1. Berens, A.P. & Hovey, P.W. (1981), Evaluation of NDE Reliability Characterization, Air Force Wright Aeronautical Laboratories, AFWAL-TR-81-4160, USA.
  2. Bruce, D.A. (1998), 'NDT Reliability Estimation From Small Samples and In-Service Experience', Airframe Inspection Reliability under Field/Depot Conditions, NATO RTO, RTO-MP-10 AC/323(AVT)TP/2, pp. 3-1 to 3-22.
  3. Bullock, M., Forsyth, D. & Fahr, A. (1994), Statistical Functions and Computational Procedures for the POD Analysis of Hit/Miss NDI Data, National Research Council Canada, LTR-ST-1964.
  4. Cheng, R.C.H. & Iles, T.C. (1983), 'Confidence Bands for Cumulative Distribution Functions of Continuous Random Variables', Technometrics, vol. 25, no. 1, pp. 77-86.
  5. Hugo, G.R. & Scala, C.M. (2001), An assessment of existing magnetic rubber inspection probability of detection data for F-111 D6ac steel structure, Defence Science and Technology Organisation, DSTO-TN-0355, Australia.
  6. MIL-HDBK-1823 (1999), Nondestructive Evaluation System Reliability Assessment, Department of Defense Handbook, United States.
  7. Petrin, C., Annis, C. & Vukelich, S.I. (1993), A Recommended Methodology for Quantifying NDE/NDI Based on Aircraft Engine Experience, NATO AGARD, AGARD-LS-190.