By Kirk Gray | March 12, 2013 at 11:33 AM EDT
Posted February 10, 2012
Historically, reliability engineering of electronics has been dominated by two beliefs: 1) the life, or the percentage of hardware failures occurring over time, can be estimated, predicted, or modeled, and 2) reliability can be calculated or estimated through statistical and probabilistic methods in order to improve hardware reliability. What is remarkable is that, during the many decades in which reliability engineers have been taught these beliefs and held them to be true, there has been little if any empirical field data from the vast majority of verified failures showing any correlation with the calculated predictions of failure rates.
Probabilistic statistical predictions based on broad assumptions about the underlying physical causes began in November 1956 with the first electronics reliability prediction guide, RCA's TR-1100, "Reliability Stress Analysis for Electronic Equipment", which presented models for computing component failure rates. This publication was followed by the "RADC Reliability Notebook" in October 1959, and by the publication of a military reliability prediction handbook known as MIL-HDBK-217.
The practice continues today in various software applications that are descendants of MIL-HDBK-217. Underlying these "reliability prediction assessment" methods and calculations is the assumption that the main driver of unreliability is components with intrinsic failure rates moderated by absolute temperature. It has been assumed that component failure rates follow the Arrhenius equation, and that they approximately double for every 10°C rise in temperature.
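To make the assumption concrete, here is a minimal sketch of the Arrhenius acceleration-factor calculation these prediction methods rely on. The function name and the 0.55 eV activation energy are illustrative choices, not values from any particular handbook; 0.55 eV happens to reproduce the "doubling per 10°C" rule of thumb near room temperature.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Arrhenius acceleration factor between a use temperature and a
    higher stress temperature, for an assumed activation energy Ea (eV)."""
    t_use_k = t_use_c + 273.15      # convert Celsius to Kelvin
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use_k - 1.0 / t_stress_k))

# With Ea = 0.55 eV (an illustrative value), raising temperature from
# 25 degC to 35 degC roughly doubles the predicted failure rate:
print(round(arrhenius_af(0.55, 25.0, 35.0), 2))  # prints 2.0
```

Note that the calculation says nothing about whether real field failures actually follow this model; it only shows how the temperature-doubling assumption is encoded.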
MIL-HDBK-217 was removed as a military reference document in 1996 and has not been updated since. Yet it is still referenced unofficially by military contractors and still believed to have some validity, even without any supporting evidence.
Electronics reliability engineering has a fundamental "knowledge distribution" problem: real field failure data, and the root causes of those failures, can never be shared with the larger reliability engineering community. Reliability data is among the most confidential and sensitive data a manufacturer holds, and short of a court order it will never be published. Without this information being disseminated and shared, little will change in the beliefs of the vast majority of the engineering community.
Even though the probabilistic prediction approach to reliability has been practiced and applied for decades, any engineer who has seen the root causes of verified field failures will observe that nearly all failures occurring before an electronic system becomes technologically obsolete are caused by 1) errors in manufacturing, 2) overlooked design margins, or 3) accidental overstress or abuse by the customer. The timing of these failures, which are often the result of multiple events, is random and inconsistent. There is therefore no basis for applying statistical or probabilistic predictive methods.
It is long past time for electronics design and manufacturing organizations to abandon these invalid and misleading approaches and to acknowledge that reliability cannot be estimated from assumptions and calculations. A more effective approach is to put most reliability engineering resources into finding, through stress testing and as quickly as possible, 1) errors in manufacturing, 2) overlooked design margins, and 3) the weakest links that may cause failure in the roughest environment to which a customer may subject the product.