Many people in the management and marketing of electronics companies want to believe that reliability engineers can predict the life of electronics systems. If we knew future failure rates, we could budget warranty costs and stock the correct number of spare parts and replacement units before the product is launched.
In 1995 my friend Professor Michael Pecht, founder and chairman of the University of Maryland's Center for Advanced Life Cycle Engineering Consortium, wrote and published the article “Why Traditional Reliability Predictions don’t work – Is there an Alternative.” In it he recounts the history of one of the foundational documents of electronics reliability engineering, Military Handbook 217 (MIL-HDBK-217), and explains why it cannot predict electronic system failure rates. It was removed as a military reference document in 1995, largely due to Prof. Pecht's work. It is amazing that MIL-HDBK-217, removed almost 17 years ago, is still being referenced, and that its progeny are still used for reliability predictions in many electronics companies today. Electronics materials and manufacturing methods have changed tremendously in the last 17 years, yet the belief that electronics system reliability can be predicted has changed little in that time.
Electronics reliability cannot be predicted at a system level. The vast majority of failures of electronics hardware are due to design margin errors, component misapplication, errors in manufacturing processes, and customer misuse or abuse. This is easy to confirm if you have access to the root causes of real field failures in real electronics products.
“All models are wrong, but some are useful” – George E. P. Box
Mathematical models to predict future events are, in many cases, valid and useful. The computer models and measurement systems used in meteorology are improving as they incorporate more inputs on contributing atmospheric conditions and better algorithms, yet the ability to forecast the weather more than a few days ahead has remained elusive. There would be huge benefits, both economic and in human lives, if we could project more than a few hours in advance when and where extreme weather events such as tornadoes or hurricanes will occur. Yet we can still pinpoint where extreme weather will hit only a few hours ahead for tornadoes, or a few days ahead for hurricanes.
Of course, reliability prediction could be performed more accurately if we knew all of the many inherent potential failure mechanisms in an electronics system and their fatigue responses to the life cycle environmental profile (LCEP) stresses. Even if we could know all the inherent failure mechanisms in components, we would also need information about the time distributions of manufacturing variations and excursions that modify the strength, or the rate of degradation, of those mechanisms.
In many mechanical and electromechanical systems there are physical wear mechanisms that can be mathematically modeled, and from those models we can project the “life” of the mechanism. We know that in electric motors, wear of contact brushes, evaporation of lubricants, and wear of ball bearings eventually use up life, leading to wear-out failures. Mechanical switches and hinges have a limited fatigue life. Using those models, we can extend the life of mechanical systems by increasing the reservoir of material or reducing the driving stress conditions. In electronics there are a few devices, such as batteries, whose wear-out life is short relative to technological obsolescence; for these, modeling life is very useful and necessary.
It is much more difficult to determine the underlying life-limiting mechanisms of solid-state electronics components such as ICs, and harder still for a complete PWB (printed wiring board) assembly in a complex system. Not only must the intrinsic physics of component degradation and failure be known, but so must the PWB and solder fatigue mechanisms for each package. BGA solder joints and plated through-hole (PTH) vias do not fatigue at the same rate under the same stress inputs. And the stresses driving each mechanism on the PWB and its components can vary widely depending on location on the board.
LCEP for most electronics systems is a very rough guesstimate
Reliability prediction must also determine the LCEP and the LCEP distributions for the future field population. We must know, to some precision, the actual LCEP stress distributions along with the inherent product “strength entitlement” distributions, because product failures occur where the strength distribution overlaps the stress distribution. Please see my blog post “Reliability Paradigm Shift From Time to Stress Metrics” for more explanation of the stress/strength relationship in reliability.
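As a rough illustration of why the overlap matters, here is a minimal Python sketch of stress/strength interference. It assumes, purely for illustration, that strength and stress are independent and normally distributed; real LCEP and strength distributions rarely are, and we seldom know their parameters. The function name and all numbers are hypothetical.

```python
import math


def interference_failure_probability(strength_mean: float, strength_sd: float,
                                     stress_mean: float, stress_sd: float) -> float:
    """Probability that applied stress exceeds product strength.

    Hypothetical model: strength and stress are independent normal variables,
    so failure probability is P(strength < stress).
    """
    # The margin D = strength - stress is itself normal with these parameters.
    margin_mean = strength_mean - stress_mean
    margin_sd = math.sqrt(strength_sd ** 2 + stress_sd ** 2)
    z = margin_mean / margin_sd
    # P(D < 0) = Phi(-z) = 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical values in arbitrary stress units: same strength and mean stress,
# but two different spreads of field stress.
narrow_lcep = interference_failure_probability(100, 10, 50, 5)
wide_lcep = interference_failure_probability(100, 10, 50, 20)
print(f"narrow stress spread: {narrow_lcep:.2e}")
print(f"wide stress spread:   {wide_lcep:.2e}")
```

With these made-up numbers, widening only the stress spread raises the overlap from a few parts per million to roughly 1.3%. The answer depends entirely on distribution parameters that, for a real field population, are little more than guesses.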
Many electronics systems have a wide variety of LCEPs, and new applications create new LCEPs that were never considered. Take the example of the VGA projectors we see in many conference and meeting rooms. Some projectors are permanently mounted on the ceiling, and many others are mobile. The fatigue stress on ceiling-mounted units most likely comes from thermal cycling during power cycling, while the mobile units see that stress plus shock and vibration from transport. The mobile population has a much wider distribution of LCEPs. I doubt the manufacturers of these products know the LCEP distributions for these two distinct end-use environments. End users will expect the same reliability from both, regardless of the very different LCEPs. Of course, some of the mobile units will break instantly from an accidental drop. When one does, will the user blame their own mishandling, or blame the manufacturer for making a “fragile” projector and never buy from that manufacturer again? Certainly we do not expect our cell phones to fail after a waist-high drop, but from what height would we accept that the failure was our own fault?
When it comes to electronics system reliability modeling and prediction, we really cannot know all the mechanisms or the distributions of the LCEP. Even if all the degradation models were known, along with every combination of stress distributions and their effects in the assembly, the challenge of reliability prediction would still be compounded by manufacturing variations over time.
Focus on real weakness discovery – less on guessing a very uncertain future
As the design and manufacturing cycle times for new electronics continue to shrink, we have even less time to model partial or whole systems and their resulting fatigue damage and degradation. Even if we could model the degradation and fatigue damage of every potential failure mechanism in a PWB, the models would describe units from a capable manufacturing process, not its variations, and we know there will be variations. Additionally, modeling can only establish a failure rate based on inherent wear-out mechanisms and known LCEPs, even though new applications may bring future LCEPs that were not known when the product was designed.
“The best way to predict the future is to create it.” - Peter Drucker
Just as many of us would like to know our own future, many would like to know what the future holds for the electronics we make and use. Yet for complex electronic systems there is no evidence that we can model and predict future failure rates, even though many still want to believe it can be done and want it to be true.
Empirical stress limit discovery is a vastly more efficient tool for building a reliable electronic system. By using step-stress-to-limits methods such as HALT (Highly Accelerated Life Test) and focusing on discovering potential weaknesses that could be reliability risks, we can very quickly find the strength limits of complex electronic systems and establish a strength benchmark based on current standard electronics technologies. Knowing the empirical stress limits, we can develop safe and efficient ongoing accelerated reliability testing to precipitate and detect manufacturing errors or excursions that result in latent defects.
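To make the stepped-stress idea concrete, here is a minimal Python sketch of a step-stress-to-limits loop. It is a simplification, not a real HALT procedure: actual HALT steps temperature, vibration, and combined stresses with dwell times and continuous functional monitoring. `SimulatedUnit`, `step_stress_to_limits`, and all limit values are hypothetical stand-ins for a real unit under test and its functional test. The loop records the first step at which the unit stops working but recovers when stress is reduced (an operating limit) and the first step at which it no longer recovers (a destruct limit).

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class SimulatedUnit:
    """Hypothetical unit under test; its limits are hidden from the test loop."""
    operating_limit: float
    destruct_limit: float
    damaged: bool = False

    def functional_test(self, stress: float) -> bool:
        """Return True if the unit operates correctly at this stress level."""
        if stress >= self.destruct_limit:
            self.damaged = True  # permanent damage: the unit never recovers
        return not self.damaged and stress < self.operating_limit


def step_stress_to_limits(
    test: Callable[[float], bool],
    start: float,
    step: float,
    max_stress: float,
    recovery_stress: float,
) -> Tuple[Optional[float], Optional[float]]:
    """Raise stress in fixed steps and return (operating_limit, destruct_limit).

    Either value is None if that limit was not reached by max_stress.
    """
    operating_limit: Optional[float] = None
    n_steps = int((max_stress - start) // step) + 1
    for i in range(n_steps):
        stress = start + i * step  # computed from the index to avoid float drift
        if not test(stress):
            # Failure at this step: drop back to a benign stress and retest.
            if test(recovery_stress):
                # Recovered, so this is a soft (operating) limit; keep stepping.
                if operating_limit is None:
                    operating_limit = stress
            else:
                # No recovery: a hard failure, so this is the destruct limit.
                return operating_limit, stress
    return operating_limit, None


unit = SimulatedUnit(operating_limit=85, destruct_limit=120)
print(step_stress_to_limits(unit.functional_test, start=20, step=10,
                            max_stress=150, recovery_stress=25))
```

With these hypothetical limits the loop reports an operating limit of 90 and a destruct limit of 120, the first steps at or beyond the hidden limits of 85 and 120, so the measured limits are only as fine as the step size.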
Unfortunately, the wish for accurate prediction persists, and many continue to feed it by predicting the reliability of electronics hardware from past documents that have been shown to be invalid. Without the ability to share real field reliability data, that belief is likely to continue.