Accelerated Reliability Solutions, L.L.C.

Why Electronics Failure Prediction Methodology does not work, but we still wish it did

Posted 12-6-2012

 

“When the number of factors coming into play in a phenomenological complex is too large, scientific method in most cases fails. One need only think of the weather, in which case the prediction even for a few days ahead is impossible.” ― Albert Einstein

 


 “Prediction is very difficult, especially about the future.” – Niels Bohr

 

 

 

We have always sought to reduce future uncertainties and to know what is going to happen to us, how long we will live, and what may impact our lives. Horoscopes, tarot cards, tea leaves, and crystal balls have been used as specialized “tools” by fortune tellers to gaze into the future. The paradox of fortune telling is that by knowing the future, we can change it. The risk in believing we know the future is that if we incorrectly guess (assume) the causes of a future event, our preventive action may create additional costs or a higher risk of an even worse event.

 

This is also true when making predictions of the future life of electronics. Without clear traceability to the actual physics of failure in electronics, assumptions about the causes of failures have added costs without benefits. Reliability prediction in much of the electronics industry is still based on the assumption of a constant rate of failure, modified only by the steady state temperature. This is despite the fact that NO intrinsic physical mechanism has been identified in active components that causes, or would cause, an increase in a constant rate of failure. There is no evidence or reason to believe that components fail at a constant rate.
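To make the criticized assumption concrete, here is a minimal sketch of what a constant-failure-rate calculation with a temperature adjustment amounts to. It assumes an Arrhenius-type temperature factor, which is a common form for such adjustments; the base failure rate, activation energy, and temperatures are hypothetical values chosen only for illustration, not figures from any handbook or product.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def reliability_constant_rate(failure_rate_per_hour, hours):
    """Survival probability under a constant failure rate: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate_per_hour * hours)


def arrhenius_acceleration(activation_energy_ev, temp_ref_c, temp_use_c):
    """Arrhenius-type factor by which a rate is scaled when moving
    from a reference temperature to a use temperature (both in Celsius)."""
    t_ref = temp_ref_c + 273.15
    t_use = temp_use_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_ref - 1.0 / t_use)
    return math.exp(exponent)


# Hypothetical inputs, for illustration only.
base_rate = 1e-6  # failures per hour at the reference temperature (assumed)
af = arrhenius_acceleration(activation_energy_ev=0.7, temp_ref_c=40.0, temp_use_c=60.0)
scaled_rate = base_rate * af

print(f"temperature factor: {af:.2f}")
print(f"predicted 5-year survival: {reliability_constant_rate(scaled_rate, 5 * 8760):.4f}")
```

The entire prediction rests on two guessed inputs, a single constant rate and a single activation energy. Neither says anything about design margin errors, manufacturing excursions, or misuse.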

 

Many in management and marketing of electronics companies want to believe and wish reliability engineers could predict the life of electronics systems. By knowing the future failure rates, we could budget warranty costs and the correct number of spare parts and replacement units before the product is launched.

 

In 1995 my friend Professor Michael Pecht, founder and chairman of the University of Maryland's Center for Advanced Life Cycle Engineering Consortium, wrote and published the article “Why Traditional Reliability Predictions don’t work – Is there an Alternative.” In it he provides the history of one of the foundational documents of electronics reliability engineering, Military Handbook 217 (MIL HDBK 217), and explains why it cannot predict electronic system failure rates. It was removed as a military reference document in 1995, largely due to the work of Prof. Pecht. It is amazing that MIL HDBK 217, removed almost 17 years ago, is still being referenced and that its progeny are still being used for reliability predictions in many electronics companies today. Needless to say, electronics materials and manufacturing methods have changed tremendously in the last 17 years, but the belief that electronics systems reliability can be predicted has changed little in that time.

 

Electronics reliability cannot be predicted at a system level.  The vast majority of failures of electronics hardware are due to design margin errors, component misapplication, errors in manufacturing processes, and customer misuse or abuse.  It is very easy to confirm this is the case if you have access to the root causes of real field failures in real electronics products.

 

“All models are wrong, but some are useful” – George E. P. Box


Mathematical models to predict future events are, in many cases, valid and useful. Computer models and measurement systems used in meteorology to forecast weather conditions are improving, yet the ability to predict the weather more than a few days ahead has been elusive. There would be huge benefits economically and in human lives if we could project more than a few hours in advance when and where extreme weather events such as tornados or hurricanes will occur. With more inputs of contributing atmospheric conditions and better computer algorithms, weather forecasting is getting better. Yet extreme weather prediction is still limited to a few hours for tornados, or a few days for hurricanes, before we know where they will hit.

 

Of course reliability prediction could be performed more accurately if we knew all of the many inherent potential failure mechanisms in an electronics system and their fatigue responses to the life cycle environmental profile (LCEP) stresses. Even if we could know all the inherent failure mechanisms in components, we would also need to include some information on the time distributions of manufacturing variations and excursions that would modify the strength or rate of degradation of those mechanisms during manufacturing.

 

In many mechanical and electromechanical systems we do have physical wear mechanisms that can be mathematically modeled, and from those models we can project the “life” of the mechanism. We know that in electric motors, wear of contact brushes, evaporation of lubricants, and wear of ball bearings eventually use up life, leading to failures due to wear out. Mechanical switches and hinges have a limited fatigue life. Through those models we can extend the life of mechanical systems by increasing the reservoir of material or reducing the driving stress conditions. In electronics there are a few devices, such as batteries, whose wear out modes are short relative to technological obsolescence, and for those devices modeling life is very useful and necessary.

 

It is much more difficult to determine the underlying life-limiting mechanisms of solid state electronics components such as ICs in a complex system, let alone those of an assembled PWB. Not only must the intrinsic physics causing component degradation and failure be known, but the PWB and solder fatigue mechanisms must also be known for each package. BGA solder joints and plated-through-hole (PTH) vias do not fatigue at the same rate under the same stress inputs. Of course the stresses for all the mechanisms on the PWB and components can vary widely depending on their locations on the PWB.

 

LCEP for most electronics systems is a very rough guesstimate


Reliability prediction must also determine the Life Cycle Environmental Profile (LCEP) and also the LCEP distributions for the future field population.  We must know to some precision the actual LCEP stress distributions along with the inherent product “strength entitlement” distributions to know where the strength distribution overlaps the stress distribution resulting in product failures.  Please see my blog post “Reliability Paradigm Shift From Time to Stress Metrics” for more explanation of the Stress/Strength relationship in reliability.
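The overlap between the stress and strength distributions can be sketched numerically. The example below assumes, purely for illustration, that both stress and strength are independent and normally distributed, which real LCEP data may not be; all means and standard deviations are hypothetical and in arbitrary units.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interference_probability(stress_mean, stress_sd, strength_mean, strength_sd):
    """P(stress > strength) for independent, normally distributed
    stress and strength (an assumption made for this sketch)."""
    margin_mean = strength_mean - stress_mean
    margin_sd = math.hypot(stress_sd, strength_sd)
    return normal_cdf(-margin_mean / margin_sd)


# Hypothetical populations: same product strength, two different field stress spreads.
narrow_lcep = interference_probability(50.0, 5.0, 80.0, 5.0)
wide_lcep = interference_probability(50.0, 15.0, 80.0, 5.0)

print(f"narrow stress distribution: {narrow_lcep:.2e}")
print(f"wide stress distribution:   {wide_lcep:.2e}")
```

With the same mean stress and the same product strength, widening the stress distribution alone raises the overlap by orders of magnitude. This is why a rough guess at the LCEP distribution undermines any failure rate computed from it.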

 

Many electronics systems have a wide variety of LCEPs, and new applications result in new LCEPs that were never considered. Take the example of the VGA projectors that we see in many conference and meeting rooms. Some projectors are permanently mounted on the ceiling and many others are mobile. The fatigue stress on the ceiling-mounted units most likely comes from thermal cycling during power cycling, while the mobile units have that stress plus the shock and vibration from transporting. The mobile unit population has a much wider distribution of LCEPs. I doubt the manufacturers of these products know the distribution of the LCEP for these two distinct end use environments. End users will expect the same reliability in both, regardless of the very different LCEPs. Of course some of the mobile units will break instantaneously from an accidental drop. If and when a unit breaks from an accidental drop, will the user blame their own mishandling, or blame the manufacturer for making a “fragile” projector and never buy from the same manufacturer again? Certainly we do not expect our cell phones to fail after a waist high drop, but at what height of drop would we blame ourselves for the failure?

 

When it comes to electronics systems reliability modeling and prediction, we really cannot know all the mechanisms or the distributions of the LCEP. Even if all the degradation models were known and all the combinations of stress distributions and effects in the assembly were known, the challenge of reliability prediction is compounded by variations over time in manufacturing.

 

Focus on real weakness discovery – less on guessing a very uncertain future


We have even less time to model partial or whole systems, and the resulting fatigue damage and degradation, as the design and manufacturing cycle times for new electronics continue to decrease. Even if we are able to model the degradation and fatigue damage of every potential failure mechanism in a PWB, the models must assume units built by a capable manufacturing process without variations, and we know there will be variations. Additionally, modeling can only establish a failure rate based on inherent wear out mechanisms and known LCEPs, even though there may be new applications and different future LCEPs that were not known when the product was designed.

 

“The best way to predict the future is to create it.” - Peter Drucker

 

Just as we would like to predict our own future, many would like to know what the future holds for the electronics we make and use. Yet for complex electronic systems there has been no evidence that we can model and predict future failure rates, regardless of the fact that many still want to believe it can be done and want it to be true.

 

Empirical stress limit discovery is a vastly more efficient tool for building a reliable electronic system. By using stepped stress to limits methods (such as HALT) and focusing on discovery of potential weaknesses that could be a reliability risk, we can very quickly find the strength limits of complex electronic systems under stress conditions and establish a benchmark of strength based on current standard electronics technologies. By knowing empirical stress limits, we can develop safe and efficient ongoing accelerated reliability testing to precipitate and detect manufacturing errors or excursions that result in latent defects.
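The stepped stress idea can be illustrated with a short sketch. This is only the bare stepping logic, not a HALT procedure: the function names, step sizes, and the simulated unit with its hidden limit are all hypothetical stand-ins for a real unit under test being functionally checked at each stress level.

```python
def find_stress_limit(unit_survives, start, step, ceiling):
    """Raise the stress level in fixed steps until the unit fails.

    Returns (last_passing_level, first_failing_level); the second value
    is None if no failure is found up to the ceiling.
    """
    last_pass = None
    level = start
    while level <= ceiling:
        if not unit_survives(level):
            return last_pass, level
        last_pass = level
        level += step
    return last_pass, None


def simulated_unit(level, hidden_limit=95.0):
    """Hypothetical stand-in for a functional test at a given stress level."""
    return level < hidden_limit


last_ok, first_fail = find_stress_limit(simulated_unit, start=40.0, step=10.0, ceiling=150.0)
print(f"last passing level: {last_ok}, first failing level: {first_fail}")
```

The point of the exercise is the failure it exposes, not a number to extrapolate from. The limit that is found brackets a real weakness that can be examined for root cause and compared against the benchmark of similar products.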

 

Unfortunately, the wish that accurate prediction is possible persists, and many are still feeding the wish that the reliability of electronics hardware can be predicted based on invalid documents from the past. Without the ability to share real field reliability data, that belief is likely to continue.
