Many reliability engineers have discovered HALT will quickly find the weaknesses and reliability risks in electronic and electromechanical systems from the capability of thermal cycling and vibration to create rapid mechanical fatigue in electronic assemblies. Assemblies that have latent defects such as cold solder or cracked solder joints, loose connectors or mechanical fasteners, or component package defects can be brought to a detectable, or patent, condition by which we can observe and potentially improve the robustness of an electronics system. Thermal cycling creates expansion and contraction, stressing mismatched material thermal coefficients of expansion (TCE) interfaces. Applying vibration to an assembly, especially the pneumatic repetitive shock of HALT chambers, creates very rapid mechanical fatigue. When Gregg Hobbs, Ph.D., PE created HALT and HASS methods back in the 1980’s, digital systems were not as prevalent and bus speeds were much slower than today’s electronics. As the signal speeds continue to increase and circuit features get smaller in electronics HALT has a potentially significant additional benefit for signal integrity (SI) and operational reliability during new product development.
Today’s electronics are requiring bus speeds that have to have ten times better resolution than the time it takes light to bounce off your nose and hit your eye, which takes about 85 picoseconds. As data bus speeds increase affects in data transmission that were second and third order affects are now becoming dominant in SI issues. These new variables may be difficult if not impossible to model accurately. The continue decrease in metallization and higher bus frequencies will result in increased sensitivity to fabrication variations. SI issues are likely to become more dominant in reliability of hardware as a result of the continued decrease of metallization and increase in bus speeds. Yet, the effect of these developments on operational reliability may also be more difficult to find and reproduce before thousands or millions are sent to the field.
Failures in SI in many times results in marginal operational reliability or “soft failures” where a system can be reset and operate normally. Depending on the frequency of these operational failure events, the user may or may not tolerate their occurrence. When too frequent, intermittent operational reliability may result in returning the system to the manufacturer. The returned system then may then be broken down and all subassemblies subjected to failure analysis. When divided up, the subsystems tested will likely be declared “No Fault Found” (NFF) as the marginality may only come from the stack up of parametric variations, or unique environmental conditions of original system in the end-use environment. To modify an old adage “If you cannot find what broke, you cannot fix it” and the cause of the marginality and returns will continue. The result is a churn of “good” parts being returned being sent out to replace “good” parts. The returned parts may be sent to a repair depot to be used for repair or replacement. Those returned parts may or may not work with a different system depending on the systems stack up, but it is likely the manufacturer will never come to know one of the potential real contributors to the high NFF rate. Of course there are many other causes of NFF returns not necessarily related to hardware issues. If the issues come from SI and timing marginality, thermal stress to operational limits can be a very useful tool to discover these issues before mass production.
We know that in mass manufacturing of anything there will be variation in any parameter that is measured. We know that during PWB manufacture that some dimensional variations will occur during mass manufacturing, although hopefully the variations are small. Dimensional variations in PWB can affect impedance crosstalk, noise, and EMI issues in the system. Dimensional expansion and contraction of the PWBA of course is what induces the thermo-mechanical fatigue damage during thermal cycling that has been a primary focus of HALT and HASS methodology, but the dimensional variations also effects SI quality. We know from the SPC teachings of Dr. W. Ed Deming that reduction of manufacturing variation is the path to making a defect free product and “six sigma” production capability is the goal. When we design and build a complex high speed digital electronics system we cannot know necessarily how the stack up of all the real future variations in component manufacturing, circuit board fabrication, solder quality, and second sources of these possibly impact operational reliability. Yet we do know for sure that there will be parametric variations created at all the levels of assembly, and the affect operational reliability may only be discovered after a large numbers are produced and sold.
The challenge of finding marginal operation during early product development is illustrated graphically in the Figure 1. . Early samples of a new electronics product are typically expensive and scarce and all development teams want the limited samples. The graphic shown on the left side of figure 1 represents the parametric timing distribution found with a limited number of units. With a small number of units the parametric variation that could be near the upper and lower limits of would likely remained undiscovered before the product is released to from development to be manufactured in mass.