[IEEE 1992 42nd Electronic Components & Technology Conference - San Diego, CA, USA (18-20 May 1992)] 1992 Proceedings 42nd Electronic Components & Technology Conference - A study of failures identified during board level environmental stress testing

May 10, 2018 | Author: Anonymous | Category: Documents



A Study of Failures Identified During Board Level Environmental Stress Testing

T. Paul Parker
Cathy W. Webb
AT&T Little Rock Works
Little Rock, AR 72219-8912

ABSTRACT

AT&T has investigated and implemented Environmental Stress Testing (EST) in the production of a variety of circuit board designs as a means of reducing the incidence of early life failures. EST techniques include thermal cycling, random vibration, and others. These techniques have proven more effective than traditional burn-in techniques. In addition, studies have revealed that functional monitoring during thermal stressing of circuit cards more than doubles the effectiveness of EST. Outgoing quality audits and customer first month failure rates have improved by factors of two to four since the implementation of EST.

BACKGROUND

This paper covers experiences with EST in a circuit card manufacturing environment. However, it would be improper to discuss manufacturing applications of EST without first examining its application in design.

DESIGN

An electronic assembly, such as a computer motherboard, is designed with operating specifications (for example, an operating temperature range of 0 °C to 55 °C and an operating voltage range of 4.75 V to 5.25 V). Historically, designers have tested new products to the design limits and no further. The rationale was that no customer would operate the unit beyond those limits. Life testing was performed on tens of units for months at a time, typically at 55 °C.

In 1982, a major electronics manufacturing firm began experimenting with a concept which accelerated the process of life testing by exposing products to environments beyond specification limits. This process was referred to as STRIFE, STRESS + LIFE [1][2]. STRIFE adds design margin and robustness as designs are changed to correct faults found.
It was found that failure modes which occur outside of operating limits on small samples during design evaluation later occur within specification limits on a large population of products in the field. STRIFE also aids in the development of a manufacturing EST program.

MANUFACTURING

Once the design process is complete, knowledge gained from STRIFE is applied to the development of an EST regimen for application in manufacturing. The use of EST concepts has grown rapidly in the past few years, especially in the commercial manufacturing area. These techniques are also commonly referred to as Environmental Stress Screening (ESS).

Many companies investigate EST as an alternative to traditional circuit card burn-in. Non-monitored burn-in had been an accepted technique for eliminating infant mortality since the early 1970s, sometimes requiring up to 72 hours to complete. The need for alternatives was recognized for two reasons. First, Just In Time (JIT) manufacturing lines have become common, and lengthy burn-in is not compatible with the JIT process. Second, customer expectations for quality and reliability are becoming tighter.

Deficiencies in components or assembly processes can introduce weaknesses during electronics assembly that degrade to become early life failures. These are usually mechanical defects such as a broken wire bond or defective solder joint. These defects often do not show up when first tested. However, with the stresses seen during assembly and shipment and with the progression of time, they can show up as early life or infant mortality failures. Using thermal cycling and other stresses, these latent defects can be identified prior to shipment.

EST, by itself, can be only partially effective. Few, if any, environmental screens are 100% effective at finding all latent defects. In addition, if nothing is done to correct a problem at its source, it will often persist and the EST screen will continue to miss some fraction of latent defects.
Expenses due to EST failure repair will also persist, adding to product costs. An EST program is much more effective when combined with an aggressive failure analysis and corrective action plan. EST programs should be coupled closely with Total Quality Management (TQM) programs which consist of process improvement teams and vendor partnerships. As failure analysis reveals the cause of defects and teams work to correct them, infant mortality can be brought under control and EST changed from 100% to sampling [3][4].

Techniques of Production EST

The two primary techniques of production EST in use today are thermal cycling and vibration. While both have their benefits, the most common technique applied in the commercial electronics industry is thermal cycling (TC). Therefore, discussion will be limited to that area. Four key parameters are used when discussing TC:

1. Temperature Range (Delta T)
2. Temperature Rate of Change
3. Number of Cycles, n
4. Monitored vs. Power Only vs. Unpowered

The maximum allowable temperature range for a product is determined during the STRIFE process. An initial target of 10 °C to 20 °C above and below the product's specified operating temperature range is a good starting point.

0569-5503/92/0000-0177 $3.00 © 1992 IEEE

Rate of change will be limited by the thermal cycling facilities used. When using mechanical refrigeration in a system with air as the heat transfer medium, product rates of change will typically be limited to 5 °C to 20 °C/min. For chambers equipped for use with liquid nitrogen, this can be increased to 25 °C to 40 °C/min [5]. Some companies are investigating even higher rates of change using a liquid medium [6]. In addition to increasing stress levels, high rates of change decrease the amount of time required to complete a given number of cycles, thus decreasing Work In Process (WIP) and long term capital investment in EST equipment.
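The effect of rate of change on chamber time can be illustrated with a rough calculation. The sketch below is not taken from the paper: it assumes an idealized trapezoidal profile (linear ramps plus an equal dwell at each temperature extreme), and the function names, the 10 minute dwell, and the example temperatures are hypothetical values chosen only for illustration.

```python
def cycle_minutes(t_low_c, t_high_c, ramp_c_per_min, dwell_min=0.0):
    """Duration of one thermal cycle for an idealized trapezoidal profile.

    Assumption: the product follows a constant rate of change on both
    ramps and dwells for the same time at each temperature extreme.
    """
    if ramp_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    if t_high_c <= t_low_c:
        raise ValueError("t_high_c must exceed t_low_c")
    delta_t = t_high_c - t_low_c              # temperature range (Delta T)
    ramp_time = 2 * delta_t / ramp_c_per_min  # one ramp up plus one ramp down
    return ramp_time + 2 * dwell_min


def total_hours(n_cycles, t_low_c, t_high_c, ramp_c_per_min, dwell_min=0.0):
    """Chamber time in hours for n_cycles identical cycles."""
    per_cycle = cycle_minutes(t_low_c, t_high_c, ramp_c_per_min, dwell_min)
    return n_cycles * per_cycle / 60.0


if __name__ == "__main__":
    # Hypothetical comparison of three product rates of change (C/min).
    for rate in (5, 20, 40):
        hours = total_hours(20, -20, 70, rate, dwell_min=10)
        print(f"{rate:>2} C/min: {hours:.1f} h for 20 cycles")
```

Under these assumptions the ramp time scales inversely with the rate of change, while the dwell time sets a floor that faster ramps cannot reduce.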
When starting an EST program on a new product, it is difficult to know the correct number of cycles to use. Typical numbers range from 4 to 20. This number can be optimized by monitoring a large population of product through 20 EST cycles. At some point the dropout rate as a function of cycle number levels off. Further thermal cycling past this point typically is not cost effective. Figure 1 is a plot of a typical thermal profile. The figure includes a section referred to as step stress which helps identify exact temperature thresholds of failure.

Figure 1. Thermal Cycle Temperature/Power Profile (step stress as cycle 0, then cycles 1 through N between -20 °C and 70 °C, with product power-on periods indicated)

The term "cycles to failure" for a unit under test is defined as the thermal cycle where a failure first appears. In order to know cycles to failure, one must test and monitor results during the EST process. Many choose not to monitor due to fixturing cost. While unmonitored EST can be effective, experience shows that monitoring can more than double the ability of EST to identify marginal products [7]. Examples of failures that can only be found by monitoring are intermittents, healers, and failures with a constant threshold of failure above or below 25 °C. Experience has shown that as many as half of the failures experienced in EST are intermittent [8].

FMA and Corrective Action

Applying environmental stress techniques is only the first step toward reliability improvement. An equally crucial process is that of Failure Mode Analysis (FMA) and corrective action implementation. During STRIFE, most of the corrective actions are made on the design, such as parts selection, mounting techniques, layout, etc. During production EST, corrective actions must be made in the assembly process and with component suppliers. Process engineers and suppliers must be aware of the stresses products will undergo in EST and should agree that these stresses will not adversely affect their product.
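The cycle-count optimization described earlier in this section (running a large population through 20 monitored cycles and noting where dropout levels off) can be sketched as follows. This is an illustrative approach, not the authors' procedure: the "capture fraction" criterion, the function name, and the sample data are all assumptions.

```python
from collections import Counter


def cycles_to_capture(first_fail_cycles, max_cycles=20, capture=0.95):
    """Smallest cycle count that catches `capture` of the observed failures.

    first_fail_cycles: cycle number at which each failing unit first failed
    (cycle 0 is the step stress portion of the profile).
    Assumption: "levelling off" is approximated by the point where a chosen
    fraction of all failures seen within max_cycles has already appeared.
    """
    in_range = [c for c in first_fail_cycles if 0 <= c <= max_cycles]
    if not in_range:
        return 0
    counts = Counter(in_range)
    total = len(in_range)
    caught = 0
    for cycle in range(max_cycles + 1):
        caught += counts.get(cycle, 0)
        if caught / total >= capture:
            return cycle
    return max_cycles


if __name__ == "__main__":
    # Hypothetical cycles-to-failure data for 20 failing units.
    data = [1, 1, 1, 2, 2, 3, 1, 4, 2, 1, 6, 1, 2, 3, 1, 1, 2, 5, 1, 12]
    print("cycles for 90% capture:", cycles_to_capture(data, capture=0.9))
    print("cycles for 100% capture:", cycles_to_capture(data, capture=1.0))
```

The cycles-to-failure values are only available when the units are monitored during cycling, which is one reason monitoring matters for tuning the profile.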
It is important to understand the nature of the failures EST is expected to find. These will always fall in the category of design, component, or process related. This information can come from existing field data or from in-process test data. For products that have been through STRIFE, design changes should be minimal.

Many common problems are shared industry wide. For instance, many Surface Mount Technology (SMT) related assembly defects can be traced to the solder process. Intermittent solder connections can be caused by lack of control over solder paste height. Solder balls can be formed by excessive moisture in solder paste. Moisture absorption by large plastic IC packages can create damaging stresses during solder reflow. Bottom side SMT capacitors can also be damaged by improper controls on wave solder pre-heat. In addition, through-hole component assembly problems can often be traced to bent legs on Dual In-line Package (DIP) parts.

Experience at one electronics manufacturer has shown that most EST failures are associated with components [7]. Most mature, high volume components such as discretes, TTL, and DRAM devices ( [...] 40 assemblies. Specific examples will be reviewed below. Failure mechanisms from these studies will be reviewed in a later section.

QUALITY RESULTS

EST SAMPLING FEASIBILITY STUDY

Figure 2. Circuit Card Cycles to Failure (quantity defective per 5 cycles vs. cycle number)

A mature through hole technology circuit card was chosen to evaluate EST sampling effectiveness. The purpose of sampling is to reduce manufacturing cost without adversely affecting warranty costs or customer satisfaction. A cost break even point occurs when EST dropout falls and remains below 0.5% to 1.0%, depending on warranty costs and the cost of performing EST [9]. EST dropout can only be reduced below these levels consistently if aggressive FMA and corrective action plans are carried out successfully.
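The break even idea above can be made concrete with a simple cost model. The sketch below is an assumption for illustration, not the model used in the cited reference: it treats every EST dropout as a unit that would otherwise have failed early in the field, and all names and dollar figures are hypothetical.

```python
def break_even_dropout(est_cost_per_unit, field_failure_cost,
                       factory_repair_cost=0.0):
    """Dropout fraction at which 100% EST just pays for itself.

    Assumed model: each unit caught in EST saves the cost of one early
    field failure, less the cost of repairing the unit in the factory.
    """
    saving_per_catch = field_failure_cost - factory_repair_cost
    if saving_per_catch <= 0:
        raise ValueError("field failure must cost more than factory repair")
    return est_cost_per_unit / saving_per_catch


def sampling_is_justified(measured_dropout, est_cost_per_unit,
                          field_failure_cost, factory_repair_cost=0.0):
    """True when measured dropout is below the break even level."""
    threshold = break_even_dropout(est_cost_per_unit, field_failure_cost,
                                   factory_repair_cost)
    return measured_dropout < threshold


if __name__ == "__main__":
    # Hypothetical costs: $3.00 EST per unit, $450 per field failure,
    # $50 to repair a unit that drops out in the factory.
    level = break_even_dropout(3.0, 450.0, 50.0)
    print(f"break even dropout: {level:.2%}")
    print("sample at 0.4% dropout?", sampling_is_justified(0.004, 3.0, 450.0, 50.0))
    print("sample at 2.5% dropout?", sampling_is_justified(0.025, 3.0, 450.0, 50.0))
```

With these made-up costs the threshold lands inside the 0.5% to 1.0% band quoted above, which shows how the band shifts with warranty cost and the cost of performing EST.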
The circuit card selected had 30 ICs and 129 discrete components, designed around an 8 bit CPU. The EST profile consisted of 20 cycles from -20 °C to 70 °C, preceded and followed by a 25 °C functional test. A sample of product was run in EST during March 1989, where a 2.5% dropout was measured. All units began going through EST in September 1989. The plan called for 100% EST until the dropout on 1000 consecutive units was under 0.5%. This occurred in January 1990. EST sampling began in February at 50%, and was reduced to 25% in March and April. The study was ended in May when the product ceased to be manufactured.

Figure 3 tracks results during the project. Figure 4 shows the corresponding Quality Assurance (QA) audit results for the same period of time. Prior to 100% EST, quality was erratic, averaging 0.82% defective. Even with 100% EST, QA results still averaged 0.32%, slightly high for a mature technology of this complexity. It was only after all major failure mechanisms were eliminated that QA results achieved an acceptable level. This was crucial evidence after two years of investigations which proved that the most effective part of an EST program is not the actual process of stress testing but rather the effectiveness of the actions taken to rectify the problems found.

Figure 3. EST Percent Defective by Defect Category (percent defective and sample size by month, 1989 to 1990)

Figure 4. Results Before, During 100%, and at Sample EST (quality audit percent defective by month, 1989 to 1990)

The most recent investigation dealt with EST on PC motherboards assembled by an outside supplier. The supplier shipped product to AT&T for assembly into systems. EST was performed on the cards after an out of box test at AT&T.
The supplier performed only a standard 25 °C functional test. Over 17,000 circuit cards were tracked. Since out of box failures as received from the card assembly supplier were actual early life field failures at AT&T, the study also provided a comparison of EST failure mechanisms with these out of box failures to see if EST failures were indicative of true early life defects.

Out of box quality and EST results are shown in Figure 5. At no point in the study was out of box quality within the 0.5% guaranteed limit. The failure rate as received by AT&T ranged from 0.5% to 2.0%. In addition, EST revealed another 4% to 9% defective that would have otherwise shipped to customers.

Failure analysis results revealed a close correlation between failure mechanisms seen at out of box test versus those found in EST. Figure 6 compares quantities of defects in five categories of failure at the two test points.

Figure 5. Incoming Motherboard Failure Rates (pre-EST, EST, and total percent defective by week number)

Figure 6. Quantities Defective Before and After EST (quantity defective by category: process, ASIC, other IC, discrete, oscillator/crystal)

SPECIFIC FAILURE MECHANISMS

Defective components made up a majority of the early life defects identified in this study; the balance are manufacturing process defects and design related margin issues. Six examples of commonly found mechanisms are given.

1. Moisture Induced Package Damage in SMT IC

One part exhibiting this failure mechanism was a 68 pin, 50 mil pitch PLCC communication controller SMT device assembled by an outside board supplier. When failure rates increased suddenly at AT&T (see Figure 7), failure analysis by the supplier and local engineers began. The failures were confined to a single IC package date code.
Curve tracer analysis revealed multiple open pins, generally on pins close to the corner of the device. Failure analysis revealed multiple sheared ball bonds. Because of prior experience with a similar failure mechanism on an 84 pin PLCC device, the problem was correctly attributed to moisture in the IC package which created high stress during the board supplier's Infrared Reflow (IR) process [10]. Acoustic micrographs of the delamination occurring in the 84 pin device are shown in Figure 8.

Figure 7. 68 Pin PLCC ppm Levels (pre-EST, EST, and total ppm, in thousands, by week number)

Figure 8. Acoustic Micrograph of Delaminations (control unit and samples 2, 3, and 4)

Further investigation revealed that the communication controller supplier had allowed the part to be exposed to high humidity during a tropical storm and the parts absorbed moisture (0.18% to 0.20% by weight). Immediate corrective action involved the return of all boards with the suspect date code to the board supplier for part replacement. Long term corrective action involved improved moisture control procedures by the IC supplier, including shipment to the board supplier in Moisture Barrier Bags (MBB).

This failure mechanism showed an unusual cycles to failure distribution, as seen in Figure 9 (Note: cycle 0 = 1.5 hour step stress as shown in Figure 1). The failure mechanism was the same for cycle 0-5 and cycle 11-14 failures. The cause of this delayed failure phenomenon is not yet understood.

Figure 9. 68 Pin PLCC Cycles to Failure (quantity defective vs. number of cycles)

Early feedback from the field revealed that EST was not totally successful in screening all failures. While a high percentage were identified, failures still escaped. Recent data shows the greatest risk with this failure mechanism comes later in its life [11]. Based on these findings, special effort should be made to remove all parts from a suspect population containing this failure mechanism without relying on screening to identify all defects.

2. Silicon Damage Due to Wire Bonding

One circuit card experienced a significant rise in EST failure rate during the eleventh week of production. A major contributor to this increase was a 100 pin, 50 mil pitch PLCC data buffer. This failure mechanism was identified in EST predominantly during the first 3 cycles; however, some defects required up to 11 cycles to fail (Figure 10). The defects had sufficient stability to go virtually undetected by in-process testing at 25 °C. In EST, however, all failures began by failing hot only, typically above 40 °C. Some would take up to 40 cycles to degrade to the point of failing over the entire temperature range.

Curve tracer analysis revealed multiple open pins, typically between 8 and 17 per device. FMA revealed severe damage around the aluminum bond pads associated with the open pins. In some cases, much of the aluminum pad was missing and the underlying silicon was cratered (Figure 11). Cross sections also revealed similar damage to the silicon underneath the bond pad (Figure 12).

Figure 10. Hybrid Data Buffer - Cycles to Failure (quantity defective vs. number of cycles)

Figure 11. Lifted Ball Bonds / Damaged Bond Pads

Figure 12. Cross Section of Silicon Damage Under Ball Bond

The IC supplier's engineers assisted in identifying the cause of damage and isolated it to an offshore wire bond operation. Records showed insufficient bond strength during initial process evaluation for the lot in question. To overcome this problem, ultrasonic bond energy was increased to such an extreme that structural damage was created under the ball bond during the bonding operation. The supplier's engineers provided failure analysis results to the wire bond assembly operators and engineers, who modified process controls to guard against recurrence.

3. Miscellaneous Hybrid Device Mechanisms

Many failure mechanisms are common to hybrid devices due to the variation induced with less automated assembly processes and the presence of many internal mechanical connections. Crystal oscillators have exhibited persistent problems. Output failure modes include improper frequency of oscillation (fifth overtone or fundamental vs. desired third overtone), low amplitude, and complete loss of oscillation. Principal mechanisms include loose debris, internal solder joint failures, lifted wire bonds, and crystal mount defects (Figure 13).

Figure 13. Crystal Mount Failure Within Oscillator Can

Plastic encapsulated hybrids have also exhibited failures due to internal connection defects. One recent investigation involved a hybrid delay line. Packaged as an injection molded 8 pin gull wing device, it contained a small scale TTL IC (prepackaged in plastic) with discrete capacitors and inductors. Figure 14, which plots pre-EST, EST, and total ppm (in thousands) by week number, reveals a steadily increasing failure rate on parts supplied by the original vendor. Failure rates declined dramatically in week 9 when a new supplier's part with different internal construction was substituted.

The failure mode was open input and output pins. The failure mechanism was cracks in an internal solder connection between one of eight device legs and internal legs of the TTL IC. The cracks were the result of excessive force being applied to the body of the part during mold release in the packaging process.

This failure mechanism exhibited an interesting characteristic during the investigation that warrants a note of caution. As failing components were removed from the circuit boards and retested, they would no longer fail. This was due to the removal technique, which heats the entire device until all solder connections to the board reflow. Because of this heat, the solder connections internal to the device reflowed and healed the crack.
Only when devices were removed by cutting leads did the failure mechanism remain intact, establishing the correct path for root cause identification and corrective action.

Figure 14. Delay Line ppm Levels

4. Interconnection Failures

Early in 1990 a product utilizing socketed Single In-line Memory Modules (SIMM) began experiencing a high EST dropout due to expansion memory failures. The failures typically began occurring during the first three cycles, remained solid for a period of 5 to 10 cycles, then partially or totally healed, only to fail in subsequent EST runs. Efforts to isolate the cause of failure to either the socket or the memory module produced erratic results.

In-house failure analysis focused on the quality of the contact produced at the SIMM to socket interface. Cross-sectional analysis revealed an incompatibility between the design of one supplier's socket contacts and the dimension of the contact pads on the SIMM being used. SIMM socket designs typically include a redundancy of contact points to the front and back pads on the module. Figure 15 demonstrates the potential for mismatch and loss of redundancy which made this combination more prone to failure. Another supplier's SIMM with a longer pad area was substituted on boards that had been built up with existing sockets. The original SIMM manufacturer modified their design to improve contact area and assure redundancy with the sockets in use.

Figure 15. Cross Section of SIMM/Socket Interface

5. Printed Circuit Board Via Failure

In December 1991, a production line began experiencing elevated EST dropout with random failure symptoms. Investigation revealed open vias in the printed circuit board (PCB). The defect rate by board lot was approximately 8.5%. EST monitoring results typically indicated a failing condition during the elevated temperature portion of early cycles. Subsequent cycles exhibited intermittent failures, predominantly at temperatures above 30 °C, as shown in Figure 16.
Figure 16. Monitored EST Results of Open Via in PCB (log of pass/fail status with chamber temperature, cycle number, and time of day)

Failure analysis revealed weak rings in the via plating caused by bubble entrapment at the electroless copper plating operation (Figure 17). The supplier instituted process improvements to eliminate this condition, including more effective plating agitation and use of improved surface tension plating solutions. To evaluate the effectiveness of these changes, a sample based stress screen was implemented and test coverage was heightened. Screening efforts included comparative evaluation of a 100 cycle, -20 °C to 70 °C TC and a 5 minute, Bg rms high-fault coverage vibration test. Preliminary findings indicate the thermal cycle EST to be more successful at detecting this failure mechanism.

Figure 17. Cross Section of Open Via in PCB

6. SMT Assembly Defects

It has been well established that a cold solder joint or no-solder condition can provide enough electrical contact to pass ambient electrical testing [12]. This condition could escape traditional in-process test.
In addition, tiny solder balls can intermittently bridge adjacent legs of an IC. Visual screening for this condition is greatly impaired with the use of fine lead pitch surface mount devices. As pin count and density increase, visual access to an individual solder joint decreases.

Thermal cycling and vibration have both shown limited effectiveness in identifying defective solder connections, and the two are about equally successful in identifying defects. However, in most cases, failures are intermittent and can only be found with monitored EST. In the case of vibration, boards are shaken in a single axis, perpendicular to the plane of the board. The profile is 20 Hz to 2000 Hz, log rms random, for 5 minutes. Most solder defects are found in the first two minutes of vibration and continue to fail for the duration of the test. In thermal cycling, most failures occur in the first 2 cycles, predominantly cold. They sometimes heal, passing all subsequent cycling, and would be overlooked if monitored test data were not available.

CONCLUSION

Environmental Stress Testing has proven to be a valuable tool for AT&T in efforts to continuously improve the quality and reliability of its products. To obtain maximum benefit from an EST program, it must be closely coupled with an FMA and corrective action program. Two to four fold improvements in quality have been seen. Thermal cycling has proven effective in identifying sources of early life failures that were previously not identified in power only burn-in. While thermal cycling alone does have value, adding testing during cycling has more than doubled the effectiveness of defect identification.

REFERENCES

1. Suesy, C., "Reliability Growth Management in Non-Military Industry", 1988 Reliability Growth Conference, pp. 37-45.
2. Shinner, C., "The Board Electronic STRIFE Test (B.E.S.T.) Program", ASQC Reliability Review, June 1988, pp. 3-6.
3. Parker, T.P., Private Communication.
4. Wright, W.R., "Successful Stress Testing at IBM, Greenock, Scotland", Paper presented at 1991 Annual IES ATM Meeting.
5. Smithson, S.A., "Effectiveness and Economics - Yardsticks for ESS Decisions", 1990 IES ATM, pp. 737-742.
6. Beaton, B., "Thermal Accelerated Reliability Genego Environmental Testing (TARGET) Dynamic Board Thermal Shock Using a Single Fluid and Bath", International Electronics Manufacturing Technology Symposium, 1991.
7. Parker, T.P., "ESS Case Study of a High Density Surface Mount Surface Card", 1991 Proceedings - IES, pp. 393-402.
8. LoVasco, F. & Lo, K., "Relative Effectiveness of Thermal Cycling vs Burn-In, A Case Study", Paper presented at 1992 ECTC Conference.
9. Haibel, C., "Math and Economics of Screening Disk Drives", 1990 IES ATM, pp. 784-788.
10. Prasad, R., SURFACE MOUNT TECHNOLOGY, PRINCIPLES AND PRACTICES, 1989, Van Nostrand Reinhold, pp. 185-190.
11. Moore, T.M., et al., "The Application of Scanning Acoustic Microscopy to Control Moisture/Thermal Induced Package Defects", International Society of Testing and Failure Analysis, October 1990, pp. 251-258.
12. Millard, D.L., "Solder Joint Inspection", ELECTRONIC MATERIALS HANDBOOK, Vol. 1 Packaging, 1989, ASM International, p. 735.

ACKNOWLEDGEMENTS

The authors extend a special thanks to Evie Cody, Gordon Harrison, and Linda Vance for their many contributions. We also wish to thank the management staff at Little Rock for their constant support of our work.


