# Development of Reliability Allocation and Assessment Algorithms for Designing Multilevel Microelectronic Systems

Injoong Kim, Raghuram V. Pucha,\* Russell S. Peak, and Suresh K. Sitaraman

Abstract—Design-for-reliability of complex systems involves top-down reliability allocation approaches, reliability prediction of both random and wear-out failures, and bottom-up reliability assessment approaches to provide more insight into the system-level reliability.

Designing complex microelectronic systems, while considering reliability in the early design stages, is a challenge because these systems have multilevel structure and logical groups, and numerous components are associated with failure modes and mechanisms. To address these difficulties and to design reliable systems in a systematic way, reliability allocation and reliability assessment algorithms and associated reliability prediction methodologies are presented in this paper in the context of a System-Design-for-Reliability (SDfR) framework. Reliability allocation algorithms are presented for both parallel and series systems that calculate the target reliability of subsystems from the given target reliability of their parent systems. The reliability allocation algorithm is demonstrated for random failures in a video broadcasting system that consists of a four-level packaging structure. The reliability assessment algorithm is demonstrated for wear-out failures in a USB board system that consists of multiple logical groups and various failure modes and mechanisms. The reliability assessment algorithms also demonstrate the use of physics-based reliability prediction of each logical group before assessing the system reliability. The demonstrated results show that the algorithms are useful for determining system configurations and design parameters. Such design changes will reduce the burden of downstream reliability activities.

Keywords—System Design for Reliability (SDfR), multilevel microelectronic systems, random failures, wear-out failures

### Introduction

Microelectronic systems such as cell phones, computers, consumer electronics, and implantable medical devices consist of subsystems which in turn consist of other subsystems and components. When these microelectronic systems are designed, fabricated, assembled, and tested, companies need to meet reliability, cost, performance, and other targets so that the products are competitive in the market place.

Most reliability problems might be corrected during prototype testing. However, design changes after prototype testing take time and are expensive. Therefore, considering reliability during early design stages is necessary. This approach is called "design-for-reliability," and it increases the chance of developing reliable designs while decreasing the burden of downstream reliability activities [1].

Manuscript received June 2007 and accepted January 2008

G.W. Woodruff School of Mechanical Engineering, and Product & Systems Lifecycle Management Center, Georgia Institute of Technology, Atlanta, Georgia 30332

\*Corresponding author; e-mail: raghuram.pucha@me.gatech.edu

Most of the current design-for-reliability studies focus on component design [2], rather than on system design. This is because of the difficulties in sharing reliability goals over a multilevel assembly structure and in assessing complex system reliability with logical groups and various failure modes and mechanisms [3].

For example, because a system failure can be caused by the failure of any component in an assembly of hundreds or thousands of components, each component must be considered for system reliability, and all reliability-related activities should be consistently organized through multilevel assembly structure.

In addition to multilevel assembly structure, multilevel logical groups in a system make system reliability analysis difficult. For example, the failure of one group in a microelectronic system can be backed up by another logical group without any loss of system functions or with only partially limited system functions.

Another difficulty with system reliability is multiple failure modes and mechanisms. For example, a microelectronic system may fail by different types of failure modes and mechanisms such as random failures of capacitors and wear-out failures of solder joints. Even a solder joint wear-out failure may be different when the joint is subjected to thermal cycling than when it is subjected to mechanical vibration or to a combination of both. Therefore, it can take significant effort to isolate and analyze such different failure modes and mechanisms.

To overcome such difficulties and to design reliable microelectronic systems, a System-Design-for-Reliability (SDfR) method is presented. This SDfR method is an integrated top-down and bottom-up approach overarching multilevel structures. The SDfR method is implemented by two main reliability algorithms: a reliability allocation algorithm and a reliability assessment algorithm.

The reliability allocation algorithm is demonstrated for random failures by a video broadcasting system that consists of a four-level packaging structure. The reliability assessment algorithm is demonstrated for wear-out failures by a USB board system that consists of multiple logical groups.



Fig. 1. Reliability and hazard functions.

This paper is organized as follows: Section II introduces background and related work. In Section III, we introduce the SDfR method. Section IV describes the reliability allocation algorithm for a video broadcasting system design that considers random failures, and Section V describes the reliability assessment algorithm for a USB board system design that considers wear-out failures.

### BACKGROUND AND RELATED WORK

Reliability is the probability that an item operating under stated conditions will survive for a stated period of time [4]. Reliability is an important factor in system development, because high reliability makes systems competitive in the market and saves maintenance cost.

Studies of system reliability initially applied stochastic knowledge of field failure data in the 1950s [5]; this approach is referred to as the "statistic data-based method" herein. Such statistic data-based research addresses random failures during the useful lifetime of systems [6]. Representative statistic data-based reliability prediction models are MIL-HDBK-217 and Telcordia SR-332 [7]. These models facilitate reliability prediction of complex electronic systems. However, because they cannot explain the causes of failures, enhancing reliability by design changes is limited.

On the other hand, physics-based research, initiated in the 1960s [8], deals with wear-out failures and attempts to identify the factors related to the lifetime of systems. As results of physics-based studies, various accelerated testing [9] and reliability simulation models were developed (e.g., the accelerated thermocycle testing [2] and the solder joint fatigue model [10]). These physics-based models are useful in determining design parameters and improving reliability. However, they are limited in considering relations, interactions, and dependencies among components and diverse failure modes in systems [5], [11].

The different aspects of statistic data-based and physics-based research are well illustrated by the bath-tub curve, a commonly accepted curve of hazard functions [6]. This curve shows three distinct stages (Fig. 1): an early failure stage, a random failure stage, and a wear-out failure stage. The first stage of the bath-tub curve is characterized by early failures, also known as infant failures. These failures are caused by manufacturing errors, and systems with such potential infant mortality are often screened out and removed in the "burn-in" process. Systems that successfully pass through the burn-in process will be released for broad usage. The second stage is characterized by random failures, caused by randomly changing operating conditions such as freak loads. These failures will occur at a constant rate. The last stage is characterized by wear-out failures, which are caused by cyclic stress, mechanical wear, chemical reaction, and so on. These failures typically occur at an increasing rate.

(b) Hazard function of the bath-tub curve

Depending on their types, the lifetime characteristics of components differ. For example, some may have a long wear-out lifetime with a high or low random failure rate, and others may have a short wear-out lifetime with a high or low random failure rate. Since various types of components exist in a system, both the statistic data-based methods and the physics-based prediction methods are necessary for system development [6], [7], [12].

These two approaches also differ in how they support reliability analysis. Statistical data-based research uses statistical principles to analyze complex system structure. Examples are fault tree analysis (FTA), an event-based logic diagram method that displays the relationship between a potential event affecting system performance and an underlying cause of this event, and the reliability block diagram (RBD), a component-based logic diagram method that represents complex series and parallel connections of components. Physics-based research uses physics principles to analyze complex failure modes and mechanisms. Examples are failure mode and effects analysis (FMEA), a method for analyzing potential failure modes early in the development cycle, and failure mode, effects, and criticality analysis (FMECA), an extension of FMEA that adds a criticality analysis charting the probability of failure modes against the severity of their consequences [13]. If both statistical data-based and physics-based principles are used for reliability analysis, then a complete reliability analysis structure from systems to failure modes may be constructed [14].

The ultimate goal of such reliability prediction and analysis studies is to enhance system reliability during the lifecycle. Current system reliability enhancement methods, such as the Stage Gate Process [2] and reliability enhancement methodology and modeling (REMM) [15], [16], are applied at the prototype stage of the system lifecycle using accelerated testing methods. Although they are effective reliability enhancement methodologies, they are expensive because any unsatisfactory results of reliability testing require redesign and retest cycles until the results are satisfactory. If reliability knowledge is applied at the design stage effectively, the chances of redesign and retest cycles may be reduced. Therefore, such an approach is cost-effective for reliability enhancement.

Fig. 2. Conceptual model for SDfR

This work aims to present an integrated method and its algorithms for designing microelectronic systems for reliability using both statistics and physics-of-failure knowledge [17-19] in a scope ranging from random failures to wear-out failures, from system structure to failure modes, and from reliability allocation to design recommendations.

### System-Design-for-Reliability Method

#### A. Conceptual Model for SDfR

SDfR in general involves complex design methods. A conceptual view of SDfR is illustrated in Fig. 2. In this approach, target reliability is allocated in a top-down fashion from parent systems to subsystems, as illustrated using solid lines in Fig. 2. The reliability of components or failure modes is usually determined from statistical data-based models, accelerated testing-based models, or physics-based models, as illustrated using dash-dot lines in Fig. 2. System reliability is then assessed from components to subsystems to parent systems, in a bottom-up fashion, as illustrated through dashed lines in Fig. 2.

When the assessed reliability is greater than or equal to the assigned target reliability, then no design modification is recommended. When the assessed reliability is less than the assigned target reliability, design changes are recommended, as illustrated using boxed arrows in Fig. 2. When subsystem design changes are not possible or available, design changes may be initiated at the parent system level so that redundant/modularized/alternative subsystems or other approaches may be pursued.

#### B. Metrics for SDfR

According to Wood [20], reliability metrics are grouped into two general categories: constant rate metrics (exponential distribution) and probability of success metrics (nonexponential distribution). The constant rate metrics are a good approximation for the random failure stage of the bathtub curve, and the probability of success metrics are used in representing nonexponential distribution like the wear-out failure stage with more than two parameters.

Because the SDfR method covers both random failures and wear-out failures, we define two reliability metrics for SDfR: constant random failure rate ( $\lambda$ ), which is a value that measures reliability of the random failure stage, and percentile wear-out failure reliability ( $R_{\rm w}(T_{\rm w})$ ), which is a probability value that measures wear-out failure reliability of items at a given time, at target time-to-wear-out-failure ( $T_{\rm w}$ ) (Fig. 3). These metrics are defined so that they can be easily and consistently used across each system level and reliability activity.



Fig. 3. Metrics for SDfR.



Fig. 4. Multilevel packaging structure of a video broadcasting system. (The video broadcasting system is a product of EGT, Inc. EGT, Inc., permits the release of the figures and the information [Feb. 2007].)

### TARGET RELIABILITY ALLOCATION ALGORITHM

The objective of this section is to demonstrate reliability allocation algorithms for designing a video broadcasting system against random failures.

#### A. Multilevel Packaging Structure of a Video Broadcasting System

Video encoders compress video signals for efficient transmission in broadcasting service. Since multichannel signals are broadcast at the same time, multiple video encoders are required. In most cases, for reliable video broadcasting service without any interruption, redundant video encoders are used. Fig. 4 illustrates six encoders for six channels and two redundant encoders for backups (4th Level). Each video encoder (3rd Level) consists of electronic board systems, and each board system (2nd Level) consists of various electronic packages (1st Level).

Since the video broadcasting system runs in a well-conditioned environment, random failures of the encoder systems are more dominant and important than wear-out failures. Therefore, the design of a reliable video broadcasting system against random failures is considered using the SDfR method. For example, the target reliability of the video broadcasting system is allocated from the 4th level to the 2nd level. Then, the reliability of board systems is predicted. Depending on the difference between the allocated target reliability and the assessed reliability, various design changes are considered to satisfy the target reliability of the video broadcasting system illustrated in Fig. 4.

#### B. Target Reliability Allocation of a Video Broadcasting System

#### 1) RELATIVE TARGET RELIABILITY WEIGHTS (RTRWS):

The target reliability allocation algorithm calculates the target reliability of subsystems ( $R_{i,j,\text{target}}(t)$ ) from the given target reliability of their parent system ( $R_{i,\text{target}}(t)$ ). Therefore, the relation between subsystem reliability and system reliability plays a key role in target reliability allocation. This relation is derived from the series structure reliability expressed in (1).

The logarithm of both sides of (1) leads to (2). Dividing both sides by the left-hand side leads to (3). The ratio of the logarithm of subsystem reliability to the logarithm of system reliability is defined as reliability weight,  $w_{i,j}$ , as shown in (4). Thus (3) can be rewritten as (5). A relatively high weight indicates that the associated subsystem has low reliability compared with other subsystems in the parent system.

From (4), target reliability of any subsystem  $(R_{i,j,\text{target}}(t))$  can be calculated using the given target reliability of the system  $(R_{i,\text{target}}(t))$  and the estimated reliability weight of that subsystem. The estimated reliability weight is called relative target reliability weight (RTRW) in this work because the weight is only valid in a given system and its subsystems for target reliability allocation.

$$R_{i}(t) = \prod_{j=1}^{n} R_{i,j}(t) \tag{1}$$

$$\ln(R_i(t)) = \sum_{j=1}^{n} \ln(R_{i,j}(t)) \tag{2}$$

$$1 = \sum_{j=1}^{n} \frac{\ln(R_{i,j}(t))}{\ln(R_i(t))} \tag{3}$$

$$w_{i,j} = \frac{\ln(R_{i,j}(t))}{\ln(R_i(t))} \tag{4}$$

$$1 = \sum_{j=1}^{n} w_{i,j} \tag{5}$$

where *i* refers to systems, *i.j*, to subsystems in system *i*, and  $w_{i,j}$ , to the reliability weight of subsystem *i.j* within system *i*.

One way of determining the RTRWs may be to use historical reliability data of subsystems (see (6)) [21].

$$RTRW_{i,j}^{h} \equiv \frac{\ln(R_{i,j}^{h}(t))}{\ln\left(\prod_{j=1}^{n} R_{i,j}^{h}(t)\right)} \tag{6}$$

where $RTRW_{i,j}^h$ is the historical data-based relative target reliability weight, $R_{i,j}^h(t)$ is the historical reliability data of subsystem *i.j*, and $\ln()$ is the natural logarithm function.
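As a minimal sketch of how (4) through (6) could be used together, the snippet below derives weights from historical reliabilities and then allocates a system target through $R_{i,j,\text{target}} = R_{i,\text{target}}^{\,w_{i,j}}$, which is (4) solved for the subsystem reliability. The function names, the three historical reliabilities, and the 0.999 system target are hypothetical values chosen only for illustration.

```python
import math

def historical_rtrw(hist_reliabilities):
    """Eq. (6): weight_j = ln(R_j^h) / ln(product of all R_j^h)."""
    logs = [math.log(r) for r in hist_reliabilities]
    total = sum(logs)  # equals the logarithm of the product
    return [x / total for x in logs]

def allocate_series(system_target, weights):
    """Eq. (4) solved for the subsystem: R_sub = R_sys ** w."""
    return [system_target ** w for w in weights]

# Hypothetical historical reliabilities of three series subsystems.
hist = [0.990, 0.980, 0.995]
weights = historical_rtrw(hist)

# Hypothetical system target reliability at the time of interest.
targets = allocate_series(0.999, weights)

# Eq. (5): the weights sum to one. Eq. (1): the allocated targets
# multiply back to the system target.
weight_sum = sum(weights)
product = math.prod(targets)
```

The least reliable subsystem in the historical data receives the largest weight and therefore the lowest allocated target, which matches the interpretation of a high weight given above.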

Because these data are not always available, another way may be to use preliminary design information. In this work, we identified two factors to help determine RTRWs. The first factor is the expected number of components in the subsystems. The more components a subsystem includes, the less reliable the subsystem typically is. The second factor is the complexity sum of critical components in the subsystems. The complexity value of a component is assigned from 0-10, with 10 being the largest complexity. Therefore, the larger the complexity sum is, the less reliable the associated subsystem is. We define RTRW in mathematical form using variations of these two factors as shown in (7).

$$RTRW_{i,j}^{d} = w_{a} \frac{ENC_{i,j}}{TENC_{i}} + w_{b} \frac{ECSCC_{i,j}}{TECSCC_{i}} = w_{a}RENC_{i,j} + w_{b}RECSCC_{i,j} \tag{7}$$

where $RTRW_{i,j}^d$ is the design information-based relative target reliability weight, $ENC_{i,j}$ is the expected number of components in subsystem *i.j*, $TENC_i$ is the total expected number of components in system *i*, $ECSCC_{i,j}$ is the expected complexity sum of critical components in subsystem *i.j*, $TECSCC_i$ is the total expected complexity sum of critical components in system *i*, $RENC_{i,j}$ is the ratio of $ENC_{i,j}$ to $TENC_i$, $RECSCC_{i,j}$ is the ratio of $ECSCC_{i,j}$ to $TECSCC_i$, $w_a$ is the weight of the ENC factor for all $RTRW_{i,j}$, $w_b$ is the weight of the ECSCC factor for all $RTRW_{i,j}$, and $w_a + w_b = 1$.

When a system consists of identical subsystems, the uniform target reliability weight can be applied without any further information. In this case, target reliability is allocated based on the number of subsystems. This is mathematically expressed in (8) [21].

$$RTRW_{i,j}^{u} \equiv \frac{1}{n} \tag{8}$$

where  $RTRW_{i,j}^u$  is the uniform relative target reliability weight, and n is the total number of subsystems.

The reliability allocation algorithm starts from setting the target reliability of the top level system. The second step is to set RTRWs among subsystems in the same subsystem level. The third step is calculating the target reliability of subsystems from the target reliability of the system and the subsystem RTRWs. The allocated subsystem target reliability is then subsequently used to determine the target reliability of its subsystems, repeating the second and the third steps from the top level system until leaf subsystems are reached, which include only components.
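The three steps above can be sketched as a recursive walk over the system tree. This is an illustrative sketch for series structures only, not the authors' implementation: the `Node` class, the tree, the weights, and the 0.99 top-level target are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    rtrw: float = 1.0                       # weight within the parent (step 2)
    children: List["Node"] = field(default_factory=list)
    target: Optional[float] = None          # allocated target reliability

def allocate(node: Node, target: float) -> None:
    """Assign `target` to `node`, then allocate it to the children (step 3)."""
    node.target = target
    if not node.children:                   # leaf subsystem: stop
        return
    total = sum(child.rtrw for child in node.children)
    if abs(total - 1.0) > 1e-9:             # Eq. (5): weights must sum to one
        raise ValueError(f"RTRWs under {node.name} sum to {total}, not 1")
    for child in node.children:
        allocate(child, target ** child.rtrw)

def leaves(node: Node) -> List[Node]:
    """Collect the leaf subsystems below `node`."""
    if not node.children:
        return [node]
    found: List[Node] = []
    for child in node.children:
        found.extend(leaves(child))
    return found

# Hypothetical three-level series system.
system = Node("system", children=[
    Node("A", rtrw=0.6, children=[Node("A1", rtrw=0.5), Node("A2", rtrw=0.5)]),
    Node("B", rtrw=0.4),
])
allocate(system, 0.99)                      # step 1: top-level target

# For a pure series tree the leaf targets multiply back to the top-level target.
leaf_product = math.prod(leaf.target for leaf in leaves(system))
```

Parallel structures need the numerical method described in the next subsection, so they are deliberately left out of this sketch.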

#### 2) TARGET RELIABILITY ALLOCATION FOR PARALLEL STRUCTURE:

The broadcasting system in Fig. 4 consists of a 6-out-of-8 parallel structure with identical encoders, which means that at least six of the eight encoders must work. While the calculation of a series structure for random failures is straightforward, as shown in (9), the calculation of a parallel structure for random failures is not, as shown in (10). Therefore, we develop a new numerical method of allocating random failure reliability for parallel structures.

$$e^{-\lambda_i t} = \prod_{j=1}^n e^{-\lambda_{i,j} t} \tag{9}$$

$$e^{-\lambda_i t} \approx \sum_{m=k}^n \binom{n}{m} (e^{-\lambda_{i,j} t})^m (1 - e^{-\lambda_{i,j} t})^{n-m}, \quad (0 \le t \le T_w) \tag{10}$$

where  $\lambda_i$  is the system target random failure rate,  $\lambda_{i,j}$  is the subsystem target random failure rate, n is the number of subsystems in parallel structure, and k is the number of required active subsystems in parallel structure.
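The following sketch illustrates why (9) is exact while (10) is only an approximation. It computes the equivalent constant rate $-\ln R(t)/t$ at two times: for a series structure this rate does not depend on time, whereas for a k-out-of-n structure it does, so a single $\lambda_i$ can only be fitted over an interval. The 5000 FIT subsystem rate and the two evaluation times are hypothetical values chosen for illustration.

```python
import math

FIT = 1e-9                 # one failure per 1e9 hours
HOURS_PER_YEAR = 8760.0    # assuming continuous operation

def series_reliability(lam_sub, n, t):
    """Right-hand side of Eq. (9) for n identical subsystems."""
    return math.exp(-lam_sub * t) ** n

def k_out_of_n_reliability(lam_sub, k, n, t):
    """Right-hand side of Eq. (10) for n identical subsystems."""
    r = math.exp(-lam_sub * t)
    return sum(math.comb(n, m) * r**m * (1.0 - r) ** (n - m)
               for m in range(k, n + 1))

def equivalent_rate(reliability, t):
    """Constant rate that would give the same reliability at time t."""
    return -math.log(reliability) / t

lam_sub = 5000 * FIT                       # hypothetical subsystem rate
t1, t5 = 1 * HOURS_PER_YEAR, 5 * HOURS_PER_YEAR

series_1 = equivalent_rate(series_reliability(lam_sub, 8, t1), t1)
series_5 = equivalent_rate(series_reliability(lam_sub, 8, t5), t5)
parallel_1 = equivalent_rate(k_out_of_n_reliability(lam_sub, 6, 8, t1), t1)
parallel_5 = equivalent_rate(k_out_of_n_reliability(lam_sub, 6, 8, t5), t5)
```

Here `series_1` and `series_5` both equal eight times the subsystem rate, while `parallel_5` is several times larger than `parallel_1`.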

This method consists of seven steps, as shown in Fig. 5.

STEP 1: Set system target reliability and subsystem RTRWs

We set the target reliability of the broadcasting system for random failures as follows:

$$\lambda_{\text{target,video broadcasting system}} = 5000 \text{ FIT}$$

The value of 5000 FIT target random failure rate means that approximately five failures may occur out of 100 identical broadcasting systems in a year.

For subsystem reliability weight, because eight encoders are identical, we use uniform RTRW in (8) as follows:

$$RTRW_{i,j,encoder} = 1/8$$
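To make the FIT unit in Step 1 concrete, the short sketch below converts the 5000 FIT target into a one-year reliability and an expected number of failed systems per 100. It assumes the usual definition of 1 FIT as one failure per $10^9$ hours and continuous operation of 8760 hours per year; the function name is illustrative.

```python
import math

FIT = 1e-9                 # assumed definition: 1 failure per 1e9 hours
HOURS_PER_YEAR = 8760.0    # assumed continuous (100% duty) operation

def reliability(rate_fit, hours):
    """Exponential reliability R(t) = exp(-lambda * t) for a rate in FIT."""
    return math.exp(-rate_fit * FIT * hours)

r_one_year = reliability(5000, HOURS_PER_YEAR)
failures_per_100 = 100 * (1.0 - r_one_year)
```

Under these assumptions the result is between four and five failed systems per 100 per year, the same order as the figure quoted in Step 1.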

STEP 2: Estimate the initial random failure rate of subsystem

With the target random failure rate of the system and the RTRWs of the subsystems, we can estimate the initial random failure rate of the subsystem using the series structure equation, shown in (11).

$$e^{-\lambda_{i,j}^{1\text{st}} t} = (e^{-\lambda_{i}t})^{\frac{1}{n}} \tag{11}$$

From (11), the initial random failure rate of an encoder is calculated as follows:

$$\lambda_{\text{target,encoder}}^{1\text{st}} = 625 \text{ FIT}$$

STEP 3: Estimate the random failure rate of the system ( $\lambda_i^{1\text{st}}$ )

Because all the random failure rates of the subsystems are estimated from (11), the random failure rate of the system is estimated with the parallel structure equation (10). The random failure rate ( $\lambda_i^{1\text{st}}$ ) of the system is calculated through the least-squares method over a given time period ( $0 \le t \le T_w$ ).

For the video broadcasting system, the first estimated random failure rate of the system is as follows:

$$\lambda_{target,video\ broadcasting\ system}^{1st} = 14.7\ FIT$$

STEP 4: Calculate the difference between the given target random failure rate of the system and the estimated random failure rate of the system

According to the difference between the given target random failure rate of the system  $(\lambda_i^G)$  and the estimated random failure rate of the system  $(\lambda_i^{1\text{st}})$ , the estimated random failure rate of the subsystem  $(\lambda_{i,j}^{1\text{st}})$  is evaluated. Equation (12) shows the random failure rate difference  $(\Delta \lambda_i^{1\text{st}})$ , and (13) shows the normalized error ratio of random failure rate  $(ER_{\lambda}^{1\text{st}})$ .

Fig. 5. Flow chart of the reliability allocation algorithm for parallel structure.

$$\Delta \lambda_i^{1\text{st}} = \lambda_i^G - \lambda_i^{1\text{st}} \tag{12}$$

where  $\Delta \lambda_i^{1\text{st}}$  is the random failure rate difference of the system,  $\lambda_i^G$  is the given random failure rate of the system, and  $\lambda_i^{1\text{st}}$  is the estimated random failure rate of the system.

$$ER_{\lambda}^{1\text{st}} = \left| \frac{\lambda_i^G - \lambda_i^{1\text{st}}}{\lambda_i^G} \right| \tag{13}$$

where  $ER_{\lambda}^{1\text{st}}$  is the normalized error ratio of random failure rate and  $\lambda_i^G$  and  $\lambda_i^{1\text{st}}$  are as described above.

For the video broadcasting system, the normalized error ratio is as follows:

$$ER_{\lambda}^{1\text{st}} = |(5000 - 14.7)/5000| = 0.9971$$

STEP 5: Estimate the second time-guess random failure rate of the subsystem  $(\lambda_{i,j}^{2\text{nd}})$

If the error ratio is larger than the given tolerance  $(ER_{\lambda}^{1\text{st}} > T_{\lambda}^{G})$ , the next random failure rate of the subsystem  $(\lambda_{i,j}^{2\text{nd}})$  must be estimated from the previous subsystem random failure rate  $(\lambda_{i,j}^{1\text{st}})$ . The issue is how to find the relationship between the two random failure rates of the subsystem. Since the random failure rate difference of the system  $(\Delta \lambda_{i}^{1\text{st}})$  is proportionally related to the expected random failure rate difference of the subsystem  $(\Delta \lambda_{i,j}^{\text{expected}} = \lambda_{i,j}^{2\text{nd}} - \lambda_{i,j}^{1\text{st}})$ , the random failure rate difference of the system divided by the number of subsystems  $(\Delta \lambda_{i}^{1\text{st}}/n)$  is used in guessing the next random failure rate of the subsystem  $(\lambda_{i,j}^{2\text{nd}})$ . The final relationship is shown in (14):

$$\lambda_{i,j}^{2\text{nd}} = \lambda_{i,j}^{1\text{st}} + \frac{\Delta \lambda_i^{1\text{st}}}{n} \tag{14}$$

where  $\lambda_{i,j}^{2\mathrm{nd}}$  is the second time-guess random failure rate of the subsystem,  $\lambda_{i,j}^{1\mathrm{st}}$  is the initial random failure rate of the subsystem, and  $\Delta \lambda_i^{1\text{st}}$  is the random failure rate difference of the system.

Table I

Allocated Random Failure Rates of Encoder Systems with Various Parallel Structures

| Parallel structure                  | 8-out-of-8 | 7-out-of-8 | 6-out-of-8 | 5-out-of-8 | 1-out-of-8 |
|-------------------------------------|------------|------------|------------|------------|------------|
| Allocated random failure rate (FIT) | 625.0      | 2789.6     | 5782.4     | 9553.8     | 45517.0    |

Table II

RTRWs for Subsystems of the Encoder System<sup>a,b</sup>

| Subsystem name, i.j  | $RENC_{i,j}$       | $RECSCC_{i,j}$  | $RTRW_{i,j}$                                     |
|----------------------|--------------------|-----------------|--------------------------------------------------|
| Interface board, i.1 | 1000/2800 = 0.3571 | 20/165 = 0.1212 | $0.5 \times 0.3571 + 0.5 \times 0.1212 = 0.2392$ |
| Processor board, i.2 | 1100/2800 = 0.3929 | 60/165 = 0.3636 | $0.5 \times 0.3929 + 0.5 \times 0.3636 = 0.3782$ |
| PMC board, i.3       | 500/2800 = 0.1786  | 65/165 = 0.3939 | $0.5 \times 0.1786 + 0.5 \times 0.3939 = 0.2863$ |
| Audio board, i.4     | 200/2800 = 0.0714  | 20/165 = 0.1212 | $0.5 \times 0.0714 + 0.5 \times 0.1212 = 0.0963$ |

<sup>a</sup> $TENC_{i} = 2800$ (= 1000 + 1100 + 500 + 200).

For the video broadcasting system, the second time-guess random failure rate of the subsystem is as follows:

$$\lambda_{target,encoder}^{2nd} = 625 + 623.2 = 1248.2 \text{ FIT}$$

STEP 6: Repeat Steps 3, 4, and 5 until the error ratio is smaller than the given tolerance  $(ER_{\lambda}^{nth} < T_{\lambda}^{G})$ 

Steps 3, 4, and 5 are repeated until the error ratio is smaller than the given tolerance  $(ER_{\lambda}^{n\text{th}} < T_{\lambda}^{G})$ . For the video broadcasting system, we set the tolerance to be 0.001.

STEP 7: Calculate the allocated target random failure rates of the subsystems  $(\lambda_{i,j}^{nth})$ 

If the error ratio is smaller than the given tolerance  $(ER_{\lambda}^{nth} < T_{\lambda}^{G})$ , then find the allocated target random failure rates of the subsystems.

The allocated target random failure rate of each encoder is as follows:

$$\lambda_{target,encoder}^{nth} = 5782.4 \text{ FIT}$$

To check the validity of this value, we compare the allocated target random failure rates of encoders with different parallel structures (Table I). The allocated target random failure rate of the encoder with the 6-out-of-8 parallel structure lies between those of the 7-out-of-8 and 5-out-of-8 parallel structures, and the values increase as the number of required active encoders decreases, as expected.
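The iteration in Steps 2 through 7 can be sketched as follows. This is a hedged illustration rather than the authors' implementation, and it rests on assumptions the text does not fix: $T_w$ is set to five years of continuous operation, the time interval is sampled on a uniform grid, and the least-squares fit of Step 3 is done on the log scale ($-\ln R(t) \approx \lambda_i t$, a line through the origin). With other choices the numbers will differ from those in Table I, so no attempt is made to reproduce them.

```python
import math

FIT = 1e-9                 # one failure per 1e9 hours
HOURS_PER_YEAR = 8760.0    # assumed continuous operation

def k_out_of_n_reliability(r, k, n):
    """Eq. (10): at least k of n identical subsystems, each with reliability r."""
    return sum(math.comb(n, m) * r**m * (1.0 - r) ** (n - m)
               for m in range(k, n + 1))

def fit_system_rate(lam_sub, k, n, times):
    """Step 3 (assumed form): least-squares slope of -ln R_sys(t) versus t."""
    num = den = 0.0
    for t in times:
        r_sys = k_out_of_n_reliability(math.exp(-lam_sub * t), k, n)
        num += t * (-math.log(r_sys))
        den += t * t
    return num / den

def allocate_parallel(lam_target, k, n, t_w, tol=1e-3, n_grid=200, max_iter=10000):
    """Return (subsystem rate, fitted system rate) for a k-out-of-n structure."""
    times = [t_w * (i + 1) / n_grid for i in range(n_grid)]
    lam_sub = lam_target / n                                # Step 2, Eq. (11)
    for _ in range(max_iter):
        lam_sys = fit_system_rate(lam_sub, k, n, times)     # Step 3
        delta = lam_target - lam_sys                        # Step 4, Eq. (12)
        if abs(delta) / lam_target < tol:                   # Eq. (13), Steps 6-7
            return lam_sub, lam_sys
        lam_sub += delta / n                                # Step 5, Eq. (14)
    raise RuntimeError("allocation did not converge")

# 6-out-of-8 encoders, 5000 FIT system target, T_w = 5 years (assumption).
lam_encoder, lam_system = allocate_parallel(5000 * FIT, k=6, n=8,
                                            t_w=5 * HOURS_PER_YEAR)
```

For the 8-out-of-8 case the structure is a pure series system, so the loop stops at the first iteration with 625 FIT per encoder, in agreement with Table I. For smaller k the update in (14) raises the subsystem rate step by step until the fitted system rate is within the tolerance of the target.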

#### 3) Reliability Allocation for Series Structure:

Because the target reliability of each encoder system is allocated from the video broadcasting system, the next step is allocating the target reliability of the encoder system to its series subsystems: an interface board, a processor board, a PMC board, and an audio board (Fig. 4). First, setting the RTRWs of the four electronic board systems is necessary. For example, Table II shows RTRW calculation results for the board systems. We count the approximate number of components in each board and calculate  $RENC_{i,j}$  assuming limited preliminary design information. For the  $RECSCC_{i,j}$  calculation, we estimate the complexity of BGA chip packages to be 10 and that of a large I/O (>25) chip package to be 5. For example, because the processor board includes six BGA chip packages, the expected complexity sum is 60. After the calculation of  $RENC_{i,j}$  and  $RECSCC_{i,j}$ , we estimate  $w_a$  and  $w_b$  to be equal (i.e.,  $w_a = w_b = 0.5$ ), and then calculate  $RTRW_{i,j}$ . A high value for  $RTRW_{i,j}$  indicates a comparatively less reliable subsystem. Therefore, a comparatively large target random failure rate is expected.

Per the *RTRW* values shown in Table II, the target reliability of the encoder system can be allocated to its series subsystems. For example, the target random failure rate of the PMC board is calculated by multiplying the target random failure rate of the encoder system by its RTRW ( $e^{-\lambda_{i,3} t} = e^{-\lambda_{i} t \cdot RTRW_{i,3}}$ , i.e.,  $\lambda_{i,3} = RTRW_{i,3}\,\lambda_{i}$ ). The complete results are shown in Fig. 6.
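The Table II weights and the resulting series allocation can be reproduced with a few lines. The sketch below uses the component counts and complexity sums from Table II, the equal weights $w_a = w_b = 0.5$, and the 5782.4 FIT encoder target; the variable names are illustrative.

```python
# (expected number of components, expected complexity sum) from Table II
boards = {
    "Interface": (1000, 20),
    "Processor": (1100, 60),
    "PMC": (500, 65),
    "Audio": (200, 20),
}
w_a = w_b = 0.5
lam_encoder_fit = 5782.4   # allocated encoder target from the 6-out-of-8 case

tenc = sum(enc for enc, _ in boards.values())        # TENC_i = 2800
tecscc = sum(cs for _, cs in boards.values())        # TECSCC_i = 165

# Eq. (7): design information-based RTRW of each board
rtrw = {name: w_a * enc / tenc + w_b * cs / tecscc
        for name, (enc, cs) in boards.items()}

# Series allocation: lambda_sub = RTRW * lambda_encoder
allocated_fit = {name: w * lam_encoder_fit for name, w in rtrw.items()}
```

With unrounded weights the PMC board receives about 1655.2 FIT. Using the four-decimal weight 0.2863 from Table II gives 1655.5 FIT, the target used for the PMC board in the next subsection.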

#### C. Reliability Prediction and Assessment for PMC Board Design

For the random failure reliability prediction, the first step is to set component feature and usage condition information. The second step is to select a statistical data-based prediction model, and the third step is to calculate a random failure rate. The last step is to construct an exponential reliability function as shown in Fig. 7.

Following the procedure, the random failure rate of the PMC board is predicted under the "ground benign" condition and a 100% duty operation ratio using a statistical data-based reliability prediction model. If available, in-house reliability data should be used for this purpose. If not, currently available statistical data-based reliability prediction models include MIL-HDBK-217, Telcordia SR-332, CNET RDF-2000, British Telecom, and Siemens SN29500 [7]. Because MIL-HDBK-217 data are easily accessible, we use them for explaining our SDfR method. The reliability prediction results are summarized in Table III. These prediction results are for demonstration of the algorithms and should be evaluated and updated with field failure analyses.

Because the assessed random failure rate of the PMC board is larger than the allocated target random failure rate ( $\lambda_{assessed, PMC board} = 3219.5 \text{ FIT} > \lambda_{target, PMC board} = 1655.5 \text{ FIT}$ ), design changes are required. In theory, the best way is by adding redundant components in the PMC board. However, in this case adding many redundant components makes the PMC

<sup>b</sup>$TECSCC_i = 165\ (= 20 + 60 + 65 + 20)$.



Fig. 6. Target random failure reliability allocation to the subsystems of the encoder system.



Fig. 7. Procedure for predicting random failure reliability in the PWBA domain.

board design complex. Therefore, a design change one level higher is more practical. If we use dual PMC boards, the allocated target reliability is satisfied as follows:

$$\lambda_{assessed,dual\ PMC\ boards} = 282.2\ FIT < \lambda_{target,PMC\ board} = 1655.5\ FIT$$
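One common way to express the benefit of a redundant board is the equivalent constant failure rate of an active-parallel pair over a mission time. The sketch below illustrates that idea only. The text does not state how the 282.2 FIT figure was obtained or which mission time was used, so `t_hours` is an assumed placeholder and the printed value is not expected to reproduce the paper's number exactly.

```python
import math

FIT = 1e-9  # one FIT is one failure per 1e9 device-hours

def parallel_equivalent_rate_fit(rate_fit, n_units, t_hours):
    """Equivalent constant failure rate (FIT) over [0, t_hours] of n identical
    units in active parallel, each with a constant failure rate rate_fit."""
    r_single = math.exp(-rate_fit * FIT * t_hours)
    r_parallel = 1.0 - (1.0 - r_single) ** n_units
    return -math.log(r_parallel) / t_hours / FIT

single_fit = 3219.5   # assessed PMC board failure rate, from the text
target_fit = 1655.5   # allocated target failure rate, from the text
t_hours = 30_000.0    # ASSUMED mission time; not given in this section

dual_fit = parallel_equivalent_rate_fit(single_fit, 2, t_hours)
print(f"dual-board equivalent rate at {t_hours:.0f} h: {dual_fit:.1f} FIT")
print("meets target:", dual_fit < target_fit)
```

Because the equivalent rate of a parallel pair grows with mission time, the comparison against the target should be repeated for the mission time that actually applies to the product.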

Another possible design change may be adding more redundant encoders. However, this design change may cost more than the design change of adding a redundant PMC board.

### RELIABILITY ASSESSMENT ALGORITHM

The objective of this section is to demonstrate reliability assessment algorithms for designing a USB board system against wear-out failures.

### A. Multiple Logical Groups of a USB Board System

Universal serial bus (USB) has long been the standard for PC peripheral connectivity, but recently it has started venturing into the automotive market, supporting MP3-capable stereos, DVD players, and advanced GPS navigation systems as parts of infotainment automotive electronics [22]. In general, USB board systems are designed for use in well-controlled environments. However, using USB board systems in harsh environments (Table IV) requires consideration of wear-out failures with physics-of-failure models.

Table III
Random Failure Reliability Prediction of PMC Board Components

| Component           | Quantity | Random failure rate (FIT) |
|---------------------|----------|---------------------------|
| Resistors           | 114      | 532.8                     |
| Capacitors          | 337      | 1058.5                    |
| Inductors           | 4        | 0.1                       |
| Diodes              | 1        | 3.3                       |
| Transistors         | 5        | 685.7                     |
| Chip packages       | 19       | 674.5                     |
| Miscellaneous items | 40       | 264.6                     |
| Total               | 520      | 3219.5                    |
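The totals in Table III and the exponential reliability function built from them (the last step of the procedure in Fig. 7) can be checked with a short sketch. The component values are copied from Table III; the evaluation time in the final lines is an arbitrary assumption used only to show how the function is called.

```python
import math

# (quantity, random failure rate in FIT) per component class, from Table III
table_iii = {
    "Resistors": (114, 532.8),
    "Capacitors": (337, 1058.5),
    "Inductors": (4, 0.1),
    "Diodes": (1, 3.3),
    "Transistors": (5, 685.7),
    "Chip packages": (19, 674.5),
    "Miscellaneous items": (40, 264.6),
}

# For a series system with constant failure rates, the rates simply add.
total_quantity = sum(qty for qty, _ in table_iii.values())
total_fit = sum(rate for _, rate in table_iii.values())

def reliability(t_hours, rate_fit):
    """Exponential reliability R(t) = exp(-lambda * t), with lambda in FIT."""
    return math.exp(-rate_fit * 1e-9 * t_hours)

print(total_quantity, round(total_fit, 1))

t_example = 10_000.0  # assumed evaluation time in hours
print(f"R({t_example:.0f} h) = {reliability(t_example, total_fit):.4f}")
```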

A USB board system consists of multiple logical groups and their components, illustrated in Fig. 8 and summarized in Table V. Because the ports group is treated as a parallel structure, the analysis of USB board failures becomes more complex.

The USB board may fail by wear-out failures of any interconnection component. Furthermore, each interconnection component may fail by its own dominant failure mechanisms. For example, plated-through holes (PTH) may fail owing to cyclic normal stress caused by coefficient of thermal expansion (CTE) mismatch between PTHs and the board. The solder joints of ceramic chip packages may fail owing to cyclic shear stress caused by CTE mismatch between ceramic chip packages and the board.





Fig. 8. USB board system and its logical groups.

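Table IV
Summary of Environmental Loadings of USB Board Systems Inside an Automobile

| Loading                     | Value         |
|-----------------------------|---------------|
| Environment                 | Ground mobile |
| Duty operation ratio        | 25%           |
| Temperature range           | 5-90°C        |
| Temperature cycle frequency | Once per day  |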

After the failure modes and mechanisms of each component are analyzed, the reliability of each component is predicted based on physics-of-failure knowledge. The reliability of the USB board system is then assessed following the logical structure of the groups. Depending on the difference between the allocated target reliability and the assessed reliability, design changes are recommended.

### B. Reliability Prediction of Components in the USB Board System

For the wear-out failures of solder joints and PTHs in the USB board, a thermomechanical solder joint fatigue model and a PTH fatigue model are developed. Solder joints are interconnection components that connect a commercial component and a printed wiring board (PWB). Failure of solder joints is caused by the different thermal expansion between the component and the PWB. The prediction of thermomechanical solder joint fatigue failure requires a finite element analysis (FEA) model to predict cyclic strains and a fatigue model to predict the number of cycles to 50% failure.

The first step of the development of the FEA model for solder joints is to identify the design features of solder joints. The design features of solder joints are the solder joint standoff height, the fillet height, the base length, and the material (see Fig. 9).

The developed FEA model is a half-symmetric plane-strain model. This model uses the PLANE82 element type of ANSYS and the isotropic material model for component modeling, the PLANE82 element type of ANSYS and the orthotropic material model for PWB modeling, and the VISCO108 element type of ANSYS and Anand's viscoplastic material model for solder joint modeling.

The test results obtained by this FEA model, total shear strain

Table V
Summary of Logical Groups in the USB Board System<sup>a</sup>

| Logical group              | No. of components | Logical group    | No. of components |
|----------------------------|-------------------|------------------|-------------------|
| Control group              | 120               | Connection group | 44                |
| Port 1                     | 34                | Port 2           | 34                |
| Port 3                     | 34                | Port 4           | 34                |
| Total number of components | 300               |                  |                   |

<sup>a</sup>Some trivial components are not counted in this table.

and the von Mises stress, at a cyclic temperature change between 20 and 100°C, are illustrated in Fig. 10. The boxes at the corner of the solder joint mark the area where a crack originates.

The averaged total shear strain change in the box over the cyclic temperature change is the damage metric for predicting the number of cycles to 50% failure ( $N_{50}$ ). For this $N_{50}$, Engelmaier's modified Coffin-Manson equation for solder [10], shown in (15), is used. The predicted results are shown in Table VI. Finally, to validate this FEA model, its results are compared with those of Lau et al. [23] in Table VII, and a good correlation is observed.

$$N_{50} = 0.5 \left[ \frac{\Delta \gamma_{\rm t}}{2\varepsilon_{\rm f}'} \right]^{1/c_{\rm fde}} \tag{15}$$

where  $N_{50}$  is the number of cycles to 50% failure,  $\Delta \gamma_{\rm t}$  is the averaged total strain change,  $\varepsilon_{\rm f}'$  is the fatigue ductility coefficient, and  $c_{\rm fde}$  is the fatigue ductility exponent.
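Equation (15) is a closed-form expression, so it can be evaluated directly once the strain range is known. The sketch below is illustrative only: the values of `EPS_F` and `C` are assumed placeholders, because this section does not list the constants used for the Sn-Ag solder, and in Engelmaier's formulation the exponent generally depends on temperature and dwell time. The printed cycle counts are therefore not expected to match Table VI.

```python
def n50_solder(delta_gamma, eps_f, c):
    """Eq. (15): cycles to 50% failure from the total shear strain range.

    delta_gamma: averaged total shear strain change per cycle
    eps_f:       fatigue ductility coefficient
    c:           fatigue ductility exponent (negative)
    """
    if delta_gamma <= 0 or eps_f <= 0 or c >= 0:
        raise ValueError("expected delta_gamma > 0, eps_f > 0, and c < 0")
    return 0.5 * (delta_gamma / (2.0 * eps_f)) ** (1.0 / c)

# ASSUMED placeholder constants, not taken from the paper.
EPS_F = 0.325
C = -0.45

# Strain ranges from Table VI, evaluated with the placeholder constants.
for strain in (0.0065, 0.0079, 0.0110):
    print(f"strain {strain:.4f} -> N50 of about {n50_solder(strain, EPS_F, C):.0f} cycles")
```

Because the exponent is negative, a larger strain range always gives a shorter predicted life, which is the trend shown in Table VI.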

Plated-through holes (PTHs) are interconnection components that connect copper traces in different layers. Failure of PTHs is caused by the difference in thermal expansion between the PTHs and the PWB. The prediction of thermomechanical PTH failure requires an FEA model for the prediction of PTH cyclic strains and a fatigue model for the prediction of the number of cycles to 50% failure.

The first step in the development of the FEA model of the thermomechanical PTH fatigue is to identify the design features of PTHs. The design features of PTHs are pad diameter, hole



Fig. 9. Design features of solder joints: $H_{sj,st}$, solder joint standoff height; $H_{sj,f}$, solder joint fillet height; $L_{sj,b}$, solder joint base length; $H_c$, component height; $L_c$, component length; and $Th_{pwb}$, PWB thickness.





Fig. 10. FEA result of thermomechanical solder joint fatigue: (a) total shear strain; (b) von Mises stress.

Table VI
Test Results of the Thermomechanical Solder Joint Fatigue Model

| Temperature range (°C) | 20-90  | 20-100 | 20-120 |
|------------------------|--------|--------|--------|
| Strain                 | 0.0065 | 0.0079 | 0.0110 |
| $N_{50}$               | 13177  | 8516   | 4159   |

diameter, pad thickness, plating thickness, and material. These features are illustrated in Fig. 11.

The developed FEA model is an axisymmetric model. This model uses the PLANE42 element type of ANSYS and the multilinear kinematic hardening material model for PTH modeling, and the PLANE82 element type of ANSYS and the orthotropic material model for PWB modeling.

The test results obtained by this FEA model, the total strain and the von Mises stress, at a cyclic temperature change between 20 and 100°C are illustrated in Fig. 12. The boxes at the center of the PTH mark the area where a crack originates.

The averaged total strain change in the box over the cyclic temperature change is the damage metric for predicting the number of cycles to 50% failure ( $N_{50}$ ). For this $N_{50}$, Engelmaier's modified Coffin-Manson equation for copper [24], shown in (16), is used. The predicted test results are shown in Table


Table VII
Comparison for the Validation of the FEA Model of Solder Joint Fatigue

|                                  | Results of Lau et al. | Results of the FEA model |
|----------------------------------|-----------------------|--------------------------|
| Strain change (from -55 to 125°C) | 0.0143                | 0.0162                   |

VIII. The strain for the 20-90°C temperature change (0.0012) is relatively small because deformation occurs in the elastic region and below the glass transition temperature $T_{\rm g}$ (110°C) of the PWB (FR4), above which the PWB undergoes significant thermal expansion. Finally, to validate this FEA model, its results are compared with those of the IPC-TP-510 analytical model [25], as shown in Table IX. The strain results are in good agreement with the IPC model.

$$\Delta \varepsilon_{\rm t} = N_{50}^{-0.6} (\varepsilon_{\rm f}')^{0.75} + \frac{0.9\sigma_{\rm u}}{E} \left[ \frac{e^{\varepsilon_{\rm f}'}}{0.36} \right]^{0.1786\log\left(\frac{10^5}{N_{50}}\right)} \tag{16}$$

where  $N_{50}$  is the number of cycles to 50% failure,  $\Delta \varepsilon_{\rm t}$  is the averaged total strain change,  $\varepsilon_{\rm f}'$  is the fatigue ductility coefficient,  $\sigma_{\rm u}$  is the ultimate tensile strength, and E is Young's modulus.
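Unlike (15), equation (16) gives the strain as a function of $N_{50}$ and cannot be rearranged in closed form, so $N_{50}$ has to be found numerically. The sketch below uses bisection on a logarithmic scale, which works because the right-hand side of (16) decreases monotonically as $N_{50}$ grows. It is a minimal illustration under stated assumptions: the logarithm is taken as base 10, and the copper properties `EPS_F`, `SIGMA_U`, and `E_MOD` are hypothetical placeholders rather than values from the paper.

```python
import math

def strain_from_n50(n50, eps_f, sigma_u, e_mod):
    """Right-hand side of Eq. (16): total strain range for a given N50."""
    ductility_term = n50 ** (-0.6) * eps_f ** 0.75
    exponent = 0.1786 * math.log10(1e5 / n50)  # base-10 log is an assumption
    strength_term = 0.9 * sigma_u / e_mod * (math.exp(eps_f) / 0.36) ** exponent
    return ductility_term + strength_term

def n50_from_strain(delta_eps, eps_f, sigma_u, e_mod, lo=1.0, hi=1e15, iters=200):
    """Solve Eq. (16) for N50 by bisection on a log scale."""
    f_lo = strain_from_n50(lo, eps_f, sigma_u, e_mod)
    f_hi = strain_from_n50(hi, eps_f, sigma_u, e_mod)
    if not (f_hi < delta_eps < f_lo):
        raise ValueError("strain range lies outside the search bracket")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if strain_from_n50(mid, eps_f, sigma_u, e_mod) > delta_eps:
            lo = mid  # predicted strain too high, so N50 must be larger
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical copper properties for illustration only.
EPS_F = 0.30       # fatigue ductility coefficient
SIGMA_U = 276e6    # ultimate tensile strength, Pa
E_MOD = 82.7e9     # Young's modulus, Pa

n50 = n50_from_strain(0.0028, EPS_F, SIGMA_U, E_MOD)
print(f"N50 of about {n50:.3g} cycles for a strain range of 0.0028")
```

Any bracketing root finder would serve equally well; bisection is used here only because it needs no derivative and cannot leave the bracket.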



Fig. 11. Design features of PTHs: $D_{pth.pad}$, pad diameter; $D_{pth.h}$, hole diameter; $Th_{pth.pad}$, pad thickness; $Th_{pth.p}$, plating thickness; and $Th_{pwb}$, PWB thickness.





Fig. 12. FEA result of thermomechanical PTH fatigue: (a) total strain; (b) von Mises stress.

Table VIII
Test Results of Thermomechanical PTH Fatigue Model

| Temperature range (°C) | 20-90  |        | 20-120 |
|------------------------|--------|--------|--------|
| Strain                 | 0.0012 | 0.0039 | 0.0120 |
| $N_{50}$               |        |        | 22     |

In this USB board system, five types of design features are identified (see Fig. 13). These features are repeated across all designed components. The detailed parameters of these features are shown in Table X.

Once the parameters of each design feature were applied to the solder joint fatigue model and the PTH fatigue model described in the previous section, the wear-out failure reliability of each interconnection feature was predicted (see Table XI). The shape parameters of the Weibull curve ( $\beta$  of  $R(t) = e^{-(t/\alpha)^{\beta}}$ ) are taken from experimental data. For the thermomechanical fatigue failure of solder joints, the shape parameter of the Weibull curve is 2 [26], and for the thermomechanical fatigue failure of PTHs, the shape parameter of the Weibull curve is 5 [27], [28]. With these shape parameters and  $N_{50,\text{hours}}$, the characteristic lifetime parameters of the Weibull curve ( $\alpha$  of  $R(t) = e^{-(t/\alpha)^{\beta}}$ ) can be calculated from (17).

Table IX
Comparison for the Validation of the FEA Model of PTH Fatigue

|                                  | Result of the IPC model | Result of the FEA model |
|----------------------------------|-------------------------|-------------------------|
| Strain change (from 20 to 100°C) | 0.0024                  | 0.0028                  |



Fig. 13. Five design features in the USB board system.

Table X
Values of Design Features in the USB Board System

| Parameter (dimensions in mm) | A                  | B                   | C                   | D                  | E      |
|------------------------------|--------------------|---------------------|---------------------|--------------------|--------|
| Solder joint                 |                    |                     |                     |                    |        |
| $H_{sj.st}$                  | 0.1270             | 0.1270              | 0.2000              | 0.2000             |        |
| $H_{sj.f}$                   | 0.4000             | 0.4000              | 0.7000              | 0.5000             |        |
| $L_{sj.b}$                   | 0.3000             | 0.3000              | 0.3000              | 0.3000             |        |
| $H_c$                        | 0.5000             | 0.7000              | 0.9000              | 0.7000             |        |
| $L_c$                        | 2.0000             | 2.0000              | 2.0000              | 2.0000             |        |
| Solder material              | Sn(96.5)-Ag(3.5)   | Sn(96.5)-Ag(3.5)    | Sn(96.5)-Ag(3.5)    | Sn(96.5)-Ag(3.5)   |        |
| Component material           | Alumina (resistor) | Alumina (capacitor) | Alumina (capacitor) | Alumina (inductor) |        |
| PTH                          |                    |                     |                     |                    |        |
| $D_{PTH.pad}$                |                    |                     |                     |                    | 0.5080 |
| $D_{PTH.h}$                  |                    |                     |                     |                    | 0.3429 |
| $Th_{PTH.pad}$               |                    |                     |                     |                    | 0.0483 |
| $Th_{PTH.p}$                 |                    |                     |                     |                    | 0.0127 |
| PTH material                 |                    |                     |                     |                    | Copper |
| PWB                          |                    |                     |                     |                    |        |
| $Th_b$                       | 2.0000             | 2.0000              | 2.0000              | 2.0000             | 2.0000 |
| PWB material                 | FR4                | FR4                 | FR4                 | FR4                | FR4    |

The Weibull functions of each feature are illustrated in Fig. 14. The comparison of the reliability of different features is important for the system reliability improvement as well as total system reliability assessment. According to Fig. 14, features A and B are less reliable than other features (features C, D, and E). If the design changes in the USB board are necessary because of unsatisfied wear-out failure reliability, then a reasonable approach would be to change the parameters of features A and B.

$$\alpha = \frac{N_{50,\text{hours}}}{\{-\ln(0.5)\}^{\frac{1}{\beta}}} \tag{17}$$
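The conversion from cycles to hours and the Weibull fit in (17) can be sketched as follows. The $N_{50}$ values are copied from Table XI, the shape parameters follow the text (2 for the solder joint features A to D, 5 for the PTH feature E), and one thermal cycle per day follows Table IV. The function and variable names, and the 60,000 h evaluation time in the loop, are illustrative choices and not part of the authors' tooling.

```python
import math

def weibull_alpha(n50_hours, beta):
    """Eq. (17): characteristic life such that R(n50_hours) = 0.5."""
    return n50_hours / (-math.log(0.5)) ** (1.0 / beta)

def weibull_reliability(t_hours, alpha, beta):
    """Two-parameter Weibull reliability R(t) = exp(-(t/alpha)^beta)."""
    return math.exp(-((t_hours / alpha) ** beta))

DAYS_PER_CYCLE = 1.0  # one thermal cycle per day (Table IV)

# N50 in cycles (Table XI) and Weibull shape parameter per design feature.
features = {
    "A": (2.5072e4, 2),
    "B": (2.5072e4, 2),
    "C": (4.6252e4, 2),
    "D": (5.8067e4, 2),
    "E": (6.0229e4, 5),
}

results = {}
for name, (n50_cycles, beta) in features.items():
    n50_hours = n50_cycles * DAYS_PER_CYCLE * 24.0  # cycles -> hours
    alpha = weibull_alpha(n50_hours, beta)
    results[name] = (n50_hours, alpha, beta)
    r = weibull_reliability(60_000.0, alpha, beta)
    print(f"{name}: N50 = {n50_hours:.4e} h, alpha = {alpha:.4e} h, R(60000 h) = {r:.4f}")
```

By construction, each fitted curve passes through a reliability of 0.5 at $t = N_{50,\text{hours}}$, which is a convenient sanity check on the fit.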

These simulation models and results are based on the assumption of no problems in quality, and they are meant for designers' use to determine design parameters, not for quality engineers. Therefore, quality issues are resolved independently. In addition, endurance testing should be executed to resolve quality and reliability issues after prototype development. The test results are used in improving and validating simulation models and results.

Table XI
Prediction Results for the Wear-out Failure Reliability of the Design Features

| Design feature | Strain                | $N_{50}$<sup>a</sup> | $N_{50,\text{hours}}$<sup>b</sup> |
|----------------|-----------------------|----------------------|-----------------------------------|
| A              | $4.98 \times 10^{-3}$ | $2.5072 \times 10^4$ | $6.0173 \times 10^{5}$            |
| B              | $4.98 \times 10^{-3}$ | $2.5072 \times 10^4$ | $6.0173 \times 10^{5}$            |
| C              | $3.78 \times 10^{-3}$ | $4.6252 \times 10^4$ | $1.1100 \times 10^6$              |
| D              | $3.41 \times 10^{-3}$ | $5.8067 \times 10^4$ | $1.3936 \times 10^{6}$            |
| E              | $2.03 \times 10^{-3}$ | $6.0229 \times 10^4$ | $1.4455 \times 10^6$              |

<sup>a</sup>$N_{50}$ is the number of cycles to 50% failure.

### C. Reliability Assessment and Design Change Recommendation of the USB Board System

The reliability data predicted from the models presented in the previous sections are used for assessing the reliability of the USB board considering the logical groups. Because the USB

 $<sup>{}^{\</sup>rm b}N_{\rm 50,hours}$  is equal to  $N_{\rm 50}$  × frequency × 24 h.



Fig. 14. Wear-out failure reliability of the design features in the USB board system.

board system consists of four parallel ports, the failure of the USB board system may be interpreted differently depending on the failures of the four parallel ports. For example, if the complete functionality of the USB board system is required, then all four ports must succeed. However, if only the minimum utility of the USB board system is required, then the success of at least one port is sufficient. The former case, which requires the success of all ports, gives the minimum reliability of the system, and the latter case, which requires the success of at least one port, gives the maximum reliability of the system.

Before the reliability assessment of the USB board, the reliability of each logical group is assessed. Then, maximum and minimum reliabilities are assessed following the logical structure of groups. For the maximum reliability (18) of the USB board, a 1-out-of-4 parallel relation among four ports is used, while a series relation is used for the minimum reliability (19).

$$\begin{aligned} R_{\text{USB,Max}}(t) &= R_{\text{control}}(t) \\ &\quad\times \left\{ 1 - \prod_{i=1}^{4} \left( 1 - R_{i,\text{port}}(t) \right) \right\}_{\text{four parallel ports}} \\ &\quad\times R_{\text{connection}}(t) \end{aligned} \tag{18}$$

$$\begin{aligned} R_{\text{USB,Min}}(t) &= R_{\text{control}}(t) \times \left\{ \prod_{i=1}^{4} R_{i,\text{port}}(t) \right\}_{\text{four series ports}} \\ &\quad\times R_{\text{connection}}(t) \end{aligned} \tag{19}$$
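Equations (18) and (19) differ only in how the four port reliabilities are combined, which a short sketch makes explicit. The group reliabilities passed in below are hypothetical numbers chosen for illustration. They are not the values behind Fig. 15, since the composition of each logical group is not listed in this section.

```python
import math

def usb_reliability_bounds(r_control, r_ports, r_connection):
    """Return (maximum, minimum) system reliability per Eqs. (18) and (19).

    Maximum: the ports are 1-out-of-n parallel (at least one port works).
    Minimum: the ports are in series (every port must work).
    """
    at_least_one_port = 1.0 - math.prod(1.0 - r for r in r_ports)
    all_ports = math.prod(r_ports)
    r_max = r_control * at_least_one_port * r_connection
    r_min = r_control * all_ports * r_connection
    return r_max, r_min

# Hypothetical group reliabilities at some fixed time t.
r_max, r_min = usb_reliability_bounds(
    r_control=0.90,
    r_ports=[0.80, 0.80, 0.80, 0.80],
    r_connection=0.95,
)
print(f"maximum: {r_max:.6f}, minimum: {r_min:.6f}")
```

The control and connection groups appear as series factors in both expressions, so they cap the maximum reliability no matter how many redundant ports are added.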

The results are shown in Fig. 15. The maximum system reliability of the electronic board assembly is  $R_{\rm w}(T_{\rm w})_{\rm assessed} = 0.8621$  at  $T_{\rm w} = 60,000$  h, and the minimum system reliability of the electronic board assembly is  $R_{\rm w}(T_{\rm w})_{\rm assessed} = 0.6735$  at  $T_{\rm w} = 60,000$  h.

Fig. 15. Results of assessing the wear-out failure reliability of the USB board system.

Suppose the target reliability for the USB board system is allocated from an automotive electrical system in a way similar to the allocation for the video broadcasting system in Section IV. Because the minimum assessed wear-out failure reliability of the USB board is less than the allocated target wear-out failure reliability [Min. $R_{\rm w}(T_{\rm w})_{\rm assessed} = 0.6735 < R_{\rm w}(T_{\rm w})_{\rm target} = 0.7000$], design changes are required. If we increase the standoff heights of design features A and B from 0.1270 mm to 0.2000 mm, then the allocated target reliability is satisfied as follows:

$$\text{Min. } R_{\rm w}(T_{\rm w})_{\rm assessed} = 0.9049 > R_{\rm w}(T_{\rm w})_{\rm target} = 0.7000$$

Such design parameter determination is not possible without allocated target reliability and quantitative reliability assessment. Therefore, the SDfR method is useful to determine design parameters as well as system configurations.

#### SUMMARY AND CONCLUSIONS

Key aspects of the SDfR framework—reliability allocation, prediction and assessment, and design recommendations—are demonstrated in this paper through two different case studies of microelectronic systems. One benefit of the SDfR method is that it systematically evaluates all possible cases of system failures and suggests various design changes over multilevel packaging structures, from design features at the subsystem level to redundant subsystems at the top-system level.

The SDfR method is implemented by reliability metrics and reliability algorithms based on both statistical and physics-of-failure knowledge. These are demonstrated by designing a video broadcasting system against random failures and a USB board system against wear-out failures. The presented examples show that the algorithms are useful for determining system configurations and design parameters. Such design changes will reduce the burden of downstream reliability activities. Note that the examples presented in this paper are meant to demonstrate the developed reliability allocation and assessment algorithms and need verification against reliability data generated by practical experiments on specific products.

In the presented case studies, only critical failures are considered. In reality, system reliability is more complex than in the examples considered in this paper. For example, interactions that simultaneously involve many components, such as those captured by a board warpage simulation, which represents the interaction between electronic components and a printed wiring board, are not considered here. However, the SDfR method can accommodate such failure interactions by using more failure prediction models. The presented algorithms, in the context of the SDfR framework, form a consistent and unified top-down and bottom-up approach for allocating, predicting, and assessing reliability and for recommending design changes at the component and system levels. Because these reliability activities lead to early design decisions for reliability at the subsystem level, this method saves time and cost in designing complex electronic systems for reliability.

#### ACKNOWLEDGMENTS

The authors thank the following people at Georgia Tech for their helpful comments and cooperation: Jamie Ahmad, Manas Bajaj, Shashikant Hegde, Karan Kacker, Kevin Klein, Kang Joon Lee, Andrew Perkins, Krishna Tunga, and Jiantao Zheng.

#### REFERENCES

- [1] C. Bestory, F. Marc, and H. Levi, "Statistical analysis during the reliability simulation," Microelectronics and Reliability, Vol. 47, pp. 1353-1357, 2007.
- [2] D. Crowe and A. Feinberg, *Design for Reliability*. New York: CRC Press, 2001.
- [3] I. Kim, S.K. Sitaraman, and R.S. Peak, "Reliability object model: a knowledge model of system design for reliability," in Proc. ASME International Mechanical Engineering Congress and Exposition IMECE2005-79934, 2005.
- [4] The Handbook of Microelectronics and Interconnection Technology, Electrochemical Publications, 1985.
- [5] W. Denson, "The history of reliability prediction," IEEE Transactions on Reliability, Vol. 47(3), pp. 321-328, 1998.
- [6] F. Jensen, Electronic Component Reliability. New York: John Wiley & Sons, 1995.
- [7] B. Foucher, J. Boullie, B. Meslet, and D. Das, "A review of reliability prediction methods for electronic devices," Microelectronics and Reliability, Vol. 42, pp. 1155-1162, 2002.
- [8] G.H. Ebel, "Reliability physics in electronics: a historical view," IEEE Transactions on Reliability, Vol. 47, pp. 379-389, 1998.
- [9] A. Perkins, K. Tunga, and S. Sitaraman, "Acceleration factor to relate thermal cycles to power cycles for ceramic ball grid area packages," Journal of Microelectronics and Electronic Packaging, Vol. 3, pp. 177-193, 2006.

- [10] W. Engelmaier, "Fatigue life of leadless chip carrier solder joints during power cycling," IEEE Transactions on Components, Hybrids, and Manufacturing Technology, CHMT, Vol. 6, pp. 232-237, 1983.
- [11] I. Snook, J.M. Marshall, and R.M. Newman, "Physics of failure as an integrated part of design for reliability," in Proc. Reliability and Maintainability Symposium, pp. 46-54, 2003.
- [12] L.W. Condra, Reliability Improvement with Design of Experiments. New York: Marcel Dekker, 1993.
- [13] M. Rausand and A. Høyland, System Reliability Theory, Models, Statistical Methods and Applications. New York: Wiley, 2004.
- [14] I. Kim, R.V. Pucha, R.S. Peak, and S.K. Sitaraman, "System-design-for-reliability tools for highly integrated electronic packaging systems," in Proc. Electronic Components and Technology Conference, 2007, pp. 1809-1814.
- [15] J. Jones, J. Marshall, G. Aulak, and B. Newman, "Development of an expert system for reliability task planning as part of the REMM methodology," in Proc. Reliability and Maintainability Symposium, pp. 423-428, 2003.
- [16] REMM, Available at: http://www.remm.org. Accessed: 1 May 2007.
- [17] R.V. Pucha, S. Hegde, M. Damani, K. Tunga, A. Perkins, S. Mahalingam, G. Ramakrishna, G.C. Lo, K. Klein, J. Ahmad, and S.K. Sitaraman, "System-level reliability assessment of mixed-signal convergent microsystems," IEEE Transactions on Advanced Packaging, Vol. 27, pp. 438-452, 2004.
- [18] J. Bisschop, "Reliability methods and standards," Microelectronics and Reliability, Vol. 47, pp. 1330-1335, 2007.
- [19] P. Solomalala, J. Saiz, M. Mermet-Guyennet, A. Castellazzi, M. Ciappa, X. Chauffleur, and J.P. Fradin, "Virtual reliability assessment of integrated power switches based on multi-domain simulation approach," Microelectronics and Reliability, Vol. 47, pp. 1343-1348, 2007.
- [20] A.P. Wood, "Reliability—metric varieties and their relationships," in Proc. Reliability and Maintainability Symposium, pp. 110-115, 2001.
- [21] W.R. Blischke and D.N.P. Murthy, Reliability: Modeling, Prediction and Optimization. New York: John Wiley & Sons, Inc., 2000.
- [22] B. Ellis, "USB in automotive: no longer just for PCs." Available at: http://www.epn-online.com/page/36489/usb-in-automotive----no-longer-justfor-pcs.html. Accessed: 1 May 2007.
- [23] J.H. Lau, D.W. Rice, and P.A. Avery, "Nonlinear analysis of surface mount solder joint fatigue," in Proc. IEEE CHMT Symposium, 1986, pp. 173-184.
- [24] W. Engelmaier, "Results of the IPC copper foil ductility round-robin study," IPC Publication No. 9471987.
- [25] IPC-TP-510, "Thermal induced strain in plated-through-holes." Evanston, IL: Institute for Interconnecting and Packaging Electronic Circuits, 1984.
- [26] N. Blattau and C. Hillman, "A comparison of the isothermal fatigue behavior of Sn-Ag-Cu to Sn-Pb solder," in Proc. SMTAI Conference, 2005.
- [27] T. Suzuki, Available at: http://www.onboard-technology.com/pdf\_aprile2005/040501.pdf. Accessed: 1 May 2007.
- [28] H. Mercado-Corujo, "Study of the thermo-mechanical reliability of a plated-through-hole/press-pin assembly," Master's Thesis in Mechanical Engineering. Atlanta, GA: Georgia Institute of Technology, 2001.