## Inverse Gaussian Distribution Based Timing Analysis of Sub-threshold CMOS Circuits

Jiaoyan Chen, Sorin Cotofana Department of Computer Engineering, TU Delft, Delft, the Netherlands

Abstract: Aggressive technology scaling and ultra-low-power constraints have made device behavior less predictable, complicating timing analysis and estimation. Traditional delay models fail to accurately capture circuit behavior under such conditions. This paper proposes a novel, highly accurate Inverse Gaussian Distribution (IGD) based delay model applicable to both combinational and sequential elements of sub-powered circuits. The accuracy of the IGD based delay estimation is demonstrated on two circuits: (i) D Flip-Flops (DFFs) + 8-bit Ripple Carry Adder and (ii) DFFs + 8-bit De-multiplexer (DEMUX) and Multiplexer (MUX). Our experiments indicate that the IGD based approach closely matches HSPICE Monte Carlo simulation results, with an average error of less than 1.9% and 1.2% for the two circuits, respectively, while saving orders of magnitude in simulation time. Moreover, the IGD model outperforms the traditional Gaussian Distribution (GD) model, providing 6x better average accuracy with no extra simulation time overhead.

Keywords: Timing Analysis; Near/Sub Threshold Operation; CMOS Process Variations; Delay Model; Statistical Modelling

### I. INTRODUCTION

Timing analysis is the method of computing the maximum clock frequency at which a circuit can safely operate, thereby guaranteeing correct chip functionality after fabrication. Accurate yet efficient timing estimation models are desirable for evaluating the performance of digital ICs. With the continuous scaling of transistors, process variations and voltage scaling have made timing analysis increasingly challenging. Unpredictable gate delay values induced by local variations [1], along with variations in CMOS channel length, threshold voltage, and oxide thickness, are some of the key factors that complicate timing analysis and estimation [2]. Further, with the introduction of sub-threshold logic [3], the CMOS supply voltage has been pushed to an extremely low level, near or below the MOSFET threshold voltage. This dramatically reduces power consumption but aggravates the effect of process variations, thereby compromising circuit functional performance [4]. Hence, accurate delay analysis and estimation is becoming even more challenging, and the capabilities of conventional delay models and timing analysis approaches are proving inadequate.

The traditional approach is to perform highly accurate SPICE simulations, with the downside of inevitably long runtimes. To overcome this, corner analysis, which deals with multiple Process, Voltage, and Temperature (PVT) corners, has been widely adopted. However, the high sensitivity and unpredictability of deep submicron CMOS devices make this approach either overly pessimistic or overly optimistic [5]. Several improved methodologies have been proposed to achieve better accuracy within acceptable computing time. Statistical Static Timing Analysis (SSTA) [6] was proposed to determine the distribution of propagation delays and signal timing violations on the critical paths of digital CMOS circuits. Nonetheless, SSTA requires burdensome effort to automate and disregards the dependence of delay on the input pattern. To overcome these obstacles, Monte Carlo Static Timing Analysis (MCSTA) [6] and Dynamic Timing Analysis (DTA) [7] were proposed. MCSTA relies on the one-off generation of a Variation Cell Library for standard cells, which is then used to carry out static timing analysis on thousands of randomized gate-level netlists. MCSTA can be considered a trade-off between the time-consuming Monte Carlo (MC) SPICE simulation and the relatively inaccurate SSTA. A statistical DTA approach that employs the normal (Gaussian) approximation to model the propagation delay on the basis of distinguishable input patterns was presented in [8]. While reasonably accurate, the approach can be costly in terms of processing time, as its accuracy directly depends on the number of considered input vectors.

Satish Grandhi, Christian Spagnol, Emanuel Popovici Department of Electrical and Electronic Engineering, University College Cork, Cork, Ireland

In this paper, a comprehensive delay approximation methodology based on the Inverse Gaussian Distribution (IGD) is proposed. We introduced a primitive version of this model, applicable to combinational elements in digital CMOS circuits, in [9]. The main idea is to first gather the key parameters of the basic gates by means of MC simulations and then linearly extrapolate (propagate) them through the logic network at the circuit level. In this refined approach, the effect of the fan-out value and input transition time on the gate delay is also taken into consideration. Moreover, the model is extended to cover sequential components. The proposed IGD model is supported by physical phenomena and provides considerable delay estimation accuracy and efficiency. Furthermore, the model remains highly accurate for power supply voltage values ranging from nominal $V_{dd}$ to sub-threshold $V_{dd}$. The approach is significantly faster than the state of the art, since only the basic cells have to be fully simulated in order to obtain the key model parameters from which the delay model of complex circuits is derived. Unlike other techniques or tools, which demand large look-up tables or complicated calculations [6, 7], the proposed approach is remarkably straightforward.

To demonstrate the practicability of the IGD based statistical approach, we compared the estimated delay with the existing GD model and with MC SPICE simulations for several combinational and sequential logic blocks. 32nm technology models were employed across all the simulations. Experimental results indicate that the proposed method outperforms GD fitting and provides a very close match with MC simulations, i.e., less than 1.9% and 1.2% average error for the two considered circuits, respectively. For the simulation set-up, process variations and voltage variations are investigated during the key-parameter estimation step. Temperature variation is not taken into consideration, both because it is less significant than the aforementioned two parameters and because of the temperature inversion effect (in the sub-threshold region, also known as the weak-inversion region, the higher the temperature, the smaller the propagation delay); room temperature is therefore chosen for all the simulations [3].

The rest of the paper is organized as follows. In Section II, the IGD delay model is introduced and compared with related work to demonstrate its accuracy and flexibility. Then, in Section III, IGD models and their key parameters for combinational and sequential CMOS elements are presented. After that, the extension of the IGD model to capture fan-out and input transition time effects is explored in Section IV. Thereafter, the delay estimation approach is applied in Section V to two circuits: (i) DFFs + 8-bit Ripple Carry Adder (RCA) and (ii) DFFs + 8-bit De-multiplexer (DEMUX) and Multiplexer (MUX). Finally, conclusions and future work are discussed in Section VI.

### II. INVERSE GAUSSIAN DELAY MODEL

In this section, a comprehensive delay approximation methodology based on the Inverse Gaussian Distribution (IGD) is introduced. We first introduce the IGD model and compare it with related work, namely the Gaussian Distribution (GD) delay model [8]. The behavior of the proposed IGD delay model is then explained with theoretical support. To demonstrate the accuracy and flexibility of our approach, both conventional $V_{dd}$ and near/sub-threshold $V_{dd}$ values, in conjunction with different types of process variations, are considered.

#### A. Related Work

A propagation delay estimation algorithm based on GD was employed in [8] to compute delay, and a close match was found between the measured propagation delay profile and the Gaussian Probability Density Function (PDF). However, the choice of approximating the delay PDF with a normal distribution was based on fitting only two Monte Carlo simulations, with no scientific explanation to support the model. Although the GD delay model matches closely under nominal supply voltage, several GD characteristics hint at its inability to capture delay distributions in the general case. First, by definition, the support of GD is the whole field of real numbers, which means that it assigns non-zero probability density to negative time values. This is a clear mismatch with physical reality, since no signal propagation delay can be negative. Second, the normal (Gaussian) distribution is symmetric around its mean value, which may not always be the case for gate delay data. In fact, the simulations presented in Subsection C demonstrate that the GD symmetry property does not hold for the cases of interest.

#### B. Inverse Gaussian Distribution Approximation

In probability theory, the Inverse Gaussian Distribution, also known as the Wald distribution, is a two-parameter family of continuous probability distributions. Its support is $(0, \infty)$, and its shape ranges from strongly asymmetric to nearly symmetric around $\mu$. Its probability density function IGD($\mu$, $\lambda$), expressed in Eq. (1), where $\mu$ is the mean and $\lambda$ the shape parameter, can therefore overcome both GD shortfalls described above. As $\lambda$ tends to infinity, the inverse Gaussian distribution becomes more like the Gaussian distribution.

$$f(x,\mu,\lambda) = \left[\frac{\lambda}{2\pi x^3}\right]^{1/2} \exp\left(\frac{-\lambda(x-\mu)^2}{2\mu^2 x}\right); x > 0 \quad (1)$$
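As a minimal sketch of Eq. (1), the following Python snippet evaluates the IGD density and numerically checks that it integrates to one and has mean $\mu$. The values $\mu = 1$, $\lambda = 4$ and all helper names are illustrative assumptions, not parameters taken from the paper. The GD used for comparison is given the same mean and the IGD variance $\mu^3/\lambda$.

```python
import math


def igd_pdf(x, mu, lam):
    """Inverse Gaussian density of Eq. (1); zero outside the support x > 0."""
    if x <= 0:
        return 0.0
    return math.sqrt(lam / (2.0 * math.pi * x**3)) * math.exp(
        -lam * (x - mu) ** 2 / (2.0 * mu**2 * x)
    )


def gd_pdf(x, mean, std):
    """Gaussian density, defined on the whole real line."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule on [a, b] with n intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Illustrative (assumed) parameters in arbitrary time units.
mu, lam = 1.0, 4.0
gd_std = math.sqrt(mu**3 / lam)  # standard deviation of IGD(mu, lam)

area = trapezoid(lambda x: igd_pdf(x, mu, lam), 0.0, 30.0)
mean = trapezoid(lambda x: x * igd_pdf(x, mu, lam), 0.0, 30.0)

print(f"IGD area = {area:.6f}, IGD mean = {mean:.6f}")
print(f"density at x = -0.1: IGD = {igd_pdf(-0.1, mu, lam)}, GD = {gd_pdf(-0.1, mu, gd_std):.4f}")
```

The last line illustrates the support argument: the IGD density is exactly zero for negative delays, while a GD with the same mean and variance is not.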

There is an intuitive reason why IGD fits CMOS propagation delay data under various PVT variations better than GD. The random movement of the carrier particles in electronic circuits in steady state can be well modeled by Brownian motion, also called a Wiener process [10]. For particles under Brownian motion, GD captures the position distribution of all particles at a given moment in time, while IGD reflects the particle motion when a drift is applied. In particular, IGD describes the distribution of the time that a particle in random motion with a positive drift takes to reach a fixed level. In electronic devices, the drift can be seen as the voltage difference between device terminals, which produces an electric field and thus induces carrier movement. We note that the IGD shape can change significantly depending on its two parameters and is not restricted like the GD approximation, which always has to be symmetric. Based on this argument, we conclude that the inverse Gaussian distribution is potentially better suited than the normal distribution to represent delay distributions in electronic circuits.
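The first-passage interpretation can be illustrated with a toy simulation. The sketch below is not a device model: it discretizes a one-dimensional Brownian motion with positive drift and records the time at which it first reaches a fixed level. For the continuous-time process this time follows IGD with $\mu = \text{level}/\text{drift}$ and $\lambda = \text{level}^2/\sigma^2$. All numerical values and names are assumptions chosen for illustration, and the time discretization introduces a small positive bias.

```python
import math
import random
import statistics


def first_passage_time(level, drift, sigma, dt, rng, max_steps=1_000_000):
    """Time at which a discretized drifted random walk first reaches `level`."""
    position, elapsed = 0.0, 0.0
    step_std = sigma * math.sqrt(dt)
    for _ in range(max_steps):
        position += drift * dt + rng.gauss(0.0, step_std)
        elapsed += dt
        if position >= level:
            return elapsed
    raise RuntimeError("level not reached; increase max_steps")


rng = random.Random(0)  # fixed seed for repeatability
level, drift, sigma, dt = 1.0, 1.0, 0.5, 0.002  # assumed toy values
times = [first_passage_time(level, drift, sigma, dt, rng) for _ in range(2000)]

sample_mean = statistics.fmean(times)
sample_median = statistics.median(times)
theory_mu = level / drift           # IGD mean of the continuous-time process
theory_lam = (level / sigma) ** 2   # IGD shape of the continuous-time process

print(f"sample mean   = {sample_mean:.3f} (theory mu = {theory_mu:.3f})")
print(f"sample median = {sample_median:.3f}")
print(f"theory lambda = {theory_lam:.3f}")
```

The sample median falls below the sample mean, which is the right-skewed shape (steep rise, long tail) that a symmetric GD cannot reproduce.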

#### C. Flexibility (Universality) of the IGD Model

SPICE simulations were performed on a 2-input AND gate with threshold voltage variation (following a Gaussian distribution) as the most dominant of all process variations [8]. The 32nm Predictive Technology Model (PTM) under a nominal supply voltage of 0.9V was employed for the Monte Carlo simulations. To compare our IGD model with the GD model, a similar experiment, i.e., a 2-input AND gate with inputs switching from 00 to 11, has been reproduced. The threshold voltage ($V_{th}$) variation is generated following the GD, where the mean value is the nominal $V_{th}$ ($V_{thn}$=0.322V for nFETs and $V_{thp}$=-0.302V for pFETs) and the standard deviation is set to 50mV, which is sufficient to reflect the threshold voltage variation in state-of-the-art circuits. Both GD and IGD are used to fit the propagation delay data profile. The resulting delay histogram and the corresponding GD and IGD fittings are presented in Fig. 1, which clearly indicates that both fit the delay data well.

However, the similarity in fitting capability no longer holds when the gate operates in the near-threshold regime. A similar experiment is repeated for the same $V_{th}$ distribution with a $V_{dd}$ of 0.3V, and Fig. 2 presents the delay histogram and the GD and IGD fittings. It is clear that IGD almost perfectly fits the delay histogram, which has an asymmetric shape with a steep slope towards zero and a long tail towards infinity. On the other hand, unlike in Fig. 1, GD does not provide a good fit in this case. It is also worth mentioning that the GD fitting curve does not start from zero, which is unrealistic because no circuit can operate without propagation delay. The experiment highlights the shortfall of using GD for fitting delay data.
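The paper does not state how the IGD and GD curves were fitted to the histograms. A minimal sketch, assuming maximum-likelihood estimation, is given below. It uses synthetic right-skewed samples drawn from an IGD (not HSPICE data), fits both models, and compares their log-likelihoods. The sampler, the parameter values, and all function names are illustrative assumptions.

```python
import math
import random


def sample_igd(mu, lam, rng):
    """Draw one IGD(mu, lam) variate (Michael, Schucany and Haas method)."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = mu + mu * mu * y / (2.0 * lam) - (mu / (2.0 * lam)) * math.sqrt(
        4.0 * mu * lam * y + mu * mu * y * y
    )
    return x if rng.random() <= mu / (mu + x) else mu * mu / x


def fit_igd(samples):
    """Maximum-likelihood estimates of the IGD mean and shape."""
    n = len(samples)
    mu_hat = sum(samples) / n
    lam_hat = n / sum(1.0 / x - 1.0 / mu_hat for x in samples)
    return mu_hat, lam_hat


def fit_gd(samples):
    """Maximum-likelihood estimates of the Gaussian mean and standard deviation."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, math.sqrt(var)


def igd_loglik(samples, mu, lam):
    return sum(
        0.5 * math.log(lam / (2.0 * math.pi * x**3))
        - lam * (x - mu) ** 2 / (2.0 * mu * mu * x)
        for x in samples
    )


def gd_loglik(samples, mean, std):
    return sum(
        -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)
        for x in samples
    )


def gd_mass_below_zero(mean, std):
    """Probability that the fitted Gaussian assigns to negative delays."""
    return 0.5 * math.erfc(mean / (std * math.sqrt(2.0)))


rng = random.Random(1)
delays = [sample_igd(1.0, 2.0, rng) for _ in range(5000)]  # synthetic, normalized units

mu_hat, lam_hat = fit_igd(delays)
gd_mean, gd_std = fit_gd(delays)
ll_igd = igd_loglik(delays, mu_hat, lam_hat)
ll_gd = gd_loglik(delays, gd_mean, gd_std)
negative_mass = gd_mass_below_zero(gd_mean, gd_std)

print(f"IGD fit: mu = {mu_hat:.3f}, lambda = {lam_hat:.3f}, log-likelihood = {ll_igd:.1f}")
print(f"GD fit: mean = {gd_mean:.3f}, std = {gd_std:.3f}, log-likelihood = {ll_gd:.1f}")
print(f"GD probability of a negative delay = {negative_mass:.3f}")
```

On skewed data of this kind, the fitted GD places a visible share of its probability on negative delays, which mirrors the observation that the GD curve does not start from zero.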



Fig. 1. IGD vs GD approximation for 2-input AND gate @0.9V $V_{dd}$.

Fig. 2. IGD vs GD approximation for 2-input AND gate @0.3V $V_{dd}$.

Fig. 3. IGD vs GD approximation for AND gates with Gaussian distribution on $V_{dd}$ and $V_{th}$ @0.9V $V_{dd}$.

To further demonstrate the IGD fitting accuracy, a chain of 5 AND gates is considered, in which every AND gate except the first is fed by the output of the previous gate. The chain has been simulated with its primary inputs switching from 11 to 00, employing process variations similar to those of the previous experiment. A standard deviation of 50mV in the power supply voltage is assumed at 0.9V $V_{dd}$, which reflects real circuit power supply voltage fluctuations. In Fig. 3, the histograms and their corresponding fittings for the 3<sup>rd</sup> and 5<sup>th</sup> AND gates are presented (the other stages are omitted for clarity). One specific case of interest is to consider voltage variation alone. Current nanometer technologies exhibit up to 20% voltage variation, which changes the expected circuit performance drastically. Fig. 4 depicts the Monte Carlo simulation results together with the fitted IGD and GD delay models. It shows that IGD models the delay degradation caused by voltage variation alone better than GD. Thus, IGD fits the experimental data better than GD. Based on our simulations, we conclude that IGD can accurately capture gate and simple circuit propagation delays under different process and voltage variations.



Fig. 4. IGD vs GD fitting for 2-input AND gate with only supply variation.

### III. IGD BASED DELAY MODEL FOR COMBINATIONAL AND SEQUENTIAL CIRCUITS

In this section, we extend the IGD model to both combinational and sequential circuits. The key parameters obtained here are later used to estimate the delay probability distributions of larger circuits.

#### A. Typical timing path in synchronous CMOS circuits

In synchronous CMOS circuits, D-latches (DLs) and D flip-flops (DFFs) are employed for data synchronization. The synchronized data is then passed on to the combinational logic blocks, which are regarded as timing segments in static timing analysis. The sum of the delays of all segments along a path, typically known as the Register-to-Register (R2R) delay, determines the timing of the path, as expressed in Eq. (2):

$$D_{R2R} = D_{C2Q} + D_{LOGIC} + D_{SETUP}, \tag{2}$$

where

- $D_{R2R}$ is the total delay of a timing path
- $D_{C2Q}$ is the delay of the D flip-flop from the clock rising/falling edge to the output
- $D_{LOGIC}$ is the propagation delay through the combinational logic
- $D_{SETUP}$ is the setup time of the output registers.

Unlike conventional corner analysis, which uses an exact delay value for each component, our approach estimates the aforementioned IGD key parameters $\mu$ and $\lambda$ for each of the terms in Eq. (2). In this paper, emphasis is placed on the first two terms of the equation, and $D_{SETUP}$ is left for future consideration given that it is significantly smaller. The methodology for computing $\mu$ and $\lambda$ of the longest path as a linear combination of the segment parameters, neglecting $D_{SETUP}$, was already explained in [9]. Eq. (2) can therefore be translated into:

$$\mu_{R2R} = \mu_{C2Q} + \mu_{LOGIC} \tag{3-1}$$

$$\lambda_{R2R} = \lambda_{C2Q} + \lambda_{LOGIC} \tag{3-2}$$
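A minimal sketch of Eq. (3-1) and (3-2) follows. It assumes that $\mu_{LOGIC}$ and $\lambda_{LOGIC}$ are themselves plain sums over the gates on the path (the exact linear combination is detailed in [9]). The path composition, one DFF followed by a NAND and an INV, is hypothetical and serves only as an arithmetic illustration. The numbers are the charging-event values listed in Table I.

```python
import math


def compose_path(segments):
    """Sum (mu, lambda) pairs along a timing path, as in Eq. (3-1) and (3-2)."""
    mu_total = sum(mu for mu, _ in segments)
    lam_total = sum(lam for _, lam in segments)
    return mu_total, lam_total


# Charging-event (mu, lambda) values from Table I.
DFF = (28.2e-11, 33.4e-10)   # clock-to-output segment
NAND = (6.2e-11, 11.3e-10)
INV = (4.8e-11, 9.3e-10)

# Hypothetical path: DFF -> NAND -> INV.
mu_r2r, lam_r2r = compose_path([DFF, NAND, INV])

print(f"mu_R2R = {mu_r2r:.3e}, lambda_R2R = {lam_r2r:.3e}")
assert math.isclose(mu_r2r, 39.2e-11)
```

Because only additions are involved, the cost of evaluating a path grows linearly with the number of gates, which is the source of the speed advantage over full MC simulation of the whole circuit.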

#### B. Combinational Circuits - INV, NAND, NOR & XOR

The basic gates are the building blocks of any digital CMOS circuit. As the simplest gate in the CMOS family, INV is widely used for signal regulation and for enhancing signal strength. NAND and NOR are universal gates, and it is well known that all Boolean circuits can be synthesized with either of them alone. XOR is widely used in error-correcting circuits. The IGD and GD fittings of all these gates, based on ten thousand MC simulations with the fan-out (FO) value set to one, are depicted in Figs. 5-8. Both charging and discharging events at the output node are considered. The parameter variations employed within our simulation setup are:

- V<sub>dd</sub> mean value 0.3V and deviation 30mV
- V<sub>th</sub> mean value 0.322V for nFETs and -0.302V for pFETs, and standard deviation 50mV
- $T_{ox}$  10% deviation for both nMOS and pMOS transistors.

From the plots, it is clear that IGD matches the MC simulation results considerably better than GD. The key parameters, $\mu$ and $\lambda$, for INV and NAND, which serve as basic values for the proposed delay model, are listed in Table I. These parameters will be used in Section V for validation purposes.



Fig. 5. IGD and GD fittings for INV charging and discharging events.



Fig. 6. IGD and GD fittings for NAND charging and discharging events.



Fig. 7. IGD and GD fittings for NOR charging and discharging events.



Fig. 8. IGD and GD fittings for XOR charging and discharging events.

#### C. Sequential Circuits - Master-Slave DFF

Sequential elements such as DLs and DFFs are employed in logic circuits for data synchronization and are extremely important from the timing analysis perspective. Unlike combinational circuits, which do not include any feedback loops, these elements utilize cross-coupled circuits for data retention. Therefore, it is important to verify that the IGD model also fits sequential elements well. A DFF is composed of two adjacent DLs, known as Master and Slave, controlled by complementary clock signals. A DFF implementation built with eight NAND gates and two INVs is depicted in Fig. 9. The DFF IGD and GD fitted PDFs, along with the MC simulation data for FO=1, are presented in Fig. 10, where both the discharging $(1 \rightarrow 0)$ and charging $(0 \rightarrow 1)$ events at the DFF input (D) are considered.



Fig. 9. NAND and INV based DFF schematic.

The key parameters, $\mu$ and $\lambda$, under different conditions are also summarized in Table I. It can be observed that the falling transition values are greater than the ones for the rising transition, which means that the discharging event takes more time than the charging event. Again, IGD matches the MC simulation data better than GD.



Fig. 10. IGD and GD fittings for DFF charging and discharging events.

Table I : $\mu$ AND $\lambda$ FOR INV, NAND, DFF

| GATE | Charging μ (×10<sup>-11</sup>) | Charging λ (×10<sup>-10</sup>) | Discharging μ (×10<sup>-11</sup>) | Discharging λ (×10<sup>-10</sup>) |
|------|------|------|------|------|
| INV  | 4.8  | 9.3  | 5.8  | 9.6  |
| NAND | 6.2  | 11.3 | 7.9  | 7.7  |
| DFF  | 28.2 | 33.4 | 41.8 | 47.2 |

#### D. Sequential Circuits - Sub-Threshold DFF

For ultra-low-power applications, operating the transistors in their sub-threshold region is an effective way of reducing the power dissipation of a circuit. An ideal flip-flop has low power dissipation, imposes no delay, and gives a valid output at all times. In practical implementations, trade-offs between these parameters must be made. To confirm that our model also works for sub-threshold specific flip-flop designs, the analysis is extended to the design depicted in Fig. 11. The PowerPC 603 flip-flop, which was used in the PowerPC 603 microprocessor data-path [12], is a combination of the Transmission-Gate Master-Slave (TGMS) and C<sup>2</sup>MOS flip-flops, using clocked inverters instead of feedback transmission gates. The DFF IGD and GD fitted PDFs, along with the MC simulation data for FO=1, are presented in Fig. 12, where both the discharging $(1 \rightarrow 0)$ and charging $(0 \rightarrow 1)$ events at the DFF input (D) are considered. Again, IGD matches the MC simulation data better than GD.

Thus, the practicability of fitting the behavior of both combinational and sequential gates using the IGD model is demonstrated. The shapes of the data and of the IGD fitting curves for both combinational and sequential elements are not symmetric, which once more provides evidence that the GD model is inappropriate. Besides, it is also demonstrated that discharging events take longer than charging events for all gate types. This provides a solid platform for delay estimation in a typical timing path. In the next section, the effect of fan-out and input transition time on our approach is discussed.



Fig. 11. Sub Threshold DFF circuit architecture



Fig. 12. IGD and GD fittings for Sub threshold based DFF architecture.

### IV. FAN-OUT AWARE IGD MODEL

In the previous section, a basic equation to calculate the IGD key parameters for the critical path was introduced, along with several IGD fittings and the corresponding key parameters for sub-powered gates. Moreover, the linear compositionality of the IGD model for combinational circuits has been demonstrated in [9]. To complete the delay model, fan-out, a crucial component, has to be taken into account in order to capture more realistic scenarios. Fan-out, which can also be regarded as the capacitive load at the output of a gate, can significantly affect the output signal transition time and the propagation delay. In fact, there are two types of fan-out related phenomena that affect the gate delay:

- Fan-Out of the Current gate (FOC)
- Fan-Out of the Previous gate (FOP)

FOC has a direct impact on the gate delay. A higher FOP, on the other hand, results in a longer signal transition time at the gate inputs, which eventually results in a higher propagation delay. We note that the driving ability of sub-powered circuits is relatively weak, which limits the maximum acceptable output load. In other words, a high fan-out is either unsuitable or requires careful design in near/sub-threshold circuits. In this work, the maximum fan-out value is considered to be 4. We note that when a high fan-out is needed for sequential elements, i.e., DFFs, buffer insertion would be employed for load distribution instead of overloading the DFF output node. This practice justifies narrowing our study of the FO value impact to combinational elements only. This section addresses three key issues:

- Extend the methodology to capture FOC
- Establish the relation between FOP and the IGD key parameters
- Key IGD parameters computation for different FOC and FOP values.

The method to calculate the key IGD parameters for combinational elements, i.e., INV and NAND, with various FO values and transition time is discussed in the remainder of this section.

#### A. Fan-out effect estimation methodology

Logical effort, a straightforward technique to estimate delay in CMOS circuits, links the FOC and propagation delay [11]. The normalized delay D of a logic gate can be expressed as the sum of two factors: the parasitic delay P and the stage effort F which depends on the gate load [11]:

$$D = NF + P, \tag{4}$$

where N is the path branching effort, which indicates the fan-out number and is represented here by FOC. We compute the key IGD parameters at the gate output by applying the logical effort method to each of them, using the following equations:

$$\mu_{FOC} = FOC \cdot F_\mu + P_\mu \tag{5-1}$$

$$\lambda_{FOC} = FOC \cdot F_\lambda + P_\lambda \tag{5-2}$$

Table II : INV & NAND FOC KEY PARAMETERS VALUES

| GATE | Charging P<sub>μ</sub> (×10<sup>-11</sup>) | Charging F<sub>μ</sub> (×10<sup>-11</sup>) | Charging P<sub>λ</sub> (×10<sup>-10</sup>) | Charging F<sub>λ</sub> (×10<sup>-10</sup>) | Discharging P<sub>μ</sub> (×10<sup>-11</sup>) | Discharging F<sub>μ</sub> (×10<sup>-11</sup>) | Discharging P<sub>λ</sub> (×10<sup>-10</sup>) | Discharging F<sub>λ</sub> (×10<sup>-10</sup>) |
|------|-----|-----|-----|------|-----|-----|-----|-----|
| INV  | 3.8 | 0.9 | 7.2 | 1.26 | 4.6 | 1.2 | 8.3 | 0.4 |
| NAND | 5.0 | 1.2 | 9.9 | 0.6  | 6.0 | 1.9 | 6.5 | 0.8 |

To derive $F_{\mu}$, $P_{\mu}$ and $F_{\lambda}$, $P_{\lambda}$ for INV and NAND, two sets of data ($\mu$ and $\lambda$), measured at FOC=1 and FOC=2 for each gate, are collected and substituted into Eq. (5-1) and Eq. (5-2), which are then solved for the coefficients. Once all values ($F_{\mu}$, $P_{\mu}$, $F_{\lambda}$ and $P_{\lambda}$) are calculated, $\mu$ and $\lambda$ for any FOC value can be evaluated. These key coefficients are summarized in Table II. It should be noted that the input transition time is 100ps.
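The two-point derivation can be sketched as follows. Since Eq. (5-1) is linear in FOC, measurements at FOC=1 and FOC=2 determine the effort and parasitic coefficients exactly. The snippet regenerates the two data points from the INV charging coefficients of Table II (in units of $10^{-11}$), recovers the coefficients, and evaluates FOC=4. The function names are illustrative assumptions.

```python
import math


def solve_effort(value_at_foc1, value_at_foc2):
    """Solve value = FOC * F + P from the measurements at FOC=1 and FOC=2."""
    effort = value_at_foc2 - value_at_foc1
    parasitic = value_at_foc1 - effort
    return effort, parasitic


def evaluate_foc(foc, effort, parasitic):
    """Eq. (5-1) or (5-2): key parameter for a given fan-out of the current gate."""
    return foc * effort + parasitic


# INV charging mu coefficients from Table II, in units of 1e-11.
p_mu, f_mu = 3.8, 0.9

# Data points that the two-point fit would start from.
mu_foc1 = evaluate_foc(1, f_mu, p_mu)
mu_foc2 = evaluate_foc(2, f_mu, p_mu)

f_rec, p_rec = solve_effort(mu_foc1, mu_foc2)
mu_foc4 = evaluate_foc(4, f_rec, p_rec)

print(f"F_mu = {f_rec:.2f}, P_mu = {p_rec:.2f}, mu(FOC=4) = {mu_foc4:.2f}")
assert math.isclose(mu_foc4, 7.4)
```

The same two functions apply unchanged to $\lambda$ with the $F_\lambda$ and $P_\lambda$ columns.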

#### B. Transition time effect estimation methodology

In the previous subsection, the FOC effect on the IGD model key parameters was investigated and a methodology to calculate these values was introduced. Now, the FOP effect on propagation delay, in the form of transition time degradation, is discussed. High FO values cause long output transition times, thereby increasing the propagation delays of the following gates. A look-up table is generated to capture the direct link between FOP and the corresponding output transition time. INV gates with FOP=1, 2, 3, and 4 are simulated with the same variation set-up utilized in Section III. The corresponding output transition times are listed in Table III, where it can be observed that, with a 100ps input transition time, the INV output transition time increases notably with FOP, and the increment is quite steady for both charging and discharging events. At FOP=4, the output rise and fall times (which are the input transition times of the following gates) exceed 200ps, which can greatly increase the propagation delay of the driven gates. Based on the Table III data, it is of interest to investigate the corresponding change of the $\mu$ and $\lambda$ values for different FOP values when FOC=1, for both the INV and NAND gates. The corresponding data are listed in Table IV and Table V, which also provide the key parameter differences between two successive FOP values, namely $T_{\mu}$ and $T_{\lambda}$.

Table III : FOP EFFECT ON OUTPUT TRANSITION TIME

| Input Transition 100ps | Tout<sub>rise</sub> (ps) | Increment (ps) | Tout<sub>fall</sub> (ps) | Increment (ps) |
|-------|-----|----|-----|----|
| FOP=1 | 85  | /  | 99  | /  |
| FOP=2 | 124 | 39 | 153 | 54 |
| FOP=3 | 165 | 41 | 204 | 51 |
| FOP=4 | 206 | 41 | 256 | 52 |

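Table IV : FOP EFFECT ON KEY PARAMETERS FOR INVERTER

| INV   | Charging μ (×10<sup>-11</sup>) | Charging T<sub>μ</sub> (×10<sup>-11</sup>) | Charging λ (×10<sup>-10</sup>) | Charging T<sub>λ</sub> (×10<sup>-10</sup>) | Discharging μ (×10<sup>-11</sup>) | Discharging T<sub>μ</sub> (×10<sup>-11</sup>) | Discharging λ (×10<sup>-10</sup>) | Discharging T<sub>λ</sub> (×10<sup>-10</sup>) |
|-------|-----|-----|------|-----|-----|-----|------|-----|
| FOP=1 | 3.8 | /   | 7.2  | /   | 4.6 | /   | 8.3  | /   |
| FOP=2 | 4.6 | 0.8 | 8.1  | 0.9 | 5.5 | 0.9 | 12.5 | 4.2 |
| FOP=3 | 5.4 | 0.8 | 9.2  | 1.1 | 7.2 | 0.9 | 16.9 | 4.4 |
| FOP=4 | 6.2 | 0.8 | 10.2 | 1.0 | 8.0 | 0.8 | 21.2 | 4.3 |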

Table V : FOP EFFECT ON KEY PARAMETERS FOR NAND

| NAND  | Charging μ (×10<sup>-11</sup>) | Charging T<sub>μ</sub> (×10<sup>-11</sup>) | Charging λ (×10<sup>-10</sup>) | Charging T<sub>λ</sub> (×10<sup>-10</sup>) | Discharging μ (×10<sup>-11</sup>) | Discharging T<sub>μ</sub> (×10<sup>-11</sup>) | Discharging λ (×10<sup>-10</sup>) | Discharging T<sub>λ</sub> (×10<sup>-10</sup>) |
|-------|-----|-----|------|-----|-----|-----|------|-----|
| FOP=1 | 5.0 | /   | 9.9  | /   | 6.0 | /   | 6.5  | /   |
| FOP=2 | 5.8 | 0.8 | 12.3 | 2.4 | 6.9 | 0.9 | 7.9  | 1.4 |
| FOP=3 | 6.6 | 0.8 | 14.8 | 2.5 | 7.8 | 0.9 | 9.3  | 1.4 |
| FOP=4 | 7.4 | 0.8 | 17.4 | 2.6 | 8.6 | 0.8 | 10.5 | 1.2 |

According to Table IV and Table V, the increments in $\mu$ and $\lambda$ between successive FOP values are steady and can be represented by the constants $T_{\mu}$ and $T_{\lambda}$. Therefore, the FOP effect on our IGD model can be simply calculated as follows:

$$\mu_{FOP} = (FOP-1)T_{\mu} \tag{6-1}$$

$$\lambda_{FOP} = (FOP-1)T_{\lambda} \tag{6-2}$$
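As a quick check of the constant-increment assumption behind Eq. (6-1), the sketch below recomputes the successive differences of the NAND charging $\mu$ column of Table V (in units of $10^{-11}$) and then rebuilds the column from the FOP=1 value plus the Eq. (6-1) offset. The helper names are illustrative assumptions.

```python
import math

# NAND charging mu versus FOP from Table V, in units of 1e-11.
nand_charging_mu = {1: 5.0, 2: 5.8, 3: 6.6, 4: 7.4}


def increments(by_fop):
    """Differences between successive FOP entries, rounded to table precision."""
    fops = sorted(by_fop)
    return [round(by_fop[b] - by_fop[a], 1) for a, b in zip(fops, fops[1:])]


def fop_offset(fop, transition_step):
    """Eq. (6-1) or (6-2): parameter increase caused by the previous gate's fan-out."""
    return (fop - 1) * transition_step


steps = increments(nand_charging_mu)
t_mu = sum(steps) / len(steps)  # constant T_mu used in Eq. (6-1)

predicted = {
    fop: nand_charging_mu[1] + fop_offset(fop, t_mu) for fop in nand_charging_mu
}

print("increments:", steps)
print("predicted :", {fop: round(value, 2) for fop, value in predicted.items()})
```

When the increments are not perfectly constant, as in some $T_\lambda$ columns, averaging them as done here is one possible choice; the paper does not specify which value is used.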

#### C. The FOC and FOP effects

After investigating FOC and FOP effects respectively on the propagation delay as well as the IGD model key parameters, the straightforward combination of these two parts is given in the following equations.

$$\mu_{LOGIC} = FOC \cdot F_\mu + P_\mu + (FOP-1)T_\mu \tag{7-1}$$

$$\lambda_{LOGIC} = FOC \cdot F_{\lambda} + P_{\lambda} + (FOP-1)T_{\lambda} \tag{7-2}$$



Fig. 13. A sample circuit with FOC=3 and FOP=2.

An INV based example where INV2 has FOC=3 and FOP=2 is illustrated in Fig. 13. The  $\mu$  and  $\lambda$  calculation for the INV2 charging is carried out based on the values in Table II and IV.

$$\mu_{INV2} = 3 \cdot 0.9 \times 10^{-11} + 3.8 \times 10^{-11} + (2-1) \cdot 0.8 \times 10^{-11} = 7.3 \times 10^{-11}$$

$$\lambda_{INV2} = 3 \cdot 1.26 \times 10^{-10} + 7.2 \times 10^{-10} + (2-1) \cdot 0.9 \times 10^{-10} = 11.88 \times 10^{-10}$$

Note that in Section V, the  $\mu$  and  $\lambda$  calculation are carried out by utilizing Eq. (7-1) and (7-2).
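The INV2 arithmetic above can be reproduced with a short sketch of Eq. (7-1) and (7-2). The function name and argument order are illustrative assumptions; the coefficients are the ones used in the worked example, with $\mu$ in units of $10^{-11}$ and $\lambda$ in units of $10^{-10}$.

```python
import math


def logic_param(foc, fop, effort, parasitic, transition_step):
    """Eq. (7-1)/(7-2): FOC term + parasitic term + FOP term."""
    return foc * effort + parasitic + (fop - 1) * transition_step


# INV2 charging event with FOC=3 and FOP=2.
mu_inv2 = logic_param(3, 2, effort=0.9, parasitic=3.8, transition_step=0.8)
lam_inv2 = logic_param(3, 2, effort=1.26, parasitic=7.2, transition_step=0.9)

print(f"mu_INV2 = {mu_inv2:.2f}e-11, lambda_INV2 = {lam_inv2:.2f}e-10")
assert math.isclose(mu_inv2, 7.3)
assert math.isclose(lam_inv2, 11.88)
```

With FOC=1 and FOP=1 the function reduces to $F + P$, i.e., the basic single-fan-out case of Section III.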

### V. MODEL VALIDATION FOR SYNCHRONOUS CIRCUITS

To prove that the proposed IGD model and the method to propagate the key parameters is valid and applicable to generic circuits, we compare the results obtained with our approach against SPICE simulation results for the following circuits:

- DFFs + 8-bit RCA
- DFFs + 8-bit DEMUX and MUX.

The use of our model on homogeneous circuits has already been presented, and the Full Adder (FA) circuit has been completely analyzed, in [9]. To complete a synchronous timing path, the previously presented data is reused along with the sub-threshold DFF values. In the first circuit, all involved gates have a fan-out of 1; thus, the extensions of the IGD model explained in Section IV are not required. In the second circuit, different fan-out values are present, so the method discussed in Section IV is utilized. The Cumulative Distribution Function (CDF), which is the integral of the PDF, is utilized to quantify more clearly the difference between the proposed model results and the MC data. The CDF is used instead of the PDF because it gives the probability that the switching event has occurred by a given time, whereas the PDF captures the probability density of the switching event at that instant. From the circuit point of view, we are interested in whether the switching event happens at any time before the clock arrival. Consequently, the difference between the measured and the computed CDF is a better metric to evaluate the accuracy of our model.
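A minimal sketch of such a CDF comparison is given below. It uses the standard closed-form IGD CDF, an empirical CDF built from samples, and the mean absolute difference over a time grid as the deviation measure. The exact deviation formula used in the paper is not stated, so this metric, the function names, and the small sample list (hypothetical values in normalized units, not MC data) are all assumptions.

```python
import math


def _phi(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def igd_cdf(x, mu, lam):
    """Closed-form CDF of IGD(mu, lam); zero for x <= 0."""
    if x <= 0:
        return 0.0
    root = math.sqrt(lam / x)
    head = _phi(root * (x / mu - 1.0))
    tail = _phi(-root * (x / mu + 1.0))
    if tail == 0.0:
        return head
    # Combine exp(2*lam/mu) with the tail in log-space to avoid overflow.
    return min(1.0, head + math.exp(2.0 * lam / mu + math.log(tail)))


def empirical_cdf(samples, x):
    """Fraction of samples that are less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)


def cdf_deviation(samples, mu, lam, grid):
    """Mean absolute difference between empirical and model CDF on a time grid."""
    return sum(abs(empirical_cdf(samples, t) - igd_cdf(t, mu, lam)) for t in grid) / len(grid)


# Hypothetical delay samples in normalized time units (placeholder for MC data).
samples = [0.55, 0.70, 0.80, 0.90, 1.00, 1.10, 1.30, 1.60, 2.10, 3.00]
grid = [0.5 * k for k in range(1, 9)]

deviation = cdf_deviation(samples, mu=1.2, lam=4.0, grid=grid)
print(f"IGD CDF at x = mu for IGD(1, 1): {igd_cdf(1.0, 1.0, 1.0):.4f}")
print(f"mean absolute CDF deviation: {deviation:.4f}")
```

In a real flow, `samples` would hold the MC path delays and `mu`, `lam` the values propagated with Eq. (3-1), (3-2), (7-1), and (7-2), so no fitting to the MC data is involved on the model side.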

### A. DFFs + 8-bit Ripple Carry Adder

Given the FA parameters derived in [9] and the DFF parameters presented in Section III, the corresponding $\mu$ and $\lambda$ for an 8-bit RCA with DFFs are computed. From a static timing analysis perspective, the worst-case analysis of the longest propagation delay is of utmost importance. The critical path within the FA is the one from Carry-In to Sum, since its delay is larger than that from Carry-In to Carry-Out within an 8-bit RCA. Hence, the longest delay occurs when the inputs A, B, and Carry-In switch from all 0s to A=01111111, B=00000000, and Carry-In=1, resulting in a 10000000 output. In Fig. 14, the CDFs obtained by Monte Carlo simulation, by the IGD model estimation, and by GD based fitting are displayed for a delay range between 0ns and 30ns. It is clear that the proposed IGD based delay prediction method closely approximates the MC simulation: an average mismatch of only 1.9% is recorded. On the other hand, the discrepancy between the simulation data and the GD fitting is obvious. Once more, the GD fitted CDF curve starts from a non-zero value, with a probability of 0.2, which is clearly unrealistic. It is important to clarify that the IGD curve is obtained by plotting the IGD CDF whose parameters have been computed using the proposed propagation methodology, starting from the single block key parameters (no MC simulation is necessary). On the contrary, the GD curve is an attempt to fit the MC data with a Gaussian curve. Even in this situation, where the GD fitting has the advantage of knowing the MC results, our method provides a much better approximation. The CDF deviations between the MC simulation and the IGD estimation for the 5ns to 30ns range, with a 5ns step, are summarized in Table VI: the highest deviation is 4.5% at 5ns, which is at an early stage of propagation, while all the others are below 1%.
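The non-zero starting value of a Gaussian CDF can be illustrated numerically. The sketch below does not reproduce the paper's GD fitting to MC data. Instead it swaps in a simpler assumption, a Gaussian whose mean and variance match those of an inverse Gaussian distribution ($\mu$ and $\mu^3/\lambda$, respectively). The parameter values are hypothetical and are not the RCA parameters.

```python
import math


def normal_cdf(t, mean, std):
    """Gaussian CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-(t - mean) / (std * math.sqrt(2.0)))


# Hypothetical IGD parameters (in seconds), chosen only for illustration.
mu, lam = 10e-9, 10e-9

# Standard deviation of an inverse Gaussian with mean mu and shape lam.
sigma = math.sqrt(mu ** 3 / lam)

# A moment-matched Gaussian assigns probability mass to negative delays,
# so its CDF is already above zero at t = 0.
gd_cdf_at_zero = normal_cdf(0.0, mu, sigma)
print(f"Gaussian CDF at t = 0: {gd_cdf_at_zero:.4f}")
```

Because the inverse Gaussian distribution is supported only on positive delays, its CDF is exactly zero at $t = 0$, while the Gaussian value above equals $\Phi(-\sqrt{\lambda/\mu})$ and grows as the delay distribution becomes more spread out relative to its mean.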



Fig. 14. DFFs + 8-bit RCA CDFs.

Table VI : DFFS + 8-BIT RCA CDF DEVIATIONS

| Deviation      | 5ns  | 10ns | 15ns | 20ns | 25ns | 30ns | Average (0-30ns) |
|----------------|------|------|------|------|------|------|------------------|
| IGD Estimation | 4.5% | 0.5% | 0.6% | 0.9% | 0.7% | 0.7% | 1.9%             |

### B. DFFs + 8-bit DEMUX and MUX

The key IGD parameters at the output of an 8-bit DEMUX and MUX with DFFs can be evaluated by using the data and methodology presented in Sections III and IV, i.e., the fan-out number and the resulting transition time. The schematic of the 8-bit DEMUX and MUX is displayed in Fig. 15, where only INVs and NANDs are used. In Fig. 16, the CDFs obtained by means of Monte Carlo simulation, by IGD based estimation, and by GD fitting are depicted for a delay range between 0ns and 6ns. The IGD estimation based curve closely follows the MC simulation, with a slight deviation around 2ns, and overall it is better than the GD fitting. Table VII lists the CDF deviations of the IGD estimation and of the GD fitting from the MC simulation for a delay range from 1ns to 6ns. The highest IGD mismatch is 3.4% at 2ns, and the average overall error is 1.2% and 7.3% for the IGD estimation and the GD fitting, respectively. The fact that the GD fitting performs better here than in the case of the RCA circuit can be related to the smaller circuit size. It is worth mentioning that, due to the non-zero-crossing of the GD CDF, the GD fitting deviation becomes too large if the timing range starts from 0ns.



Fig. 15. Schematic of 8-bit DEMUX and MUX.



Fig. 16. DFFs + 8-bit DEMUX and MUX CDFs.

Table VII : DFFS + 8-BIT DEMUX AND MUX CDF DEVIATIONS

| Deviation      | 1ns    | 2ns   | 3ns  | 4ns  | 5ns  | 6ns   | Average (1-6ns) |
|----------------|--------|-------|------|------|------|-------|-----------------|
| IGD Estimation | 2.6%   | 3.4%  | 2.4% | 0.6% | 0.1% | <0.1% | 1.2%            |
| GD Fitting     | 208.8% | 11.8% | 2.1% | 0.7% | 0.1% | <0.1% | 7.3%            |
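A deviation table of this kind can be produced by comparing an empirical CDF built from delay samples with the analytical IGD CDF at a few time points. The Python sketch below illustrates the procedure under stated assumptions: the samples are synthetic inverse Gaussian draws (generated with the Michael, Schucany and Haas transformation) standing in for HSPICE Monte Carlo data, the parameters `mu` and `lam` are hypothetical, and the deviation is taken as the absolute CDF difference, which may differ from the exact deviation definition behind Tables VI and VII.

```python
import bisect
import math
import random


def igd_cdf(t, mu, lam):
    """Analytical inverse Gaussian CDF (mean mu, shape lam)."""
    if t <= 0.0:
        return 0.0
    phi = lambda z: 0.5 * math.erfc(-z / math.sqrt(2.0))
    a = math.sqrt(lam / t)
    return phi(a * (t / mu - 1.0)) + math.exp(2.0 * lam / mu) * phi(-a * (t / mu + 1.0))


def sample_igd(rng, mu, lam):
    """One inverse Gaussian draw (Michael, Schucany and Haas method)."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = (mu + mu * mu * y / (2.0 * lam)
         - mu / (2.0 * lam) * math.sqrt(4.0 * mu * lam * y + mu * mu * y * y))
    return x if rng.random() <= mu / (mu + x) else mu * mu / x


def empirical_cdf(sorted_samples, t):
    """Fraction of samples with delay <= t."""
    return bisect.bisect_right(sorted_samples, t) / len(sorted_samples)


rng = random.Random(0)
mu, lam = 3.0, 12.0  # hypothetical parameters, in ns
samples = sorted(sample_igd(rng, mu, lam) for _ in range(20000))

deviations = {}
for t in (1.0, 2.0, 3.0, 4.0, 5.0, 6.0):
    deviations[t] = abs(empirical_cdf(samples, t) - igd_cdf(t, mu, lam))
    print(f"{t:.0f}ns: |CDF_emp - CDF_IGD| = {deviations[t]:.4f}")
max_dev = max(deviations.values())
```

With real Monte Carlo delays, `samples` would be replaced by the simulated delay values, and `mu` and `lam` by the parameters propagated through the circuit.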

### VI. CONCLUSIONS

In this paper, a comprehensive fan-out aware Inverse Gaussian Distribution (IGD) based delay model was introduced. The IGD model was verified for both combinational and sequential gates, and fan-out effects were considered in two situations, namely FOC and FOP. While both the GD and IGD models work well at normal operating voltage, the IGD is much better suited than the GD to the sub-threshold region of operation. The model is not only highly accurate (a close match to SPICE Monte Carlo simulation results) but, more importantly, it exhibits great flexibility against process and supply voltage variations. The evaluation of the key IGD model parameters is very straightforward, which is beneficial for the delay estimation of complex circuits. When compared to MC SPICE simulation data obtained for (1) DFFs + 8-bit Ripple Carry Adder and (2) DFFs + 8-bit DEMUX and MUX, our method provides average mismatches of 1.9% and 1.2%, respectively, while requiring orders of magnitude less simulation time. Moreover, when compared against the GD fitting results, the IGD based estimation was more accurate in both cases. The average deviation of the IGD estimation was roughly 84% smaller than the one corresponding to GD fitting, e.g., 1.2% versus 7.3% for the second circuit, which is about a 6x improvement.

#### ACKNOWLEDGMENT

This work has been sponsored by the European Commission FP7 FET-Open iRISC (Innovative Reliable Chip Designs from Unreliable Components) project as well as the Science Foundation Ireland Project No 07/IN.1/I977.

#### REFERENCES

- [1] Tang, X., De, V. K., & Meindl, J. D. (1997). Intrinsic MOSFET parameter fluctuations due to random dopant placement. *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, 5(4), 369-376.
- [2] Gill, B. S., Papachristou, C., & Wolff, F. G. (2006, March). Soft delay error analysis in logic circuits. In *Design, Automation and Test in Europe, 2006. DATE'06. Proceedings* (Vol. 1, pp. 1-6). IEEE.
- [3] Soeleman, H., Roy, K., & Paul, B. C. (2001). Robust subthreshold logic for ultra-low power operation. *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, 9(1), 90-99.
- [4] Agarwal, A., Blaauw, D., Zolotov, V., Sundareswaran, S., Zhao, M., Gala, K., & Panda, R. (2002, June). Path-based statistical timing analysis considering inter-and intra-die correlations. In *Proc. TAU* (pp. 16-21).
- [5] Hwang, M. E. (2011). Supply-voltage scaling close to the fundamental limit under process variations in nanometer technologies. *Electron Devices, IEEE Transactions on*, 58(8), 2808-2813.
- [6] Merrett, M., Asenov, P., Wang, Y., Zwolinski, M., Reid, D., Millar, C. & Asenov, A. (2011, March). Modelling circuit performance variations due to statistical variability: Monte Carlo static timing analysis. In *Design, Automation & Test in Europe Conference & Exhibition* (DATE), 2011 (pp. 1-4). IEEE.
- [7] Wan, L., & Chen, D. (2010, November). Analysis of circuit dynamic behavior with timed ternary decision diagram. In *Proceedings of the International Conference on Computer-Aided Design* (pp. 516-523). IEEE Press.
- [8] Zaynoun, S., Khairy, M. S., Eltawil, A. M., Kurdahi, F. J., & Khajeh, A. (2012, September). Fast error aware model for arithmetic and logic circuits. In *Computer Design (ICCD), 2012 IEEE 30th International Conference on* (pp. 322-328). IEEE.
- [9] Chen, J., Spagnol, C., Grandhi, S., Popovici, E., Amaricai, A., & Cotofana, S. (2014, July). Linear Compositional Delay Model for the Timing Analysis of Sub-Powered Combinational Circuits. In *VLSI (ISVLSI), 2014 IEEE Computer Society Annual Symposium on* (pp. 380-385). IEEE.

- [10] Mörters, P., & Peres, Y. (2010). Brownian motion (Vol. 30). Cambridge University Press.
- [11] Sutherland, I. E., Sproull, R. F., & Harris, D. F. (1999). Logical effort: designing fast CMOS circuits. Morgan Kaufmann.
- [12] Gerosa, G., Gary, S., Dietz, C., Pham, D., Hoover, K., Alvarez, J., Sanchez, H., et al. (1994). A 2.2 W, 80 MHz superscalar RISC microprocessor. *Solid-State Circuits, IEEE Journal of*, 29(12), 1440-1454.