# Timing Error Analysis of Flooded LDPC Decoders

Alexandru Amaricai<sup>1</sup> Valentin Savin<sup>3</sup> Oana Boncalo<sup>1</sup> Nicoleta Cucu-Laurenciu<sup>2</sup> Joyan Chen<sup>2</sup> Sorin Cotofana<sup>2</sup>

<sup>1</sup> University Politehnica Timisoara, Romania, Email: {alexandru.amaricai, oana.boncalo}@cs.upt.ro

<sup>2</sup>Computer Engineering Department, TU Delft, the Netherlands, Email: {N.CucuLaurenciu, S.D.Cotofana}@tudelft.nl

<sup>3</sup>MINATEC Campus, CEA-LETI, Grenoble, France, Email : valentin.savin@cea.fr

Abstract-In this paper we perform a fault tolerance assessment of flooded LDPC decoders affected by probabilistic timing errors, characteristic of sub-powered CMOS circuits. We investigate the error correction capability, in terms of Frame Error Rate (FER), of faulty flooded Min-Sum (MS) and Self-Corrected Min-Sum (SCMS) LDPC architectures for both the Binary Input Additive White Gaussian Noise (BIAWGN) and the Binary Symmetric Channel (BSC) models, and of flooded Finite Alphabet Iterative Decoder (FAID) architectures for the BSC model. The analysis is performed using a multi-level fault injection methodology, which accurately captures the probabilistic error profile at the output of each combinational and memory block, according to the different clock constraints. The analysis indicates that the LDPC decoders are capable of correcting errors affecting the internal datapath, not only those which appear in the transmission channel. Furthermore, the decoder potential to increase throughput by means of overclocking has been estimated to be between 77% and 150%, while preserving the nominal error correction performance.

Keywords—LDPC Decoder, Simulated Fault Injection, Timing Analysis, Flooded Architectures

# I. INTRODUCTION

In the traditional models for data communication, it is assumed that error correction coding and decoding are reliable, while errors appear only in the transmission channel. However, in deep nanometer CMOS technologies, the reliability of forward error correction (FEC) coders and decoders may be affected by process, voltage and temperature (PVT) variations. Furthermore, the fault tolerance problems of nanoscale logic devices are aggravated by the use of very low supply voltages, which are adopted in order to tackle the increased power consumption of modern integrated circuits. Therefore, it becomes critical to evaluate the error correction capability of FEC decoders which are built using unreliable logic components.

Theoretical analysis of FEC decoders, mostly LDPC decoders, running on faulty hardware has been performed in [1-9]. In [1], the decoding performance of one-step majority logic LDPC decoders has been evaluated. Density evolution equations have been derived for noisy Gallager-B and Belief-Propagation decoders [3][4]. Decoding performance has been analyzed for FAID LDPC decoders with processing errors [5]. MS LDPC decoder analysis for unreliable message storage has been performed in [6]. The error correction performance of MS based decoders under probabilistic faults has been analyzed in [7][8]. A fault tolerant decoding algorithm based on a probabilistic gradient bit flipping decoder has been proposed in [9]. All these approaches performed theoretical analysis on LDPC C++/Matlab models. The major drawbacks of these works are: (i) few architectural and implementation details, such as memory organization, pipelining, processing unit implementation and routing network implementation, have been considered; (ii) simple probabilistic error models have been considered, and the fault maps do not take into account the architectural and implementation characteristics of the LDPC decoders.

In this paper, we present a fault injection based reliability analysis of flooded LDPC decoder architectures. The LDPC decoders are described in Verilog HDL at RTL level. Regarding the fault injection, we have used a 3-level reliability assessment methodology. It is based on statistical static timing analysis (SSTA) performed in SPICE [10], probability density function (PDF) propagation at gate level [10], and saboteur based simulated fault injection (SFI) at RTL [11]. This way, we have accurately modeled the probabilistic timing errors characteristic of sub-powered CMOS devices. The performed analysis represents a step forward with respect to the theoretical analyses discussed above, as it is performed on synthesizable RTL descriptions of LDPC decoders and assumes more realistic timing fault models. We have used the same architecture for three types of decoding algorithms (MS, SCMS and FAID), and we have analyzed their error correction capability for both BIAWGN and BSC.

The paper is organized as follows: Section II presents the flooded LDPC decoder architectures; the evaluation methodology and the fault map generation are described in Section III; decoding performance results for MS and SCMS decoders under BIAWGN are presented in Section IV; the results for MS, SCMS and FAID decoders under BSC are given in Section V.

# II. FLOODED LDPC DECODER ARCHITECTURE

We have implemented flooded MS, SCMS and FAID decoders for a (3,6)-regular quasi-cyclic LDPC code, with a codeword length of 1296 bits and a circulant size of 54. The architecture of the decoders is depicted in Fig. 1. The number of check node units and the number of variable node units are both equal to the circulant size (54). The message processing is performed as follows: (i) messages are processed serially at both CNU and VNU level; (ii) the columns/rows in the parity check matrix corresponding to a column/row in the base matrix are processed in parallel; (iii) the columns/rows of the base matrix are processed serially.
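The code parameters quoted above fix the size of the base matrix. The following minimal sketch derives it by counting edges, and also shows the cyclic rotation that a barrel shifter applies to a block of 54 messages. The function names and the rotation direction are illustrative assumptions, not taken from the RTL.

```python
# Code parameters quoted in the text.
CODEWORD_LENGTH = 1296   # bits
CIRCULANT_SIZE = 54      # also the number of CNUs and of VNUs
VAR_DEGREE = 3           # (3,6)-regular code: 3 checks per variable node
CHECK_DEGREE = 6         # 6 variable nodes per check node


def base_matrix_shape(n, z, dv, dc):
    """Return (rows, columns) of the base matrix of a (dv,dc)-regular QC code."""
    cols = n // z
    edges = n * dv           # each edge counted once from the variable side
    checks = edges // dc     # same edges counted from the check side
    rows = checks // z
    return rows, cols


def cyclic_shift(block, shift):
    """Rotate a block of messages left by `shift` positions (direction assumed)."""
    shift %= len(block)
    return block[shift:] + block[:shift]


rows, cols = base_matrix_shape(CODEWORD_LENGTH, CIRCULANT_SIZE, VAR_DEGREE, CHECK_DEGREE)
edges = CODEWORD_LENGTH * VAR_DEGREE
print(rows, cols, edges)
```

Under these parameters the base matrix has 12 rows and 24 columns, and each decoding iteration exchanges 3888 messages in each direction.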



Figure 1 – The flooded LDPC decoder architecture (erasure and  $\alpha$  sign memories are used only in SCMS)

We have considered quantizations of 6 bits for the AP-LLR and 4 bits for the input LLR and the check node messages for the MS and SCMS decoders, and 3 bits for the check and variable node messages for the FAID decoder. The architecture consists of: the *Input Log Likelihood Ratio (LLR) memory*, the *Variable Node Unit (VNU) block* (54 VNUs which compute the corresponding variable-to-check messages ($\alpha$) for a column in the *B* matrix), the *Variable-to-check message memory*, the *Variable-to-check message Barrel Shifter (BS)*, the *Check Node Unit (CNU) block* (54 units which compute the corresponding check-to-variable messages ($\beta$) for a row in the *B* matrix), the *Check-to-variable message memory* (which stores the check-to-variable node messages ($\beta$) in a compressed form), the *Check-to-variable message BS*, the *Hard-decision buffer and early-termination circuit*, and the *Control unit* (which provides the appropriate sequence of operations during the decoding process).

SCMS decoding is based on variable node message erasure, which is performed when the newly computed message has a different sign with respect to the previous one [13]. With respect to the MS decoder, SCMS requires a different VNU, as well as two additional memories: the previous variable node message sign memory and the erasure bit memory. FAID decoding is based on the computation of new variable node messages from the input LLR and the check node messages using dedicated look-up tables [14]. The FAID decoder has the same architecture as the MS decoder, with a different VNU (specific to the FAID variable node computation) and different sizes for the memories and the BS, due to the different quantization.
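The difference between the MS and SCMS variable node updates can be sketched as follows. This is an illustrative reading of the erasure rule, not the RTL of the decoders: symmetric saturation, the use of the erasure bit to mark a zero previous message, and all function names are assumptions.

```python
MSG_MAX = 7    # 4-bit signed messages; symmetric saturation is assumed
APP_MAX = 31   # 6-bit AP-LLR; symmetric saturation is assumed


def saturate(x, limit):
    return max(-limit, min(limit, x))


def sign(x):
    return -1 if x < 0 else 1


def ms_vnu(llr, betas):
    """Min-Sum variable node: return (ap_llr, variable-to-check messages)."""
    total = llr + sum(betas)
    alphas = [saturate(total - b, MSG_MAX) for b in betas]
    return saturate(total, APP_MAX), alphas


def scms_vnu(llr, betas, prev_signs, prev_erased):
    """SCMS variable node: erase a message whose sign flipped since the last
    iteration, unless the previous message was already zero (erased)."""
    ap_llr, alphas = ms_vnu(llr, betas)
    out, signs, erased = [], [], []
    for a, s_old, e_old in zip(alphas, prev_signs, prev_erased):
        s_new = sign(a)
        erase = (not e_old) and a != 0 and s_new != s_old
        value = 0 if erase else a
        out.append(value)
        signs.append(s_new)          # content of the previous-sign memory
        erased.append(value == 0)    # content of the erasure bit memory
    return ap_llr, out, signs, erased


print(scms_vnu(2, [3, -7, 1], [1, 1, -1], [False, False, False]))
```

In this sketch the erasure flags are initialised to `True` before the first iteration, so that no message is erased when there is no previous message to compare against.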

The CNU is common to all the decoders and has 3 combinational stages. VNU has 4 combinational stages for MS, 5 for SCMS and 6 for FAID decoder. The BS are implemented using a single pipeline stage.
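As a rough illustration of the check node computation shared by the decoders, the sketch below implements the standard min-sum rule and one common compressed storage format (two smallest magnitudes, the position of the smallest one, and the input signs). The text does not specify the compressed form actually used, so this format and the function names are assumptions.

```python
def ms_cnu(alphas):
    """Return a compressed check node state: (min1, min2, idx_min1, signs)."""
    mags = [abs(a) for a in alphas]
    idx = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[idx]
    min2 = min(m for i, m in enumerate(mags) if i != idx)
    signs = [a < 0 for a in alphas]   # True marks a negative input
    return min1, min2, idx, signs


def expand_beta(state, j):
    """Recover the check-to-variable message sent on edge j."""
    min1, min2, idx, signs = state
    mag = min2 if j == idx else min1
    # The output sign is the product of the signs of all the other inputs.
    negative = (sum(signs) - signs[j]) % 2 == 1
    return -mag if negative else mag


state = ms_cnu([3, -2, 5, -1, 4, 6])
print([expand_beta(state, j) for j in range(6)])
```

Storing only this state instead of six separate messages is what makes a compressed check-to-variable memory possible for a check node of degree 6.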

# III. FAULT INJECTION ANALYSIS

#### A. Methodology

We have used a 3-level reliability evaluation methodology, which consists of:

1. *Standard cell characterization*: SSTA based on Monte-Carlo SPICE simulation is used to determine the propagation delay distribution under PVT variations for each standard cell component; the output of this step is the PDF of each standard cell component; the simulations have been performed for the 45 nm PTM transistor model [12] and a supply voltage of 0.3 V.
2. *Gate level PDF propagation*: this step determines the failure probability of each primary output (PO) of each combinational block for a given delay constraint; for each PO, the worst propagation path is determined; the delay distribution is derived using a linear composition of PDFs [10]; the cumulative distribution function (CDF) of each PO is then derived, which gives the failure probability of that PO as a function of the delay constraint.
3. *Saboteur based RTL SFI*: probabilistic saboteurs are inserted in the RTL description on each PO of the combinational blocks, as well as on the memory components.

TABLE I – MINIMUM AND MAXIMUM PROBABILITIES IN MS, SCMS AND FAID DECODERS (NON-ZERO MINIMUMS)

| Clock Period (ns) | MS Min   | MS Max   | SCMS Min | SCMS Max | FAID Min | FAID Max |
|-------------------|----------|----------|----------|----------|----------|----------|
| 5.5               | 1.06E-09 | 1.06E-09 | 1.07E-09 | 1.07E-09 | 1.06E-09 | 1.06E-09 |
| 4                 | 3.46E-09 | 2.92E-06 | 5.78E-09 | 2.92E-06 | 3.46E-09 | 2.92E-06 |
| 2.5               | 2.53E-09 | 5.04E-03 | 2.53E-09 | 5.04E-03 | 2.53E-09 | 5.04E-03 |
| 2.2               | 2.72E-09 | 1.93E-02 | 2.72E-09 | 1.93E-02 | 2.72E-09 | 1.93E-02 |
| 1.9               | 1.43E-08 | 6.73E-02 | 1.43E-08 | 6.73E-02 | 1.43E-08 | 6.73E-02 |
| 1.7               | 2.40E-07 | 1.43E-01 | 4.02E-09 | 1.43E-01 | 2.40E-07 | 1.43E-01 |

By applying this methodology, we model in an accurate way the effects of probabilistic timing errors across the entire circuit. Thus, combinational outputs with lower latency have lower failure probability, as they are more likely to meet the timing constraints.
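The link between path delay and failure probability can be illustrated with a small sketch. It assumes independent Gaussian cell delays, so that means and variances add along a path; the compositional model of [10] may differ. It also models a timing error as a bit flip at the output, which is an assumption about the saboteur behaviour. All numeric delays below are hypothetical.

```python
import math
import random


def path_delay_distribution(cell_delays):
    """Compose per-cell (mean, std) delays in ns along one path.
    Assumes independent Gaussian delays: means add, variances add."""
    mean = sum(m for m, _ in cell_delays)
    std = math.sqrt(sum(s * s for _, s in cell_delays))
    return mean, std


def failure_probability(mean, std, clock_period):
    """P(path delay > clock period) = 1 - CDF(clock period)."""
    z = (clock_period - mean) / (std * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


def saboteur(bit, p_fail, rng):
    """Flip the output bit with probability p_fail (bit-flip model assumed)."""
    return bit ^ 1 if rng.random() < p_fail else bit


# Hypothetical (mean, std) cell delays in ns; not taken from the paper.
short_path = [(0.4, 0.05)] * 3
long_path = [(0.4, 0.05)] * 5
p_short = failure_probability(*path_delay_distribution(short_path), clock_period=2.2)
p_long = failure_probability(*path_delay_distribution(long_path), clock_period=2.2)
print(p_short, p_long)
```

With the same clock constraint, the shorter path yields a much smaller failure probability than the longer one, which is the behaviour described above for low-latency outputs.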

Regarding the processing of the results, we have performed a cross simulation between a C++ model of the transmission chain and the RTL Verilog description. The cross simulation is performed using a SystemVerilog framework, which makes use of the DPI-C interface. The C++ transmission chain contains the random frame generator, the LDPC encoder, the channel noise model, and the result comparison. We compare the output of the fault injected RTL LDPC decoder with the output of the LDPC encoder. This way, we are able to compute the FER and BER of the fault injected RTL decoder.



Figure 2 - Average Number of Expected Faults per Iteration

| Clock Period (ns) | Mem      | BS       | MS/SCMS CNU | MS VNU   | SCMS VNU | FAID CNU | FAID VNU |
|-------------------|----------|----------|-------------|----------|----------|----------|----------|
| 5.50              | 0.00E+00 | 0.00E+00 | 1.28E-10    | 0.00E+00 | 0.00E+00 | 9.70E-11 | 0.00E+00 |
| 4.00              | 0.00E+00 | 0.00E+00 | 3.51E-07    | 3.14E-08 | 2.47E-08 | 2.66E-07 | 2.26E-09 |
| 2.50              | 9.64E-07 | 2.79E-06 | 6.05E-04    | 1.92E-04 | 1.51E-04 | 4.58E-04 | 4.61E-05 |
| 2.20              | 1.28E-05 | 2.71E-05 | 2.33E-03    | 9.94E-04 | 7.81E-04 | 1.76E-03 | 3.12E-04 |
| 1.90              | 1.60E-04 | 2.62E-04 | 8.13E-03    | 4.80E-03 | 3.77E-03 | 6.18E-03 | 1.98E-03 |
| 1.70              | 8.28E-04 | 1.17E-03 | 1.75E-02    | 1.30E-02 | 1.02E-02 | 1.33E-02 | 6.48E-03 |

TABLE II – AVERAGE FAILURE PROBABILITIES FOR LDPC DECODER COMPONENTS

#### B. Fault Map of LDPC Decoders

We have injected errors only in the data-path computation. A fault free control unit has been considered. Injecting faults into the control unit would create severe disruptions in the LDPC decoder's data flow, including reading/writing messages from/to incorrect memory addresses or routing messages to inappropriate processing units, which might make the decoder unable to perform the LDPC decoding algorithms. This is a realistic assumption, as in real-life designs it is common to make controllers more robust than data-paths.

Table I presents the minimum non-zero and the maximum failure probabilities in the three LDPC decoders. Table II presents the average failure probabilities in the LDPC decoders' components, i.e. the memory blocks, the BS, the VNUs and the CNUs. For the considered technology and supply voltage, we have performed simulations for clock periods between 5.5 ns and 1.7 ns, which correspond to error probabilities higher than $10^{-9}$. Fig. 2 depicts the estimated number of active faults in one iteration in the entire LDPC decoder, for clock periods between 2.5 ns and 1.7 ns. It can be noted that the FAID decoder has a significantly smaller number of expected faults than the MS based decoders.
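One simple way to read an expected fault count of this kind is through linearity of expectation: the expected number of active faults is the sum, over all fault-injected outputs, of the failure probability times the number of times that output is evaluated. The sketch below applies this to the average probabilities of Table II at 2.2 ns for the MS decoder. The evaluation counts are placeholders (the text does not give them), and working from block averages rather than per-output probabilities is a simplification, so the resulting number is not a reproduction of Fig. 2.

```python
# Average failure probabilities at a 2.2 ns clock period, from Table II
# (memories, barrel shifters, MS/SCMS CNU, MS VNU).
AVG_FAIL_PROB = {"Mem": 1.28e-05, "BS": 2.71e-05, "CNU": 2.33e-03, "VNU": 9.94e-04}

# HYPOTHETICAL number of fault-injected output evaluations per iteration for
# each block type; placeholder values, not taken from the paper.
PLACEHOLDER_EVALS = {"Mem": 1000, "BS": 1000, "CNU": 1000, "VNU": 1000}


def expected_faults(avg_fail_prob, evals_per_iteration):
    """Expected number of activated faults: sum of probability x evaluations."""
    return sum(avg_fail_prob[b] * evals_per_iteration[b] for b in avg_fail_prob)


print(expected_faults(AVG_FAIL_PROB, PLACEHOLDER_EVALS))
```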

#### C. Simulation setup

We have performed simulations on 6 desktop computers. The RTL SFI has been performed using the Modelsim 10.2 simulator. For BIAWGN, we have simulated SNR values between 1 dB and 2.75 dB, while for BSC we have simulated crossover probabilities between 0.02 and 0.1. Regarding the number of simulated frames, the simulation for a specific SNR or crossover probability value stops either when 100 erroneous frames have been obtained, or when a maximum of 200,000 frames have been simulated. A maximum of 30 iterations is performed by the LDPC decoder. Simulating failure probabilities of $10^{-9}$ is statistically relevant due to the high number of simulations, the high number of fault injected signals, and the high number of clock cycles required by the decoding process.
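The stopping rule described above can be sketched as a short loop. The callable `frame_fails` is a hypothetical stand-in for one complete encode, transmit and decode run of the cross simulation; it is not part of the described framework.

```python
import random


def estimate_fer(frame_fails, max_errors=100, max_frames=200_000):
    """Simulate frames until max_errors frame errors or max_frames frames.

    frame_fails: zero-argument callable returning True when the decoded
    frame differs from the transmitted one (hypothetical stand-in).
    Returns (fer_estimate, frames_simulated, frame_errors).
    """
    frames = errors = 0
    while errors < max_errors and frames < max_frames:
        frames += 1
        if frame_fails():
            errors += 1
    return errors / frames, frames, errors


# Toy usage: a stub that fails about 5% of the time.
rng = random.Random(0)
fer, frames, errors = estimate_fer(lambda: rng.random() < 0.05)
print(fer, frames, errors)
```

With these limits, any FER point below 100/200,000 = 5e-4 necessarily rests on fewer than 100 observed frame errors.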

# IV. ANALYSIS OF LDPC DECODERS UNDER BIAWGN

We have analyzed faulty MS and SCMS flooded LDPC decoders under BIAWGN. The inputs of the MS and SCMS decoders are 4-bit signed integers, representing quantized LLR values. We have applied a gain factor of 3.5 to the channel output, which has been shown to yield good performance in both the waterfall and the error floor regions.
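A minimal sketch of how a channel output could be mapped to a 4-bit input LLR with a gain factor of 3.5 is given below. The rounding mode (round half away from zero) and the symmetric saturation range of -7 to 7 are assumptions; the text only states the gain factor and the bit width.

```python
import math


def quantize_llr(y, gain=3.5, bits=4):
    """Scale a channel output y by `gain` and quantize to a signed integer.

    Rounding half away from zero and symmetric saturation are assumptions.
    """
    limit = 2 ** (bits - 1) - 1          # 7 for 4 bits
    scaled = gain * y
    q = int(math.copysign(math.floor(abs(scaled) + 0.5), scaled))
    return max(-limit, min(limit, q))


# BPSK convention assumed here: bit 0 is sent as +1, bit 1 as -1.
print([quantize_llr(y) for y in (1.0, -1.0, 0.1, 2.5, -3.0)])
```

In this sketch a noiseless BPSK symbol maps to an input LLR of magnitude 4, which leaves headroom below the saturation value of 7 for strongly received symbols.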



Figure 4 – FER for faulty SCMS under BIAWGN with errors injected in the two memories



Figure 5 – FER for faulty SCMS under BIAWGN with no errors injected in the two memories



Figure 6 - FER for Faulty MS under BSC with gain factor of 3

Fig. 3 depicts the FER for the faulty MS decoder. The figure indicates that, for a clock period of 3.1ns, the MS decoder has the same decoding performance as a fault free decoder. The average error rate for a decoder with a clock period of 3.1ns is of order $10^{-5}$, with a maximum of order $10^{-4}$. A slight decoding performance degradation (of less than 0.1dB for a $10^{-5}$ BER) is observed when the clock frequency is increased to 400MHz (clock period of 2.5ns). The average error rate in this case is of order $10^{-4}$, while the maximum error rate is of order $10^{-3}$. Therefore, average error rates of up to $10^{-4}$ in the decoder affect the error correction capability only slightly, if at all. Significant decoding performance degradation is observed for a clock period of 2.2ns, which corresponds to an average error rate of $10^{-3}$. For a clock period of 1.9ns, the MS decoder cannot decode.

Fig. 4 depicts the FER for the faulty SCMS decoder, when errors are injected in the two additional memories. The figure indicates that, for a clock period of 2.5ns, the SCMS decoder has the same decoding performance as a fault free decoder. For a clock period of 2.2ns, the SCMS decoder exhibits an error floor starting at SNR $\approx$ 2.5dB. For a clock period of 1.9ns, the circuit has no error correction capability. Fig. 5 depicts the FER of the faulty SCMS decoder, when the two additional memories are fault free. A small decoding performance degradation (less than 0.1 dB) is observed for a clock period of 2.2ns. The error floor phenomenon at SNR 2.5 dB is not present when the two additional memories are error free.

# V. ANALYSIS OF LDPC DECODERS UNDER BSC

We have analyzed MS, SCMS and FAID decoders under BSC. Regarding the gain factor, we have considered values of 3 and 4 for the MS decoder. The SCMS decoder is insensitive to the gain factor; thus, it is enough to simulate the SCMS decoder for a value of 4. The gain factor does not apply to the FAID decoder, which operates on binary input data.
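For a BSC, the decoder input carries only one bit of information per position, so the gain factor simply sets the magnitude of the input LLR. The sketch below illustrates this under an assumed sign convention (bit 0 maps to a positive LLR); the function names are illustrative.

```python
import random


def bsc_channel(bits, crossover, rng):
    """Flip each bit independently with probability `crossover`."""
    return [b ^ 1 if rng.random() < crossover else b for b in bits]


def bsc_input_llr(received_bits, gain):
    """Map received bits to input LLRs: 0 -> +gain, 1 -> -gain (assumed)."""
    return [gain * (1 - 2 * b) for b in received_bits]


rng = random.Random(0)
received = bsc_channel([0, 1, 1, 0], crossover=0.05, rng=rng)
print(bsc_input_llr(received, gain=3), bsc_input_llr(received, gain=4))
```

With 4-bit messages, changing the gain from 3 to 4 changes the weight of the channel observation relative to the check node messages, which is why the MS decoder is simulated for both values.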

Fig. 6 and 7 depict the FER for the MS decoder with gain factors of 3 and 4, respectively. The results indicate a strong influence of the channel value (the gain factor) on the MS decoder. On the one hand, a channel value of 4 leads to a better error correction capability of the decoder than a channel value of 3. On the other hand, for a clock period of 2.2ns, the decoding performance with a channel value of 4 is almost the same as that of an error-free decoder, whereas with a channel value of 3 a decoding performance degradation can be observed with respect to the error-free decoder. For both channel values, there is no loss in decoding performance for clock periods of 2.5ns or higher, while for a clock period of 1.9ns the decoders do not decode.



Figure 7 - FER for Faulty MS under BSC with gain factor of 4







Figure 9 - FER for Faulty SCMS with no errors injected in the two memories



Fig. 8 depicts the FER for SCMS when faults are injected in the two additional memories, while Fig. 9 depicts the FER for SCMS decoders with error free additional memories. The obtained results are similar to the ones obtained for BIAWGN. For the SCMS decoder with faulty previous $\alpha$ sign memory and faulty erasure memory, an error floor type of behavior can be observed for a clock period of 2.2ns. For the SCMS decoder with error free additional memories, the decoding performance for the clock period of 2.2ns is similar to that of an error free decoder.

Fig. 10 depicts the simulation results for the FAID decoders under BSC channels. The FAID decoder presents no decoding performance degradation for clock periods of 2.2ns or higher with respect to the error free decoder. Slight performance degradation is observed for a clock period of 1.9ns. The FAID decoder cannot decode for clock periods of 1.7ns or lower.

# VI. CONCLUSIONS

We have performed an SFI based analysis of flooded LDPC decoder architectures. The goal has been to assess the reliability of the LDPC decoders under probabilistic timing errors, characteristic of sub-powered CMOS devices. We have accurately modeled the timing faults affecting the combinational and memory outputs of the LDPC decoder using a multi-level SPICE-analytical-SFI approach. The following conclusions can be drawn from the performed analysis:

1. The LDPC decoders are capable of correcting errors which appear in their internal data path, not only errors in the transmission channel.
2. For BSC, the gain factor has a strong influence on the MS decoder performance: a gain factor of 4 leads to better fault tolerance than a gain factor of 3.
3. For both BSC and BIAWGN channels, the errors in the two SCMS-specific memories have a strong influence on the SCMS performance. For a clock period of 2.2ns, the SCMS decoder presents a high error floor when the two memories are affected by faults. When no errors are injected in the two additional memories, no significant decoding performance degradation occurs for SCMS at that clock period.
4. The FAID decoder can support shorter clock periods than the MS based decoders; however, the FAID decoder has a smaller number of expected errors for the same clock period with respect to the MS and SCMS decoders.

The simulations indicate that the LDPC decoders have the potential to increase throughput by means of overclocking between 77% and 150%, while preserving the nominal error correction performance, for the considered technology and supply voltage value. The obtained results suggest that energy improvements for LDPC decoders can be obtained by applying voltage scaling without the corresponding clock frequency reduction.
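The two percentages can be reproduced with simple arithmetic if the slowest simulated clock period, 5.5 ns, is taken as the reference and throughput is assumed proportional to clock frequency. This pairing is an inference from the clock periods reported in Sections III to V (3.1 ns and 2.2 ns being the shortest periods with no reported performance loss), not a statement made explicitly in the text.

```python
def throughput_gain_percent(t_ref_ns, t_fast_ns):
    """Relative throughput increase when the clock period drops from
    t_ref_ns to t_fast_ns, assuming throughput scales with frequency."""
    return (t_ref_ns / t_fast_ns - 1.0) * 100.0


T_REF_NS = 5.5  # assumed reference: slowest clock period simulated

gain_3p1 = round(throughput_gain_percent(T_REF_NS, 3.1))
gain_2p2 = round(throughput_gain_percent(T_REF_NS, 2.2))
print(gain_3p1, gain_2p2)
```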

#### ACKNOWLEDGMENT

This work has been supported by the European Commission Framework Program 7, project "i-Risc: *Innovative Reliable Chip Design from Low Power Unreliable Components*", under Grant Agreement 309129.

#### REFERENCES

- [1] S. K. Chilappagari, M. Ivkovic, and B. Vasic, "Analysis of one step majority logic decoders constructed from faulty gates," *Proc. of IEEE Int. Symp. on Inf. Theory (ISIT)*, 2006, pp. 469–473
- [2] L. R. Varshney, "Performance of LDPC codes under faulty iterative decoding," IEEE Trans. Inf. Theory, vol. 57, no. 7, pp. 4427–4444, 2011
- [3] S. Yazdi, H. Cho, Y. Sun, S. Mitra, and L. Dolecek, "Probabilistic analysis of Gallager B faulty decoder," *Proc. IEEE Int. Conf. on Comm.* (*ICC*), 2012, pp. 7019–7023
- [4] S. Yazdi, H. Cho, and L. Dolecek, "Gallager B decoder on noisy hardware," IEEE Trans. on Comm., vol. 61, no. 5, pp. 1660–1673, 2013
- [5] E. Dupraz, D. Declercq, B. Vasic, and V. Savin, "Finite alphabet iterative decoders robust to faulty hardware: Analysis and selection," in *Proc. IEEE Int. Symp. on Turbo Codes and Iterative Inf. Processing* (ISTC), August 2014
- [6] A. Balatsoukas-Stimming, A. Burg, "Density evolution for min-sum decoding of LDPC codes under unreliable message storage," IEEE Comm. Letters, no. 99, pp. 1–4, 2014
- [7] C. L. Kameni Ngassa, V. Savin, and D. Declercq, "Min-sum-based decoders running on noisy hardware," Proc. of IEEE Global Comm. Conf. (GLOBECOM), 2013
- [8] C. L. Kameni Ngassa, V. Savin, E. Dupraz, and D. Declercq, "Density Evolution and Functional Threshold for the Noisy Min-Sum Decoder," IEEE Trans. on Comm., vol. 63, no. 5, 2015
- [9] P. Ivanis, O. Ras, B. Vasić, "MUDRI: A fault-tolerant decoding algorithm" Proc. IEEE Int. Comm. Conf. (ICC 2015), 2015
- [10] J Chen, C Spagnol, S Grandhi, E Popovici, S Cotofana, A Amaricai "Linear Compositional Delay Model for the Timing Analysis of Sub-Powered Combinational Circuits" *Proc. 2014 IEEE Annual Symp. On VLSI (ISVLSI)*, 2014
- [11] E. Jenn, J. Arlat, M. Rimén, J. Ohlsson, J. Karlsson "Fault injection into VHDL models: the MEFISTO tool" Proc. Fault Tolerant Computing Systems (FTCS), 1994
- [12] W. Zhao, Y. Cao, "New generation of Predictive Technology Model for sub-45nm early design exploration," IEEE Transactions on Electron Devices, vol. 53, no. 11, pp. 2816-2823, November 2006
- [13] V. Savin "Self-corrected Min-Sum decoding of LDPC codes" Proc. Int. Symp. On Inf. Theory (ISIT), 2007
- [14] S. K. Planjery, D. Declercq, L. Danjean, B. Vasić, "Finite Alphabet Iterative Decoders, Part I: Decoding Beyond Belief Propagation on the Binary Symmetric Channel," IEEE Trans. Commun., vol. 61, no. 10, pp. 4033 - 4045, Nov. 2013.