# Error Correction Schemes with Erasure Information for Fast Memories

Samuel Evain<sup>1</sup>, Valentin Savin<sup>2</sup>, Valentin Gherman<sup>1</sup>

<sup>1</sup>CEA, LIST, Saclay Nano-INNOV PC 172, 91191 Gif sur Yvette CEDEX, France firstname.lastname@cea.fr  <sup>2</sup> CEA, LETI, MINATEC Campus
17 rue des Martyrs, 38054 Grenoble Cedex 9, France firstname.lastname@cea.fr

Abstract-Two error correction schemes are proposed for wordoriented binary memories that can be affected by erasures, i.e. errors with known location but unknown value. The erasures considered here are due to the drifting of the electrical parameter used to encode information outside the normal ranges associated to a logic 0 or a logic 1 value. For example, a dielectric breakdown in a magnetic memory cell may reduce its electrical resistance sensibly below the levels which correspond to logic 0 and logic 1 values stored in healthy memory cells. Such deviations can be sensed during memory read operations and the acquired information can be used to boost the correction capability of an error-correcting code (ECC). The proposed schemes enable the correction of double-bit errors based on the combination of erasure information with single-bit error correction and doublebit error detection (SEC-DED) codes or shortened (SEC) codes. The correction of single-bit errors is always guaranteed. Ways to increase the number of double-bit and triple-bit errors that can be detected by shortened SEC and SEC-DED codes are considered in order to augment the error correction capability of the proposed solutions.

Keywords-resistance-switching memory; word-oriented binary memory; MRAM; MTJ; dielectric breakdown; ECC; shortened ECC; SEC; SEC-DED; DEC; TEC; erasure; soft information

## I. INTRODUCTION

The robustness of memory subsystems can be effectively increased with the help of ECCs [3][4][11][19]. Unfortunately, the ECCs are expensive in terms of latency and storage overhead and this cost may rise quickly with the number of correctable errors. For example, a conventional single-bit error correction (SEC) code requires m+1 check-bits for  $2^m$  data-bits, while a conventional double-bit error correction (DEC) code needs 2(m+1) check-bits for the same number of data-bits [2].

Here, we consider word-oriented binary memories which enable the identification of bits with a reduced level of confidence, called erasures. Such bits are not necessarily erroneous, but they have a higher probability to be affected by errors. Erroneous bits which involve erasures are easier to correct since their positions are known [2]. We assume that the identification of erasures does not require supplementary memory operations. For example, this may be the case of magnetic memory (MRAM) cells where the information is encoded by the electrical resistance of a magnetic tunnel junction (MTJ). An MTJ may be affected by soft or hard dielectric breakdown which can cause the drifting of its electrical resistance outside the characteristic range of healthy MTJs [5][14][16].

Erasures may also occur in aggressively scaled memories if distinct logic values are encoded by overlapping distributions of some relevant electrical parameter. For example, the electrical charge stored on the floating gates of  $E^2$ PROM/flash memory cells may be distributed over slightly overlapping ranges [21]. Any read-out value that falls into the overlapping region may be considered as an erasure.

Different approaches to extract and use erasure information are available. When the memory latency and endurance are not an issue, the *double-complement algorithm*, can be used to detect erasures induced by permanent faults [2]. Soft-decision decoding, usually applied to LDPC codes, offers a natural support to exploit erasure information [6][18][21]. An algebraic decoding method able to handle erasures has also been proposed for BCH codes [7]. Unfortunately, such complex decoding methods are more suitable for block-oriented low-latency storage devices and rather inappropriate for wordoriented fast memories.

Here, we propose two error correction schemes which rely on erasure information in order to boost the number of errors that can be corrected with SEC-DED codes. Erasure information is only exploited in the presence of detectable multi-bit errors so that single-bit errors are always corrected. Double-bit errors become correctable in the presence of at least one true positive, i.e. an erroneous bit indicated as erasure. Successful decoding in the presence of false positives, i.e. correct bits indicated as erasures, requires that both erroneous bits are identified as erasures. It is shown that virtually all double-bit errors become correctable if the probability that erroneous bits are indicated as erasures is larger than 90%.

These schemes have also been analyzed in combination with shortened SEC codes. Here, an ECC is called shortened if the number of data-bits per code word is below the maximal limit allowed by the available check-bit number. The use of shortened ECCs is imposed by the use of memory word sizes equal to a power of 2. In such cases, important fractions of double-bit and triple-bit errors can be detected with shortened SEC and SEC-DED codes, respectively.

Submission based on an ETS2013 paper and solicited by ETS.

First and last authors acknowledge partial support from the EMYR (Enhancement of MRAM Memory Yield and Reliability) project funded by the French National Research Agency.

Second author acknowledges funding from the European Union's FP7 Programme, under Grant Agreement number 309129 (i-RISC project).

In order to increase the error correction capability of the proposed schemes, ways to construct shortened SEC codes with a high number of detectable double-bit errors and shortened SEC-DED codes with improved triple-bit error detection (TED) are presented as well.

Two sources of memory erasures are discussed in Section II. Section III presents two error correction schemes that are based on SEC-DED codes and can take advantage of erasure information in order to enable DEC capability. The error correction capability of these schemes is evaluated in Section IV. Methods to increase the number of double-bit and triple-bit errors detected with shortened SEC and SEC-DED codes are presented Section V and Section VI, respectively. Such codes can be used in conjunction with the error correction schemes presented in Section III in order to mask all detectable multi-bit errors which involve erasures. Quantitative evaluations of the impact on storage, latency and logic are reported in Section VIII.

#### II. SOURCES OF ERASURES IN MEMORIES

In certain resistance-switching memories, erasures can be induced by dielectric breakdown. For example, in MRAM cells, the information is encoded by the electrical resistance of an MTJ which consists of two ferromagnetic layers separated by a thin insulating layer [20]. The insulating layer can be subject to soft or hard dielectric breakdown which represents a primary reliability concern in aggressively scaled magnetic memories [5][14][16]. The electrical resistance of an MTJ with hard dielectric breakdown is sensibly lower than the electrical resistance of healthy MTJs storing either a logic 1 or a logic 0 [5][16]. This may also be the case of MTJs that have experienced at least two soft dielectric breakdowns [14].

In an MRAM, the dispersion of the MTJ electrical resistance can be approximated by a Gaussian distribution [20] which, in the presence of dielectric breakdown, can be sketched as in Figure 1. MTJs with antiparallel magnetization (AM) of the ferromagnetic layers have the largest electrical resistance while those with parallel magnetization (PM) have a lower electrical resistance [20]. These two magnetization states are used to encode binary values. MTJs with dielectric breakdown have the lowest electrical resistance.

As illustrated in Figure 1, such erasures can be detected if the values measured during a memory read operation are compared against one additional reference value. As long as the electrical resistances of storage cells with dielectric break down fall into a range which is disjoint from the ranges corresponding to AM and PM states, an erasure has equal probabilities to hide a logic 0 or a logic 1

In binary memories that are aggressively scaled to the detriment of storage cell reliability, the electrical parameter used to encode information may be distributed as illustrated in Figure 2 [21]. For example, such an electrical parameter can be the electrical resistance of MRAM cells [20] or the electrical charge stored on the floating gate of  $E^2PROM/flash$  memory cells [21]. As shown in Figure 2, erasures can be detected if the values measured during a memory read operation are compared against two reference values instead of one reference value. Enriched erasure information can be obtained if the measured values are compared against three reference values as indicated in Figure 3. The reference value in the middle allows making a first guess of the stored logic value. The other two reference values define a range which corresponds to logic values that are measured with a low level of trust. For this shape of the probability density function, the probability that a bit with a low level of trust has indeed a wrong value is less than 50%.

Erasures can be detected with limited impact on memory latency if the electrical signals sensed during a memory read operation are compared concurrently against the reference values illustrated in Figure 1 to Figure 3. A possible approach is to derive these values from a unique reference via mirror and division transistors with appropriately selected gate width ratios [20].



Fig. 1. Distribution of the MTJ electrical resistance, expressed in an arbitary unit [a.u.] in MRAMs with erasures.



Fig. 2. Distribution of the information encoding parameter, expressed in an arbitary unit [a.u.] in agressively scaled memories [21].



Fig. 3. Extraction of enriched erasure information with the help of three reference values in agressively scaled memories [21].

# III. DEC SCHEMES BASED ON SEC-DED CODES AND ERASURE INFORMATION

Two error correction schemes are proposed in which the erasure information and SEC-DED codes are used to enable DEC capability. Besides the stored code word, both schemes receive an additional word where each bit indicates if the bit with the same index in the code word has been identified as an erasure during a memory read operation.

A first solution that uses two *conventional* decoders is presented in Figure 4. During a memory read operation, the first decoder receives bit values which are read-out from the storage cells using conventional reference values such as, for example, the reference value used to discriminate between the AM states and the PM states in Figure 1 or Figure 3. The second decoder gets a modified word in which the bits indicated as erasures are flipped.

The first decoder can be a conventional SEC-DED decoder. The second decoder can be a conventional SEC decoder which receives the same number of bits except for one check-bit as compared to the SEC-DED decoder. In the case of a linear block SEC-DED code, any check-bit can be neglected and, assuming that each parity-check matrix line is used for the calculation of a syndrome bit, the SEC parity-check matrix can be derived from the SEC-DED parity-check matrix if:

- the SEC-DED parity-check matrix is transformed via linear combinations among its lines such that the new parity-check matrix defines the same SEC-DED code and contains only one line which involves the check-bit to be neglected i.e. only one syndrome bit depends now on this check-bit,
- the parity-check matrix line which involves the checkbit to be neglected is eliminated i.e. the only syndrome bit which depended on this check-bit is not considered by the SEC decoder,
- the resulting SEC parity-check matrix is transformed via linear combinations among its lines such that the new parity-check matrix defines the same SEC code and reaches a minimal density i.e. a minimal number of logic 1 values.

The first step above can be avoided if the selected check-bit is taken into account by only one SEC-DED parity-check matrix line. In this case, the third step above is not necessary if the density of the SEC-DED parity-check matrix is minimal. A second criterion is to neglect the check-bit which will cause the elimination of the matrix line with the largest density.

In Figure 4, a multiplexer is used to select among the corrected data words delivered by the two conventional decoders. The output of the second decoder is selected only if the first decoder encounters an uncorrectable error. If the probability that erroneous bits are true positives is high, the second decoder receives a fully or partially corrected code word such that it will deliver an error-free data word with a high probability.

Any triple-bit error that generates a syndrome different from the syndrome of any single-bit error can be detected by the first decoder and can also be corrected by the second decoder if at least two of the affected bits are indicated as erasures. Since the minimum Hamming distance of a SEC-DED code is equal to 4, all triple-bit errors could be detected and corrected in the presence of an advantageous mix of true and false positives [2][7]. This feature is not considered here as it could prevent the correction of single-bit errors. For example, this could happen in the presence of three false positives that indicate a triple-bit error which generates the same syndrome as the actual single-bit error which is not indicated as erasure.

Figure 5 illustrates an error correction scheme that uses two conventional SEC-DED decoders in order to signal uncorrectable errors at system level. Such a symmetrical structure in which both decoders are identical and only the inputs are different is similar to the solution proposed in [23] which is applied to extended Golay codes.

An *unconventional* error correction scheme that uses a SEC-DED code and erasure information to enable DEC capability is sketched in Figure 6. A syndrome different from the all-zero syndrome and not reserved for single-bit error correction is used in a special module to select among the double-bit errors which could generate it. A double-bit error is selected only if at least one of the bits affected by this error is indicated as erasure. Another special module is used to count the total number of erasures. If this number is larger than 1, a double-bit error is selected only if both bits affected by this error are indicated as erasures. Syndrome generation and single-bit error correction are performed in a conventional way.



Fig. 4. DEC scheme based on a SEC-DED code, erasure information and conventional SEC and SEC-DED decoders.



Fig. 5. DEC scheme with uncorrectable error indication based on a SEC-DED code, erasure information and conventional SEC-DED decoders.



Fig. 6. DEC scheme based on a SEC-DED code, erasure information and unconventional error-correcting logic.

In Figure 6, an error vector is provided by the error selection modules in order to indicate the bit positions affected by the selected single-bit or double-bit error. By construction, at most one error selection module can deliver a non-zero error vector at a time.

The schemes proposed here do not rely on special distances between the code words which were introduced to take into account the erasure information i.e. the analog weight [1] or the generalized distance [8]. For example, in the presence of a false and a true positive, the schemes proposed here cannot correct double-bit errors. This situation can be handled with the special distances defined in [1][8] in the special cases when there is no double-bit error which (a) could affect the correct bit indicated as erasure and (b) generates the same syndrome as the real error. This situation may only occur when shortened linear block SEC-DED codes are used since only sub-sets of the code word bits can be affected by the double-bit errors which correspond to certain syndromes. This is due to the fact that in a linear block SEC-DED code, a bit cannot be affected by two double-bit errors that generate the same syndrome. For example, if the SEC-DED code words have an odd number of bits, there is always at least one bit that cannot be affected by any double-bit that corresponds to a certain syndrome.

# IV. PROBABILISTIC ESTIMATION OF THE DOUBLE-BIT ERROR CORRECTION CAPABILITY

In this section, we estimate the probability to correct a double-bit error with the schemes proposed in Section III, under the assumptions that:

- an *n*-bit SEC-DED word with *k* data-bits and *r=n-k* check-bits is accessed during each memory read operation,
- the bits of the accessed words are affected by independent and identically distributed random errors,
- correct and erroneous bits are indicated as erasures with the probabilities *P<sub>corr</sub>* and *P<sub>err</sub>*, respectively.

The probability  $\Pi_{conv}$  that a double-bit error is corrected with the schemes sketched in Figure 4 and Figure 5 can be expressed as below:

$$\Pi_{conv} = \left[1 - (1 - P_{err})^2\right] (1 - P_{corr})^{n-2} + P_{err}^2 \left[(n-2)P_{corr}(1 - P_{corr})^{n-3}\right]$$

where the first term gives the error correction probability in the absence of false positives and the second term covers the case when there is only one false positive. In the former case, there should be at least one true positive while, in the latter case, there should be two true positives.

Double-bit errors cannot be corrected with the schemes presented in Figure 4 and Figure 5 in the presence of more than one false positive. Consequently, these schemes cannot handle double-bit errors if the encoded data words contain more than three erasures. The probability  $\Pi_{unconv}$  to correct a double-bit error with the scheme in Figure 6 can be expressed as follows given av, the average number of double-bit errors that generate the same syndrome in a SEC-DED code:

$$\Pi_{unconv} = 2P_{err} (1 - P_{err}) (1 - P_{corr})^{n-2} + P_{err}^2 (1 - P_{corr}^2)^{av-1}$$

where the first term involves only one true positive and the second takes into account two true positives. The first term requires the absence of false positives while the second term tolerates the presence of one false positive in any competing double-bit error that generates the same syndrome as the actual double-bit error.

For SEC-DED codes with fixed parity [12][13], the parameter av can be approximated as below:

$$av \cong \frac{n(n-1)/2}{2^{r-1}-1}$$

where the numerator represents the number of double-bit errors and the denominator is the number of syndromes used for double-bit error detection.

The probabilities  $\Pi_{conv}$  and  $\Pi_{unconv}$  are illustrated in Figure 7 and Figure 8, respectively, as functions of  $P_{err}$  for several values of  $P_{corr}$  and a SEC-DED code with 32 data-bits and 7 check-bits. As one could expect,  $P_{corr}$  and  $P_{err}$  have opposite impacts on the probabilities  $\Pi_{conv}$  and  $\Pi_{unconv}$ . Therefore, the effectiveness of the proposed solution depends on the ability to obtain a high  $P_{err}$  while maintaining the  $P_{corr}$  as low as possible.

 $\Pi_{unconv}$  is larger than  $\Pi_{conv}$  due to the fact that the scheme in Figure 6 can simultaneously handle several false positives. This property enables the correction of a large fraction of the double-bit errors in memories where erasure mechanisms are responsible for more than 90% of the total number of erroneous bits.

It comes out that for  $P_{corr}$  below 0.1% the proposed schemes have similar error correction capabilities. In this region,  $P_{corr}$  has very little impact on the probability to correct double-bit errors and  $P_{err}$  can be improved, by changing the reference values illustrated in Fig.1, Fig. 2 and Fig. 3, as long as  $P_{corr}$  is kept below 0.1%.

The occurrence probability of an erasure,  $P_{corr}+P_{err}$ , can be estimated from Figure 1 to Figure 3 by integrating the probability density functions over the domains associated with erasures. This gives a hint about the upper limit of  $P_{corr}$ , which is expected to be small in memories with reasonable storage cell reliability.

Given the values of the probabilities  $\Pi_{conv}$  and  $\Pi_{unconv}$ , the memory MTTF improvement can be calculated as proposed in [10].



Fig. 7. Probability  $\Pi_{conv}$  as a function of  $P_{err}$  for several values of  $P_{corr}$  and a SEC-DED code with 32 data-bits and 7 check-bits.



Fig. 8. Probability  $\Pi_{unconv}$  as a function of  $P_{err}$  for several values of  $P_{corr}$  and a SEC-DED code with 32 data-bits and 7 check-bits.

# V. SHORTENED LINEAR BLOCK SEC CODES WITH INCREASED DED CAPABILITY

The error correction schemes in Figure 4 to Figure 6 can be generalized to ECCs that can correct up to *m*-bit errors and detect completely or partially (m+1)-bit errors [23] in order to enable the correction of the detectable errors.

This section is devoted to the generation of shortened linear block SEC codes with the goal to increase the number of double-bit errors that can be detected and finally corrected with the help of slightly modified versions of the error correction schemes proposed in Section III. For example, in Figure 9 is sketched a version of the error correction scheme in Figure 4 which was adapted to a shortened SEC code. The DEC scheme in Figure 6 can also be adapted to a shortened SEC code just by restricting the number of syndromes available for double-bit error detection.



Fig. 9. Partial DEC scheme based on a shortened SEC code, erasure information, a conventional SEC decoder and a SEC-partialDED decoder.

Table I reports the numbers and ratios of detectable doublebit errors in shortened linear block SEC codes constructed with the approach proposed in Annex I. These numbers remain relatively close to the upper limit whose calculation is explained in Annex I. The probability to correct double-bit errors with the error correction scheme in Figure 9 and in Figure 6 (based on shortened SEC codes) can be calculated as in Section IV if the number of detectable double-bit errors and the parameter *av* are appropriately updated.

TABLE I. NUMBERS AND RATIOS OF DETECTABLE DOUBLE-BIT ERRORS IN THE GENERATED SHORTENED LINEAR BLOCK SEC CODES

| SEC      | Data  | Number of                        | Detectable double-bit errors |             |  |
|----------|-------|----------------------------------|------------------------------|-------------|--|
| code     | -bits | syndromes avail-<br>able for DED | Obtained number              | Upper limit |  |
| (12, 8)  | 8     | 3                                | 18 (27%)                     | 18 (27%)    |  |
| (21, 16) | 16    | 10                               | 90 (43%)                     | 100 (48%)   |  |
| (38, 32) | 32    | 25                               | 415 (59%)                    | 475 (68%)   |  |
| (71, 64) | 64    | 56                               | 1813 (73%)                   | 1960 (79%)  |  |

# VI. SHORTENED LINEAR BLOCK SEC-DED CODES WITH INCREASED TED CAPABILITY

Shortened linear block SEC-DED codes with a number of data-bits per code word equal to a power of 2 enable the detection of large fractions of triple-bit errors. If SEC-DED decoders with partial TED capability are used in Figure 4 and Figure 5, then all detectable triple-bit errors become correctable with an advantageous mix of true and false positive. An augmented version of the scheme in Figure 4 which enables partial triple-bit error correction (TEC) is illustrated in Figure 10.

With the error correction scheme in Figure 10, each triplebit error that can be detected with the shortened SEC-DED code is corrected if one of the following combinations of true and false positives occurs:

- 3 true positives and 0 false positives,
- 3 true positives and 1 false positive which will be handled by the SEC decoder,

• 2 true positives and 0 false positives since the silent erroneous bit will be masked by the SEC decoder.



Fig. 10. Partial TEC scheme based on a shortened SEC-DED code, erasure information, a conventional SEC decoder and a SEC-DED-partialTED decoder.

Here, we are especially interested in SEC-DED codes with fixed parity which enable an efficient implementation of double-bit error detection. Approaches have been proposed to construct parity-check matrices with odd-weight columns for shortened SEC-DED codes with fixed parity [9][13] and large numbers of detectable triple-bit errors [17]. These approaches require manipulations of the triple-bit errors and their syndromes [17].

We generated SEC-DED parity-check matrices with increased triple-bit error detection with an alternative approach which is only looking at the double-bit errors and their syndromes. According to the demonstration presented in Annex II, a maximal number of triple-bit errors can be detected if the double-bit errors are distributed uniformly over the syndromes responsible for their detection. Dealing with double-bit errors instead of triple-bit errors reduces the search space and enables faster search algorithms. In this way, more SEC-DED paritycheck matrices can be generated in order to provide more alternatives for the selection of an efficient hardware implementation.

#### VII. QUANTITATIVE EVALUATIONS

The interest of the error correction schemes presented here can be understood by looking at the information (storage) overhead of several conventional linear block SEC, SEC-DED and DEC codes reported in Table II. With respect to the DEC codes, the check-bit number of the SEC and SEC-DED codes is reduced by 50% and roughly 40%, respectively.

Latency and logic costs of the proposed error correction schemes are reported in Table III and Table IV. These costs are calculated with respect to the latency and logic size of hardwired dictionary-based decoders of linear bloc DEC codes. The costs of conventional SEC and SEC-DED decoders [12][13] are also reported. The logic synthesis was performed with Synopsys Design Compiler and the primary goal was the latency minimization since, for large memories, the relative size of the error correction logic is expected to be negligible.

The scheme in Figure 4 is always faster and smaller than a conventional DEC decoder. The scheme in Figure 9 has similar properties but remains slower than the scheme in Figure 4. This is due to the fact that double-bit error detection is faster in a SEC-DED code with fixed parity [13] than partial double-bit error detection in a shortened SEC code with the same number of data-bits.

The scheme in Figure 4 is also faster than the scheme in Figure 6. Excepting the case when SEC codes with 8 data-bits are used, the scheme in Figure 9 is also faster than the scheme in Figure 6. The schemes in Figure 4 and Figure 9 have also a lower logic size than the scheme in Figure 6.

With respect to a conventional DEC decoder, the latency overhead of the scheme in Figure 6 does not exceed 11% while its logic overhead stays below 40%.

Table V compares the latency and size of the fastest logic implementations of the schemes in Figure 10 and Figure 4. The impact on latency of partial TED increases with the number of data bits while its influence on the logic size remains limited. The numbers of triple-bit errors that can be detected and, potentially, corrected with the scheme in Figure 10 are identical to the numbers of triple-bit errors that can be detected with the odd-weight column SEC-DED codes generated in [17].

 TABLE II.
 MINIMAL
 INFORMATION
 OVERHEAD
 OF
 CONVENTIONAL

 LINEAR
 BLOCK
 SEC.
 DED AND DEC CODES
 DEC
 <tdD

| Data-bits | SEC code | SEC-DED code | DEC code |
|-----------|----------|--------------|----------|
| 8         | 50%      | 63%          | 100%     |
| 16        | 31%      | 38%          | 63%      |
| 32        | 19%      | 22%          | 38%      |
| 64        | 11%      | 13%          | 22%      |

TABLE III. LATENCY RATIO W.R.T. CONVENTIONAL DEC DECODERS

| Data-<br>bits | SEC code     |          |          | SEC-DED code |          |          |
|---------------|--------------|----------|----------|--------------|----------|----------|
|               | Conventional | Figure 9 | Figure 6 | Conventional | Figure 4 | Figure 6 |
| 8             | 79%          | 100%     | 96%      | 82%          | 99%      | 110%     |
| 16            | 80%          | 101%     | 104%     | 79%          | 98%      | 111%     |
| 32            | 77%          | 95%      | 106%     | 76%          | 90%      | 107%     |
| 64            | 74%          | 96%      | 107%     | 73%          | 87%      | 109%     |

TABLE IV. LOGIC RATIO W.R.T. CONVENTIONAL DEC DECODERS

| Data- | SEC code     |          |          | SEC-DED code |          |          |
|-------|--------------|----------|----------|--------------|----------|----------|
| bits  | Conventional | Figure 9 | Figure 6 | Conventional | Figure 4 | Figure 6 |
| 8     | 23%          | 51%      | 57%      | 25%          | 53%      | 126%     |
| 16    | 19%          | 38%      | 67%      | 20%          | 43%      | 115%     |
| 32    | 11%          | 26%      | 57%      | 13%          | 35%      | 126%     |
| 64    | 8%           | 17%      | 62%      | 9%           | 19%      | 138%     |

TABLE V. BENEFIT AND COST OF THE SCHEME IN FIGURE 10 W.R.T. THE FASTEST IMPLEMENTATION OF THE SCHEME IN FIGURE 4

| Data-<br>bits | Achieved number (ratio) of<br>detectable triple-bit errors<br>(that could become correctable) | Latency augmentation | Logic size augmentation |
|---------------|-----------------------------------------------------------------------------------------------|----------------------|-------------------------|
| 8             | 66 (23%)                                                                                      | 0%                   | 0%                      |
| 16            | 540 (35%)                                                                                     | 1%                   | 6%                      |
| 32            | 3,799 (42%)                                                                                   | 4%                   | 0%                      |
| 64            | 26,968 (45%)                                                                                  | 13%                  | 3%                      |

### VIII. CONCLUSIONS

Two error correction schemes were proposed for binary memories that can be affected by erasures. Double-bit error correction was enabled with the help of erasure information and SEC-DED codes. The probabilities that double-bit errors are corrected were expressed and evaluated as functions of the probabilities that correct and erroneous bits are indicated as erasures. It was shown that virtually all double-bit errors become correctable in memories protected by SEC-DED codes where at least 90% of the erroneous bits are identified as erasures. Approaches to construct shortened linear block SEC and SEC-DED codes were presented in order to increase the number of detectable double-bit errors and triple-bit errors, respectively. These errors can be corrected with the proposed schemes if half or the majority of the erroneous bits are indicated as erasures. Latency and logic costs of the proposed error correction schemes were estimated with respect to conventional SEC, SEC-DED and DEC decoders.

## ACKNOWLEDGMENT

We are very grateful to Ken Mackay and Jérémy Alvarez-Hérault from Crocus Technology, and to Nicolas Ravot from CEA for their valuable advices.

We would like to thank Michael Richter for his assistance to derive tighter limits in Annex II.

#### References

- D. Chase, "A class of algorithms for decoding block codes with channel measurement information," IEEE Transactions on Information Theory, volume IT-18, number 1, January 1972, pp. 170–182.
- [2] C.L. Chen and M.Y. Hsiao, "Error-correcting codes for semiconductor memory applications: a state of the art review," IBM Journal of Research and Development, volume 28, issue 2, 1984, pp. 124–134.
- [3] Z. Chishti, A.R. Alameldeen, C. Wilkerson, W. Wu, and S.-L. Lu, "Improving cache lifetime reliability at ultra-low voltages," IEEE/ACM International Symposium on Microarchitecture, MICRO-42, New York City, December 2009.
- [4] T.J. Dell, "A white paper on the benefits of chipkill-correct ECC for PC server main memory," IBM Microelectronics Division, 1997.
- [5] D.V. Dimitrov, Z. Gao, X. Wang, W. Jung, X. Lou, and O.G. Heinonen, "Dielectric breakdown of MgO magnetic tunnel junctions," Applied Physics Letters, 2009.
- [6] G. Dong, N. Xie, and T. Zhang, "On the use of soft-decision error-correcting codes in NAND flash memory," IEEE Transactions on Circuits and Systems, 58(2), February 2011, pp. 429–439.
- [7] G.D. Forney, Jr., "On decoding BCH codes," IEEE Transactions on Information Theory, volume IT–11, number 4, October 1965, pp. 549–557.
- [8] G.D. Forney, Jr., "Generalized minimum distance decoding," IEEE Transactions on Information Theory, volume IT-12, number 2, April 1966, pp. 125-131.
- [9] V. Gherman, S. Evain, N. Seymour, and Y. Bonhomme, "Generalized paritycheck matrices for SEC-DED codes with fixed parity," IEEE International On-Line Testing Symposium (IOLTS), 2011, pp. 200–203.
- [10] V. Gherman, S. Evain, and Y. Bonhomme, "Memory reliability improvement based on maximized error-correcting codes," Journal of Electronic Testing: Theory and Applications (JETTA), volume 29, issue 4, pp. 601-608, August 2013.
- [11] B. Godard, J.-M. Daga, L. Torres, and G. Sassatelli, "Hierarchical code correction and reliability management in embedded nor flash memories," IEEE European Test Symposium (ETS), 2008, pp. 84–90.
- [12] R.W. Hamming, "Error correcting and error detecting codes," The Bell System Technical Journal, volume 29, April 1950, pp. 147–160.
- [13] M.Y. Hsiao "A class of optimal minimum oddweight-column SEC-DED codes," IBM Journal of Research and Development, volume 14, 1970, pp. 395–401.
- [14] D. Kim, T. Kim, S. Kim, J.H. Kong, Y. Yu, and K. Char, "Evolution of Electrical Properties of Magnetic Tunnel Junction through Successive Dielectric Breakdowns," Japanese Journal of Applied Physics, volume 42, part 1, number 3, March 2003, pp. 1242–1245.

- [15] S. Lin and D. J. Costello, "Error control coding: fundamentals and applications," Prentice-Hall, Inc., Englewood Cliffs, NJ, 1983.
- [16] G. Panagopoulos, C. Augustine, and K. Roy, "Modeling of dielectric breakdown-Induced time-dependent STT-MRAM performance degradation," Device Research Conference, 2011, pp. 125–126.
- [17] M. Richter, K. Oberlaender, and M. Goessel, "New linear SEC-DED codes with reduced triple error miscorrection probability," IEEE International On-Line Testing Symposium (IOLTS), 2008, pp. 37–42.
- [18] V. Savin, "Self-corrected min-sum decoding of LDPC codes," IEEE International Symposium on Information Theory, 2008, pp. 146–150.
- [19] C.H. Stapper and H.-S. Lee, "Synergistic fault-tolerance for memory chips," IEEE Transactions on Computers, volume 41, issue 9, September 1992, pp.1078–1087.
- [20] D.D. Tang, and Y.-J. Lee, "Magnetic memory: fundamentals and technology," Cambridge University Press, ISBN: 0521449642, 2010, pp. 96-98.
- [21] J. Wang, T. Courtade, H. Shankar, and R.D. Wesel, "Soft information for LDPC decoding in flash: mutual-information optimized quantization," IEEE Global Telecommunications Conference, 2011, pp. 1–6.
- [23] M.-I. Weng and L.-N. Lee, "Weighted erasure codec for the (24,12) extended Golay code," United States Patent, 4397022, August 2, 1983.

#### ANNEX I

A linear block SEC code can be defined with the help of a binary matrix, called parity-check matrix or *H*-matrix, such that a binary column vector *V* is a code word if and only if it fulfils the relation below [15]:

# $H \cdot V = 0$

Each *H*-matrix column corresponds to a particular bit position in the code words. The number of *H*-matrix rows is equal to the check-bit number. During the decoding process of a vector V, a syndrome S is calculated with the expression below:

$$S = H \cdot V \tag{1}$$

If *S* is an all-zero syndrome, the binary vector *V* is assumed to be an error-free code word. A single-bit error generates a syndrome identical to the *H*-matrix column that corresponds to the corrupted bit position while a double-bit error produces a syndrome equal to the bitwise modulo-2 sum of the *H*-matrix columns that correspond to the erroneous bit positions. SEC capability can be ensured if the columns of the *H*-matrix are different from each other and from the all-zero column. The maximum number of bits *n* in a SEC code word is  $2^r$ -1 if *r* is the check-bit number.

Usually, the number of data-bits accessed in a memory read or write operation is a power of 2. As a result, shortened SEC codes are used in which the *H*-matrix columns do not exploit all possible combinations of *r*-bit vectors. In such a case, *r*-bit vectors which are different from the *H*-matrix columns and the all-zero vector can be used as syndromes for double-bit error detection (DED) [17].

In a linear block SEC code, all bit positions corrupted by two different double-bit errors that generate the same syndrome are necessarily different. Consequently, the maximum number of double-bit errors that can generate the same syndrome is given by the floor function  $\lfloor n/2 \rfloor$ . The product between this number and the number of syndromes available for DED,  $(2^r-1-n)$ , can be used as an upper limit for the number of double-bit errors that can be detected with a shortened SEC code.

Each syndrome  $S_i$  available for DED can be used to define a linear function  $F_{Si}$  on the syndrome space  $\{0,1\}^r$  as below:

$$F_{S_i}: \{0,1\}^r \rightarrow \{0,1\}^r, \quad F_{S_i}(X) = S_i \oplus X$$

where the symbol ' $\oplus$ ' stands for the bitwise modulo-2 sum.

Maximum DED capability implies that  $F_{Si}$  maps a maximum number of syndromes X used for SEC among each other. Since  $F_{Si}$  is a bijection, it will also map a maximum number of syndromes available for DED among each other. The latter syndromes can be grouped in *zero-sum triples* ( $S_i$ ,  $S_j$ ,  $S_l$ ) that contain the  $S_i$  syndrome as follows:

$$F_{Si}(S_i) = S_i \iff S_i \oplus S_i \oplus S_l = 0$$

A maximum DED capability requires a maximum number of such triplets. Moreover, this property should be imposed to any linear function  $F_{Sj}$  defined as above with the help of each syndrome  $S_j$  available for DED. Consequently, a maximum number of *zero-sum triples* should be formed by all the syndromes available for DED.

We use a greedy algorithm to find  $2^r$ -1-n r-bit vectors which are different from the all-zero syndrome and define as many zero-sum triplets as possible. The remaining non-zero r-bit vectors are used to fill the H-matrix columns for an n-bit SEC code. Subsequently, the H-matrix density can be reduced by performing linear operations on the H-matrix lines such that the resulting H-matrix defines the same SEC code [10].

#### ANNEX II

Consider a shortened linear block SEC-DED code with k data-bits and r check-bits (n=k+r) per code word. Consider also that all code words have the same parity [9][12][13]. During the decoding process, r-bit vectors, called syndromes, are calculated as shown in (1). One can identify the following sets of syndromes:

- 1. a set composed of the all-zero syndrome used to identify error-free code words,
- 2. a SEC-set which contains all syndromes used for single-bit error correction,
- a DED-set which contains all syndromes used for double-bit error detection i.e. which is disjoint from the first two sets,
- 4. a TED-set which contains all syndromes that can be used for triple-bit error detection i.e. which is disjoint from the first two sets.

Due to the fixed parity of the SEC-DED code, the DED and TED sets are also disjoint.

With respect to each syndrome  $S_j$  in the DED-set, the SEC-set can be partitioned into two sub-sets.

 a sub-set of syndromes S<sub>i</sub> that correspond to single-bit errors which affect bit positions that can also be affected by a double-bit error identified by the syndrome S<sub>i</sub>, • a sub-set of syndromes *S*<sub>l</sub> that correspond to single-bit errors which affect bit positions that cannot be affected by any of the double-bit errors identified by the syndrome *S*<sub>j</sub>.

By definition, the bitwise modulo-2 sum between the syndromes  $S_l$  and  $S_j$  is distinct from any syndrome that corresponds to a single-bit error. It is also different from the all-zero syndrome and any syndrome that corresponds to a double-bit error due to the code word fixed parity. Consequently, this sum corresponds to the syndrome of a detectable triple-bit error. Given  $x_j$ , the number of double-bit errors that can be detected by the syndrome  $S_j$ , there are  $n-2x_j$  single-bit errors which affect bit positions that cannot be affected by a double-bit error identified by the syndrome  $S_j$ . Hence, there are  $x_j(n-2x_j)$  triplebit errors that can be detected by a syndrome equal to a combination of  $S_l$  and  $S_j$  syndromes.

The total number of detectable triple-bit errors can be expressed as below:

$$\#TED = \sum_{DED-set} \frac{x_j (n-2x_j)}{3} = \frac{n^2 (n-1)}{6} - \frac{2}{3} \sum_{DED-set} x_j^2 \qquad (2)$$

where the sum over  $x_j$ 's is replaced by n(n-1)/2 and the divisor 3 is introduced to account for the fact that a triple-bit error can be generated by three different combinations of a double-bit error and a single-bit error.

Relation (2) shows that the number of detectable triple-bit errors can be maximized if the square sum is minimized. Since the sum over  $x_j$  is constant, the square sum can be minimized by (a) increasing the number of syndromes in the DED-set and (b) balancing the  $x_j$  values.

In the case of linear block SEC-DED codes with fixed parity, the first possibility is not an option as the DED-set cardinality is  $2^{r\cdot 1}$ -1. Hence, the upper limit of (2) can be calculated by replacing the  $x_j$ 's with integer values which are as balanced as possible. The obtained upper limits, which are the complementary of the lower limits illustrated in Table VI, are similar to the limits reported in [17].

Expression (2) can also be applied to linear block SEC-DED codes without fixed parity. In such a case, the DED and TED sets are not necessarily disjoint and the cardinality of the cardinality of the DED-set can be increased in order to improve the upper limit of (2). Explorations of such an enlarged search space have been performed in [17] and are beyond the scope of the present study.

Data-bits According to (2) According to [17] 220 (77%) 8 -999 (65%) 1,000 (65%) 16 32 5,324 (58%) 5,324 (58%) 32,600 (54%) 32,600 (55%) 64 128 220,728 (53%) 220,728 (53%)

TABLE VI. LIMITS FOR THE MINIMAL NUMBERS OF UNDETECTABLE (MISSCORRECTED) TRIPLE-BIT ERRORS IN SHORTENED SEC-DED CODES WITH FIXED PARITY