D2.1 – Circuit level fault models for sub-powered CMOS circuits for uncorrelated and correlated errors, January 2014
This deliverable presents an overview of the activities carried out within the Work Package 2 (WP2) framework during the first year of the i-RISC project. These activities include circuit-level analysis of sub-powered circuit components (combinational gates, memory elements, and interconnects) in deep submicron technologies, fault modeling for uncorrelated errors, and gate-level fault injection for probabilistic fault models. We have demonstrated the probabilistic nature of faults in CMOS circuits operating in the sub- and near-threshold regimes. Furthermore, we have emphasized that the probability of a correct output depends on the circuit's input combination, i.e., the data-dependent nature of probabilistic faults in sub-powered CMOS circuits. The results of this first phase have been further expanded in three directions: a) Probabilistic Hardware Description Language based simulated fault injection for gate-level netlists; b) an Inverse Gaussian distribution delay model for sub-powered CMOS logic gates, suitable for developing fast reliability evaluation methods; and c) a Markov chain based symbolic reliability analysis methodology.
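To illustrate how an Inverse Gaussian delay model can feed a fast reliability estimate, the Python sketch below computes the probability that a gate's delay exceeds the clock period. It assumes the delay D follows IG(μ, λ) with mean μ and shape λ, and that a timing error occurs exactly when D > T. The function names and the parameter values (μ = 1.0, λ = 5.0 in normalized time units) are hypothetical placeholders, not the model or fitted values from the deliverable.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF expressed through math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def inverse_gaussian_cdf(t, mu, lam):
    """CDF of an Inverse Gaussian distribution IG(mu, lam) evaluated at t.

    mu is the mean and lam the shape parameter. The closed form multiplies
    exp(2*lam/mu) by a small normal tail, so very large lam/mu ratios can
    lose precision or overflow; this sketch does not guard against that.
    """
    if t <= 0.0:
        return 0.0
    a = math.sqrt(lam / t)
    return (std_normal_cdf(a * (t / mu - 1.0))
            + math.exp(2.0 * lam / mu) * std_normal_cdf(-a * (t / mu + 1.0)))


def timing_error_probability(clock_period, mu, lam):
    """Assumed timing-error model: P(gate delay > clock period)."""
    return max(0.0, 1.0 - inverse_gaussian_cdf(clock_period, mu, lam))


# Hypothetical parameters: mean delay 1.0 (normalized units), shape 5.0.
for period in (0.5, 1.0, 2.0, 10.0):
    print(period, timing_error_probability(period, mu=1.0, lam=5.0))
```

Because the CDF has a closed form, the error probability for a given clock period costs a handful of floating-point operations, which is what makes such a model attractive for fast reliability evaluation compared with repeated transistor-level simulation.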
D3.1 – Fault tolerant LDPC encoding and decoding, January 2014
This deliverable presents an overview of the activities carried out by work package WP3 during the first year of the project. These activities mainly include the study of iterative decoding on faulty hardware, in order to understand the limits of fault tolerance techniques based on LDPC forward error correction. To protect the different parts of a chip from transient defects, the storage and computation units have to be redundant and incorporate a powerful error-correction technique. In this deliverable, we focus on the class of LDPC codes decoded with iterative message-passing decoders, and our goal is the design of LDPC codes with fault-tolerant encoder and decoder architectures. Our contribution during this first period of the project has been to develop a theoretical analysis and to propose practical decoding algorithms that are tolerant to transient errors caused by faulty hardware. Error-correcting codes with fault-tolerant encoder and decoder architectures constitute a building block of our approach to fault-tolerant chip design.
D4.1 – Taylor-Kuznetsov memory architectures using structured LDPC codes, January 2014
This deliverable presents an overview of the activities carried out by Work Package 4 (WP4) during the first year of the project. These activities include the performance evaluation of the Gallager B decoding algorithm and of the Taylor-Kuznetsov memory architecture based on structured LDPC codes, under independent and data-dependent logic gate failures. We derive an analytical expression for the bit error rate of one-step majority logic decoders built from unreliable logic gates, which enables faster analysis of regular LDPC codes than Monte Carlo simulation. We also investigate novel approaches for designing low-complexity decoders and codes with good performance in the error-floor region.
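A Monte Carlo baseline for one-step majority logic decoding on unreliable gates can be sketched as follows. This is a minimal illustration under stated assumptions: the length-7 code built from the lines of the Fano plane is a hypothetical toy stand-in for a structured LDPC code, only the XOR (syndrome) gates are faulty, each failing independently with probability `eps`, and all function names are illustrative. It shows the kind of simulation that an analytical bit error rate expression is meant to replace, not that expression itself.

```python
import random

N = 7
# Hypothetical toy code: the 7 lines of the Fano plane, generated by the
# cyclic difference set {0, 1, 3} mod 7. Each bit lies in exactly 3 checks,
# and any two of those checks share only that bit (orthogonal checks).
CHECKS = [[(i + d) % N for d in (0, 1, 3)] for i in range(N)]
CHECKS_OF_BIT = [[c for c, members in enumerate(CHECKS) if j in members]
                 for j in range(N)]


def majority_logic_decode(word, eps=0.0, rng=None):
    """One decoding step: compute all syndromes, then flip each bit whose
    orthogonal checks are unsatisfied by a strict majority.

    eps models unreliable XOR gates only: each syndrome bit is flipped
    independently with probability eps (uncorrelated failures).
    """
    rng = rng or random.Random()
    syndromes = []
    for members in CHECKS:
        s = 0
        for j in members:
            s ^= word[j]
        if rng.random() < eps:  # faulty XOR gate output
            s ^= 1
        syndromes.append(s)
    decoded = list(word)
    for j in range(N):
        unsatisfied = sum(syndromes[c] for c in CHECKS_OF_BIT[j])
        if 2 * unsatisfied > len(CHECKS_OF_BIT[j]):  # majority vote
            decoded[j] ^= 1
    return decoded


def estimate_bit_error_rate(eps, trials=10_000, seed=0):
    """Monte Carlo estimate of the post-decoding bit error rate for the
    all-zero codeword with one random channel error per word."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        word = [0] * N
        word[rng.randrange(N)] = 1
        errors += sum(majority_logic_decode(word, eps, rng))
    return errors / (trials * N)


print(estimate_bit_error_rate(0.0), estimate_bit_error_rate(0.05))
```

With reliable gates (`eps = 0`) every single error is corrected, because the three checks through the erroneous bit are all unsatisfied while any other bit sees at most one unsatisfied check; gate failures break that margin, which is what the estimate measures.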
D5.1 – Data structures and design flow for fault tolerant circuit synthesis, January 2014
This report presents an overview of the work carried out in relation to Work Package 5 (WP5) during the first year of the project. The document reports on the investigation and selection of data structures for efficient fault-tolerant circuit synthesis; on the development of design and validation flows for fault-tolerant circuit synthesis, analysis, and optimization; and on activities aimed at developing error-coding-driven graph augmentation, a first step towards a Boole-Shannon type bound for circuit design.
D2.2 – Higher abstraction fault models and their simulation methodology, November 2014
This deliverable presents an overview of the main results and activities carried out within the Work Package 2 (WP2) framework from Month 13 to Month 21 (M13-M21). The main contributions of this deliverable relate to: (i) comprehensive delay estimation models for timing analysis in CMOS circuits (Task 2.1), (ii) correlated error modeling and degradation quantification for Probability Density Function (PDF)-based circuit reliability assessment (Task 2.3), (iii) probabilistic gate-level/Register Transfer Level (RTL) fault models for interconnects (Tasks 2.1, 2.4), (iv) data-dependent Simulated Fault Injection (SFI) for RTL circuit descriptions (Task 2.4), (v) cost-effective FPGA fault emulation for probabilistic circuits (Task 2.4), and (vi) the envisaged energy modeling methodology for sub-powered CMOS circuits and some preliminary energy evaluation results.
D3.2 – Fault tolerant LDPC encoding and decoding, November 2014
This deliverable presents an overview of the main activities carried out within the Work Package 3 (WP3) framework from Month 13 to Month 21 (M13-M21) of the project. The main contributions of this deliverable relate to the design and analysis of fault-tolerant encoder and decoder architectures.
Concerning fault-tolerant LDPC decoders, we build on the results of the first year to design Finite Alphabet Iterative Decoders that are robust to hardware errors. We further investigate the robustness of the stochastic decoder on faulty hardware and propose a new Probabilistic Gradient Descent Bit-Flipping decoder, which is shown to provide better error correction performance than standard bit-flipping algorithms while being robust to hardware errors.
Concerning fault-tolerant LDPC encoders, we evaluate the robustness of several encoding solutions and particular code constructions proposed in the literature to reduce encoding complexity. As most of these solutions prove not to be robust to hardware errors, we further propose two new encoding solutions with increased robustness to hardware errors.
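The probabilistic flipping rule that distinguishes PGDBF from deterministic gradient-descent bit-flipping can be sketched in Python as below. The sketch assumes a binary symmetric channel, uses the inversion function (x_j XOR y_j) plus the number of unsatisfied checks containing bit j, and flips each maximum-energy bit with probability `p0`. The toy length-7 Fano-plane code, the function names, and the defaults `p0 = 0.7` and `max_iter = 100` are hypothetical choices for illustration, not the decoder configuration evaluated in the deliverable.

```python
import random

N = 7
# Hypothetical toy code: Fano-plane checks from the difference set {0, 1, 3} mod 7.
CHECKS = [[(i + d) % N for d in (0, 1, 3)] for i in range(N)]
CHECKS_OF_BIT = [[c for c, m in enumerate(CHECKS) if j in m] for j in range(N)]


def syndromes(x):
    """Parity of each check; 1 means the check is unsatisfied."""
    return [sum(x[j] for j in members) % 2 for members in CHECKS]


def pgdbf_decode(y, p0=0.7, max_iter=100, seed=None):
    """Probabilistic gradient-descent bit-flipping on a binary symmetric channel.

    Inversion function of bit j: (x_j XOR y_j) + number of unsatisfied
    checks containing j. Every bit attaining the maximum is flipped
    independently with probability p0. Returns (decoded_word, success).
    """
    rng = random.Random(seed)
    x = list(y)
    for _ in range(max_iter):
        s = syndromes(x)
        if not any(s):
            return x, True
        energy = [(x[j] ^ y[j]) + sum(s[c] for c in CHECKS_OF_BIT[j])
                  for j in range(N)]
        e_max = max(energy)
        for j in range(N):
            if energy[j] == e_max and rng.random() < p0:
                x[j] ^= 1
    return x, not any(syndromes(x))


received = [0, 0, 1, 0, 1, 1, 0]  # codeword 0010111 with its last bit flipped
print(pgdbf_decode(received, seed=1))
```

Flipping only a random subset of the maximum-energy bits is intended to help the decoder escape configurations in which a deterministic bit-flipping rule keeps revisiting the same states.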
D4.2 – Multi-bit flipping and fast-iterative decoders, January 2015
This deliverable presents an overview of the activities carried out within Work Package 4 (WP4) from Month 13 to Month 24 (M13-M24) of the project. These activities mainly include the analysis and design of low-complexity LDPC decoders and corresponding memory architectures that are robust to hardware unreliability.
Special attention is devoted to decoder analysis under data-dependent logic gate failures. We propose memory architectures that build on the one-step majority logic decoder and develop analytical tools to evaluate their robustness. We also investigate bit-flipping decoders built partly from unreliable components and show that they can tolerate a fixed fraction of errors. An improved version of the probabilistic gradient descent bit-flipping algorithm is also proposed, whose performance on unreliable hardware is shown to be superior to that of other known hard-decision algorithms. In addition, we prove the existence of memory architectures that achieve an arbitrarily small probability of failure under the data-dependent failure model, which is the first such result for a failure model other than the von Neumann model.
Moreover, we started investigating reliable data transport structures, and this deliverable also presents preliminary steps towards the construction of energy-effective, reliable on-chip interconnects. To this end, we targeted the near/sub-threshold operating region and proposed a dual-rail interconnection strategy, which outperforms its single-rail counterpart in terms of energy consumption at the expense of some area overhead.
D5.2 – Report on fault tolerant synthesis through error correcting codes driven graph augmentation, January 2015
This deliverable presents an overview of the work carried out in relation to Work Package 5 (WP5) during the second year of the project. The document reports on all the activities aimed at developing error-coding-driven graph augmentation (Task 5.3), a first step towards fault-tolerant circuit design. A novel method for designing fault-tolerant circuitry is discussed, with a prime focus on improving the fault tolerance of combinational logic by means of error correction codes. A mathematical analysis of the Boole-Shannon limit (Task 5.5) for linear circuits is also presented. Different methodologies for multi-objective optimization (Task 5.4) are detailed, covering combinational circuit optimization techniques under different constraints such as reliability, power, and delay.
D6.1 – Report on Reliability Aware Synthesis and LDPC Decoders Built with Unreliable Components, January 2015
This deliverable presents an overview of the work carried out in relation to Work Package 6 (WP6) during the second year of the i-RISC project. The first part of the deliverable reports on the implementation of a reliability-aware synthesis tool and its evaluation on a set of benchmark circuits (Task 6.3). In particular, we propose an integrated design flow that combines all the custom i-RISC tools developed to date with tools widely used in the circuit design industry. The second part of the deliverable reports on the fault tolerance assessment of state-of-the-art and i-RISC-proposed LDPC decoders (Task 6.1). Seven LDPC decoders are implemented in VHDL/Verilog and exposed to external aggression via voltage scaling or simulated fault injection. The error correction performance of the implemented decoders is evaluated for different aggression profiles, substantiating their fault-tolerance capabilities and the resulting benefits in terms of energy consumption and throughput.
D2.3 – Energy models of sub-powered CMOS circuits, July 2015
This report presents an overview of the activities carried out within the Work Package 2 (WP2) framework during the third i-RISC reporting period. The main contributions of this deliverable are a hierarchical methodology for energy modeling of faulty CMOS circuits (Tasks 2.1, 2.5) and a simulation-based energy evaluation framework for on-chip interconnects (Tasks 2.1, 2.5). Furthermore, the deliverable covers developments and enhancements of the multi-level simulated fault injection for probabilistic timing errors and of probabilistic FPGA fault emulation (Task 2.4).
D3.3 – Fault tolerant LDPC encoding and decoding, July 2015
This deliverable presents an overview of the activities carried out by work package WP3 during the third year of the project. These activities mainly include the design and analysis of fault-tolerant encoder and decoder architectures. To protect the different parts of a chip from transient defects, the storage and computation units have to be redundant and incorporate a powerful error-correction technique. In this work package, we focus on the class of LDPC codes decoded with iterative message-passing decoders, and our goal is the design of LDPC codes with fault-tolerant encoder and decoder architectures. Our contributions during this third year of the project have been to design and analyze LDPC decoders on faulty hardware under more realistic error models, and to design fault-tolerant LDPC encoders. Error-correcting codes with fault-tolerant encoder and decoder architectures constitute a building block of our approach to fault-tolerant chip design.
D4.3 – Assessment of Memory Architecture Tolerance to Correlated Errors and On-Chip Reliable Data Transport, October 2015
This deliverable presents an overview of the activities carried out within the Work Package 4 (WP4) framework during the Month 25 to Month 33 (M25-M33) project period. We mainly investigate low-complexity hard-decision decoders and corresponding memory architectures under uncorrelated and data-dependent gate failures, as well as codec-augmented interconnects that can enable energy-effective, reliable data transport. Regarding the faulty Gallager B decoder, we demonstrate that randomness added through hardware unreliability can improve the overall decoder performance, and we develop a theoretical framework that takes the dynamics of the iterative decoding process into account. Regarding the Taylor-Kuznetsov memory architecture, we: (i) provide a novel analytical reliability analysis method able to predict the reliability of the memory architecture for a fixed refresh time and fixed codec parameters, and (ii) improve the results of Deliverable D4.2 on the conditions under which a number of worst-case component failures can be guaranteed to be tolerated by the bit-flipping decoder and the corresponding memory architecture under data-dependent gate failures. We also propose a general architecture and framework for fault-resilient 3D polyhedral memories, which uses an LDPC-based scrubbing approach to perform user-transparent memory maintenance. Our experiments indicate that, for the same redundancy requirements, our approach outperforms the state-of-the-art Hamming-code-based counterpart in terms of error correction capability and has very limited implications on memory performance and availability. Finally, we introduce codec-based techniques to improve data transport within computation platforms, i.e., to reduce delay and energy consumption and to increase fault resilience.