Design and Evaluation of a Fault-Tolerant Reconfigurable Architecture for Mission-Critical Embedded Systems
DOI: https://doi.org/10.31838/RCC/03.02.03
Keywords: Fault-Tolerant Architecture, Dynamic Partial Reconfiguration (DPR), Field-Programmable Gate Array (FPGA), Mission-Critical Embedded Systems, Runtime Recovery, Built-In Self-Test (BIST), Soft Error Mitigation, Xilinx Zynq SoC, Hardware Redundancy, Reconfigurable Computing
Abstract
Mission-critical embedded systems deployed in aerospace, medical, and security applications must be highly reliable: they have to tolerate faults and operate continuously, even in harsh or unpredictable environments. This paper presents the design and evaluation of a new fault-tolerant reconfigurable architecture that exploits the Dynamic Partial Reconfiguration (DPR) capability of FPGA platforms, scheduling task execution so that hardware faults can be detected, isolated, and recovered from in real time. The architecture combines a hybrid fault detection policy, which pairs Built-In Self-Test (BIST) with Cyclic Redundancy Check (CRC), with a reconfiguration controller that dynamically reloads partial bitstreams into the damaged regions of the FPGA. Runtime checkpointing and a low-latency recovery strategy are also integrated to minimize downtime. The architecture is implemented on a Xilinx Zynq-7000 SoC platform and evaluated using fault injection to emulate soft errors. The experimental results show a fault recovery rate above 93%, a reconfiguration latency below 4.3 ms, and minimal area and power overhead, confirming the suitability of the architecture for safety-critical embedded systems. These results highlight the promise of DPR-based fault tolerance, in terms of scalability and resource usage, as an alternative to traditional redundancy-based approaches.
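The detect, isolate, and recover cycle described in the abstract can be illustrated with a small software model. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the configuration memory of each reconfigurable partition is modelled as a byte buffer, the CRC check uses CRC-32 from `zlib` over that buffer, BIST is represented by a hypothetical callback `bist`, and "reconfiguration" is simulated by copying a stored golden image back into the faulty partition only. The names `Partition`, `ReconfigController`, `checkpoint`, and `scrub` are illustrative; on real Zynq-7000 hardware the reload would go through a configuration port such as PCAP or ICAP rather than a memory copy.

```python
import zlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Partition:
    """A reconfigurable partition (RP) modelled as a buffer of configuration bytes."""
    name: str
    golden_bitstream: bytes                      # stored partial bitstream (assumed)
    config: bytearray = field(init=False)        # "live" configuration memory
    golden_crc: int = field(init=False)          # reference CRC-32 of the golden image

    def __post_init__(self) -> None:
        self.config = bytearray(self.golden_bitstream)
        self.golden_crc = zlib.crc32(self.golden_bitstream)

    def crc_ok(self) -> bool:
        # Recompute CRC-32 over the live configuration bytes and compare.
        return zlib.crc32(self.config) == self.golden_crc

    def reload(self) -> None:
        # Stand-in for DPR: rewrite only this partition from its golden image.
        self.config[:] = self.golden_bitstream


class ReconfigController:
    """Detect (CRC + BIST), isolate (per partition), recover (reload + rollback)."""

    def __init__(self, partitions: List[Partition],
                 bist: Callable[[Partition], bool]) -> None:
        self.partitions = {p.name: p for p in partitions}
        self.bist = bist                         # hypothetical BIST hook: True means pass
        self.checkpoints: Dict[str, dict] = {}   # last known-good task state per RP

    def checkpoint(self, name: str, state: dict) -> None:
        self.checkpoints[name] = dict(state)     # copy so later task updates don't leak in

    def healthy(self, part: Partition) -> bool:
        # Hybrid policy: a partition is healthy only if both checks pass.
        return part.crc_ok() and self.bist(part)

    def scrub(self, states: Dict[str, dict]) -> Tuple[List[str], List[str]]:
        """One detection pass over all partitions; returns (recovered, still_faulty)."""
        recovered: List[str] = []
        still_faulty: List[str] = []
        for name, part in self.partitions.items():
            if self.healthy(part):
                continue                         # healthy partitions keep running
            part.reload()                        # reconfigure only the faulty region
            states[name] = dict(self.checkpoints.get(name, {}))  # roll back task state
            if self.healthy(part):
                recovered.append(name)
            else:
                still_faulty.append(name)
        return recovered, still_faulty


# Usage: two partitions, checkpoint both, inject a single-bit upset into one.
rp0 = Partition("rp0", bytes(range(64)))
rp1 = Partition("rp1", bytes(64))
ctrl = ReconfigController([rp0, rp1], bist=lambda p: True)

states = {"rp0": {"pc": 10}, "rp1": {"pc": 3}}
for name, s in states.items():
    ctrl.checkpoint(name, s)

states["rp0"]["pc"] = 42        # task makes progress after the checkpoint
rp0.config[5] ^= 0x01           # inject a single-bit upset into rp0
print(ctrl.scrub(states))       # (['rp0'], [])
print(states["rp0"])            # {'pc': 10}
```

Only partitions that fail a check are rewritten and rolled back to their last checkpoint, while healthy partitions are left untouched; this per-region recovery is what lets a DPR-based scheme avoid reconfiguring the whole device. A partition that still fails after reloading (for example, a permanent fault caught by BIST) is reported separately, since reloading alone cannot repair it.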