Fault Detection and Correction for Advancing Reliability in Reconfigurable Hardware for Critical Applications
DOI: https://doi.org/10.31838/RCC/02.03.04
Keywords: Critical Applications; Fault Detection; Fault Tolerance; Reconfigurable Hardware; Reliability Enhancement; System Integrity
Abstract
For critical applications, where system failures can have severe consequences, the reliability of hardware components is of paramount importance. Field Programmable Gate Arrays (FPGAs) have become a common reconfigurable hardware platform in these high-stakes environments because of their flexibility and performance advantages. In radiation-rich environments in particular, the susceptibility of these devices to faults calls for robust fault detection and correction techniques. In this article, we explore some of the most fundamental approaches used to improve the reliability of reconfigurable hardware in high-reliability applications. In examining how faults in FPGAs are mitigated, we trace errors from their different sources, examine how different device architectures respond to them, and review the techniques that help counteract such failures. From single-event upsets (SEUs) to more complex error patterns, we explore how engineers are pushing the limits of reliability under harsh operating conditions. The discussion covers both user-inserted and embedded mitigation techniques, with a focus on presenting a comprehensive view of the current state of the art in fault-tolerant reconfigurable systems. We examine error detection and correction in detail, and how these technologies help build reliable computing for aerospace, defense, and other critical sectors where failure can mean loss of life.