Design and Optimization of Runtime Reconfigurable Architectures for Low-Latency Edge AI Applications

Authors

  • Leila Ismail, Faculty of Management, Canadian University Dubai, Dubai, United Arab Emirates
  • Hee-Seob Kim, Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, Korea

DOI:

https://doi.org/10.31838/RCC/03.02.01

Keywords:

Runtime Reconfigurable Architecture; Edge AI; Dynamic Partial Reconfiguration (DPR); Low-Latency Inference; FPGA Acceleration; Hardware/Software Co-Design; Reinforcement Learning Scheduler; Real-Time AI Processing; Energy-Efficient Edge Computing.

Abstract

The proliferation of artificial intelligence (AI) applications at the network edge has created growing demand for real-time inference under strict latency, power, and compute constraints. Conventional cloud-based inference is ill-suited to such settings because of its latency and privacy drawbacks. This paper proposes a runtime reconfigurable architecture (RRA) for low-latency Edge AI to meet these challenges. The design builds on dynamic partial reconfiguration (DPR) on FPGAs, which allows the hardware to adapt on demand to shifting AI workloads in real time. In contrast to fixed-function designs, the RRA provides optimized hardware modules for individual inference tasks, including convolution, activation, and pooling layers, and loads them dynamically to maximize resource utilization and reduce idle logic. A reinforcement learning task scheduler predicts workload patterns and orchestrates reconfiguration events with low overhead. In addition, a performance-energy optimization layer ensures that architectural changes do not violate the edge device's energy budget or thermal constraints. The complete system is evaluated on a Xilinx Zynq UltraScale+ MPSoC platform using standard CNN benchmarks, including ResNet-18 and MobileNetV2. Experimental results show inference latency reductions of up to 53 percent and power reductions of up to 41 percent compared with statically provisioned baseline designs. Bitstream caching and overlapped task execution keep reconfiguration delays negligible, allowing the framework to scale across a wide range of neural workloads. The work demonstrates the practicality of runtime reconfigurable hardware for adaptive, energy-efficient, high-performance inference at the edge, narrowing the gap between AI algorithmic complexity and hardware constraints in practical edge deployments and enabling intelligent embedded systems for autonomous vehicles, security, health monitoring, and smart factories.
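
The two mechanisms the abstract credits for the negligible reconfiguration delays, bitstream caching and overlapped task execution, can be illustrated with a short simulation. The sketch below is not the authors' code: it is a minimal, self-contained Python model, using assumed timing constants and hypothetical module names, of an LRU bitstream cache combined with a greedy scheduler that prefetches the next layer's partial bitstream while the current layer executes, so that reconfiguration time hides behind compute.

```python
# Minimal sketch of DPR scheduling with bitstream caching and overlapped
# execution. All module names, cache capacity, and timing constants are
# hypothetical placeholders, not measurements from the paper.

from collections import OrderedDict

RECONFIG_MS = 4.0   # assumed DPR time for an uncached partial bitstream
CACHED_MS = 0.5     # assumed load time when the bitstream is cache-resident

class BitstreamCache:
    """LRU cache of partial bitstreams kept in on-board memory."""
    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self._slots: OrderedDict[str, bytes] = OrderedDict()

    def load(self, module: str) -> float:
        """Return the time (ms) to make `module` resident, updating LRU order."""
        if module in self._slots:
            self._slots.move_to_end(module)       # cache hit: cheap reload
            return CACHED_MS
        if len(self._slots) >= self.capacity:     # evict least recently used
            self._slots.popitem(last=False)
        self._slots[module] = b"<partial bitstream>"
        return RECONFIG_MS

def schedule(layers: list[tuple[str, float]], cache: BitstreamCache) -> float:
    """Greedy overlapped schedule: while layer i executes, prefetch the
    module for layer i+1 so its reconfiguration hides behind compute."""
    total_ms = cache.load(layers[0][0])           # first load cannot be hidden
    for i, (module, exec_ms) in enumerate(layers):
        prefetch_ms = cache.load(layers[i + 1][0]) if i + 1 < len(layers) else 0.0
        # reconfiguration overlaps execution; only the excess adds latency
        total_ms += max(exec_ms, prefetch_ms)
    return total_ms

if __name__ == "__main__":
    net = [("conv3x3", 6.0), ("relu", 1.0), ("pool2x2", 1.5),
           ("conv3x3", 6.0), ("relu", 1.0)]       # toy layer trace
    print(f"end-to-end latency: {schedule(net, BitstreamCache()):.1f} ms")
```

In this toy trace only the first reconfiguration is paid in full; every later module load either overlaps the preceding layer's execution or hits the cache, which is the effect the overlapping and caching mechanisms target.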

Published

2025-09-17

Section

Articles

How to Cite

Ismail, L., & Kim, H.-S. (2025). Design and Optimization of Runtime Reconfigurable Architectures for Low-Latency Edge AI Applications. SCCTS Transactions on Reconfigurable Computing, 3(2), 1-10. https://doi.org/10.31838/RCC/03.02.01