Scalable Reconfigurable Architectures for Edge-AI: Balancing Performance, Power, and Partial Reconfiguration Overheads
Keywords:
Edge-AI, Dynamic Partial Reconfiguration (DPR), FPGA-based Accelerators, Energy Efficiency, Reconfiguration Overhead, Hardware-Software Co-design, Scalable Hardware Architectures, Inference Optimization.
Abstract
Deploying deep learning at the network edge requires hardware that can adapt to different neural network topologies while meeting strict power and area constraints. This article proposes a scalable reconfigurable hardware architecture for Edge-AI applications that uses Dynamic Partial Reconfiguration (DPR) to create a flexible platform capable of swapping specialised accelerators at runtime. Although DPR provides high functional density through the reuse of silicon area, bitstream loading introduces substantial time and energy overheads that can compromise real-time performance. To address this, we present a modular, tile-based framework with an intelligent configuration manager that uses bitstream compression and predictive prefetching to hide reconfiguration latency. By decoupling the control logic from the reconfigurable processing elements, the design enables seamless task switching without halting global system operation. Experimental evaluation on a Xilinx Zynq UltraScale+ platform using industry-standard benchmarks, namely YOLOv8 and MobileNetV2, shows that our architecture achieves a better balance between throughput and energy efficiency than traditional static hardware implementations. The evaluation further shows that the proposed prefetching mechanism conceals up to 85 percent of reconfiguration latency, making the system considerably more responsive in inherently multi-task edge environments. By optimising the trade-off between the benefits of hardware specialisation and the cost of reconfiguration, the proposed system reduces reconfiguration cost and improves the energy-delay product (EDP) by up to 25 percent.
These results confirm the practicality of the architecture for next-generation, resource-constrained edge devices that need high versatility without exceeding the strict power budgets of battery-powered systems.
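To make the two headline metrics concrete, the following minimal sketch shows how hidden reconfiguration latency and EDP improvement can be computed. It assumes the standard definition EDP = energy × delay and a simple model in which prefetching removes a fixed fraction of reconfiguration time from the critical path. The function names and the sample values are hypothetical illustrations, not measurements or code from this work.

```python
def exposed_reconfig_latency(reconfig_ms: float, hidden_fraction: float) -> float:
    """Reconfiguration time that remains on the critical path.

    hidden_fraction is the share of reconfiguration latency overlapped
    with useful computation (assumed model: a simple linear reduction).
    """
    if not 0.0 <= hidden_fraction <= 1.0:
        raise ValueError("hidden_fraction must be in [0, 1]")
    return reconfig_ms * (1.0 - hidden_fraction)


def energy_delay_product(energy_mj: float, delay_ms: float) -> float:
    """Energy-delay product; lower values are better."""
    return energy_mj * delay_ms


def edp_improvement(baseline_edp: float, new_edp: float) -> float:
    """Relative EDP reduction with respect to a baseline (0.25 means 25%)."""
    if baseline_edp <= 0.0:
        raise ValueError("baseline_edp must be positive")
    return (baseline_edp - new_edp) / baseline_edp


if __name__ == "__main__":
    # Hypothetical values chosen only to illustrate the arithmetic.
    remaining = exposed_reconfig_latency(reconfig_ms=10.0, hidden_fraction=0.85)
    print(f"exposed reconfiguration latency: {remaining:.2f} ms")
    print(f"EDP improvement: {edp_improvement(100.0, 75.0):.0%}")
```

In this model, a 10 ms bitstream load with 85 percent of its latency hidden leaves about 1.5 ms visible to the running task, and an EDP that drops from 100 to 75 (arbitrary units) corresponds to a 25 percent improvement.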