Energy-Efficient VLSI Co-Design for Edge AI: Near-Memory Compute and Sub-8-Bit Quantization in Low-Power Embedded Systems

Authors

  • H.K. Mzeha, Department of Electrical and Electronic Engineering, University of Ibadan, Ibadan, Nigeria
  • Carlos Méndez Rivera, School of Computer Science, Universidad Nacional de Colombia, Colombia

DOI:

https://doi.org/10.17051/JEEAT/01.03.03

Keywords:

Edge AI, VLSI, near-memory compute, compute-in/near-memory, quantization, sub-8-bit, QAT, mixed precision, RISC-V, SRAM, dataflow, low-power embedded.

Abstract

Real-time, always-on edge AI under tight power and area constraints hinges on minimizing both data-movement and arithmetic energy. We present a VLSI co-design that pairs near-memory compute (NMC) with sub-8-bit mixed-precision quantization. The architecture combines an RV32 RISC-V control core, a weight-stationary NMC MAC array placed adjacent to multi-banked SRAM, and a compression-aware on-chip interconnect that reduces bandwidth and toggling. A quantization-aware training (QAT) pipeline assigns 4-6-bit weights and 4-8-bit activations per layer using LSQ-style learnable scales and a mixed-precision search constrained by layer sensitivity and energy/bandwidth budgets. Based on 22-nm estimates and cycle-accurate RTL/functional models, the prototype achieves 1.2 TOPS/W on convolutional workloads from MobileNetV2/CIFAR-10 and Visual Wake Words, with 1.7-3.1× energy savings relative to an 8-bit baseline at less than a 1.2-percentage-point accuracy drop. Roughly 55-70% of the savings is attributable to NMC (reduced SRAM traffic) and 30-45% to mixed-precision quantization, demonstrating their complementary impact. These findings show that coupled algorithm-architecture design, namely NMC with sub-8-bit QAT, offers a realistic route to battery-viable inference on low-power embedded SoCs. The paper closes with deployable guidelines for bit-width assignment, SRAM banking, and dataflow scheduling that address industry and journal requirements for energy consumption, reliability, and reproducibility.
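To make the quantization scheme concrete, the following is a minimal sketch of the forward pass of an LSQ-style fake quantizer as described in the abstract (learnable per-layer scale, clamp-and-round to a signed integer grid). The function name, the example scale, and the sample weights are illustrative assumptions, not the authors' implementation; the learnable-scale gradient (straight-through estimator) is omitted for brevity.

```python
import numpy as np

def lsq_fake_quantize(x, scale, num_bits, signed=True):
    """LSQ-style fake quantization (forward pass only, illustrative).

    Divides by the learnable scale, rounds and clamps to the integer
    grid for the given bit width, then rescales back to real values.
    """
    if signed:
        qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** num_bits - 1
    q = np.clip(np.round(x / scale), qmin, qmax)  # integer code on [qmin, qmax]
    return q * scale                              # dequantized value

# Example: a 4-bit signed weight grid ([-8, 7]) with a hypothetical scale.
w = np.array([-0.31, 0.07, 0.52, -1.10])
w_q = lsq_fake_quantize(w, scale=0.1, num_bits=4)
# -1.10 falls outside the grid and is clamped to -8 * 0.1 = -0.8
```

In a per-layer mixed-precision setting, `num_bits` would be chosen per layer (4-6 for weights, 4-8 for activations) by the sensitivity- and budget-constrained search the abstract describes.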


Published

2025-09-11


How to Cite

Energy-Efficient VLSI Co-Design for Edge AI: Near-Memory Compute and Sub-8-Bit Quantization in Low-Power Embedded Systems. (2025). National Journal of Electrical Electronics and Automation Technologies, 1(3), 19-26. https://doi.org/10.17051/JEEAT/01.03.03