A Review of FPGA-Driven LLM Acceleration

Published in 2025 IEEE 18th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 2025

Recommended citation: Y. Fu, J. Li, C. Cheng, S. L. Ma, C. -W. Sham and N. Zou, "A Review of FPGA-Driven LLM Acceleration," 2025 IEEE 18th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), Singapore, Singapore, 2025, pp. 49-52, doi: 10.1109/MCSoC67473.2025.00018. https://ieeexplore.ieee.org/document/11310915

This paper provides a brief review of FPGA-based acceleration strategies for large language models (LLMs). As LLMs continue to grow in scale and complexity, deploying them efficiently becomes a significant challenge, particularly when computational resources and memory bandwidth are constrained. This study highlights the distinctive advantages of Field Programmable Gate Arrays (FPGAs) for this task: reconfigurable logic, fine-grained parallelism, and superior energy efficiency. Key findings indicate that optimizing bandwidth utilization is crucial for deploying larger models and achieving higher throughput. The review further examines optimization techniques from both model-layer and algorithm-layer perspectives, including sparsity, quantization, and memory access and bandwidth optimization strategies. These techniques improve data access patterns and alleviate FPGA memory limitations through offloading strategies and the effective use of high-bandwidth memory (HBM). We have also curated and organized the collected literature, which is publicly available for review in our GitHub repository: https://github.com/FYL-Lib/FPGA.git.

Recommended citation (BibTeX):

@INPROCEEDINGS{11310915,
  author={Fu, Yulin and Li, Jiale and Cheng, Cheng and Ma, Sean Longyu and Sham, Chiu-Wing and Zou, Nan},
  booktitle={2025 IEEE 18th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)},
  title={A Review of FPGA-Driven LLM Acceleration},
  year={2025},
  volume={},
  number={},
  pages={49-52},
  keywords={Quantization (signal);Reviews;Computational modeling;Bandwidth;Reconfigurable logic;Programmable logic arrays;Throughput;Field programmable gate arrays;Optimization;Software development management;FPGA;LLMs;Optimization Strategy},
  doi={10.1109/MCSoC67473.2025.00018}
}