Dongkuan Wu
Email: dongkuanwu@gmail.com · 12211229@mail.sustech.edu.cn
Tel: (+86) 15147183485
Supervisor: Hao Yu (SUSTech)
Focus: LLM compression (quantization, pruning), FPGA-based acceleration and edge deployment
Experience
Infix — Intern
Sep. 2025 — Present
Toolchain for Model Compression & VCU128 FPGA Deployment
- Built a reusable HW–SW co‑design flow spanning offline per‑group GPTQ quantization, parameter packing, and Verilog kernel integration; ran GPTQ‑quantized DeepSeek‑Distill‑Qwen 2.5‑7B, Qwen 2‑7B, and ChatGLM 2/3‑6B on the VCU128.
- Rearranged and packed quantized tensors into AXI‑Stream blocks to reduce runtime transposition; coordinated with the Edge‑LLM framework to improve on‑chip utilization and throughput.
2024 — 2025
One‑Shot Sparsity Mask Rebuilder (LLM‑Barber)
- Proposed a block‑aware reconstruction method that integrates sparsity across Self‑Attention and MLP blocks for globally optimized pruning; ran extensive experiments across models and settings.
2024 — 2025
Rotation‑driven Mixed‑Precision Quantization
- Partitioned each layer’s weights and activations and applied orthogonal rotations to flatten outliers; enabled fine‑grained mixed‑precision quantization while preserving end‑to‑end accuracy.
Mar. 2025 — Jun. 2025
Publications
LLM‑Barber: Block‑Aware Rebuilder for Sparsity Mask in One‑Shot for Large Language Models
Yupeng Su, Ziyi Guan, Xiaoqun Liu, Tianlai Jin, Dongkuan Wu · arXiv:2408.10631
2024 — 2025
APTQ+: Attention‑FFN‑aware Post‑Training Quantization for a Layer‑wise LLM Accelerator on FPGA
Ziyi Guan†, Zhengfei Chen†, Dongkuan Wu, Ao Shen, Yifan Guo, Yupeng Su
2025
Education
Southern University of Science and Technology (SUSTech)
Sept. 2022 — Jun. 2026
University of California, San Diego (UCSD)
Mar. 2025 — Jun. 2025
Skills
Programming: Python, C/C++, Verilog, MATLAB
Lab/Domain: LLM compression (quantization, pruning), model deployment on edge devices; Cadence simulation & verification
Hardware/EDA: Xilinx VCU128; AXI‑Stream data packing; FPGA inference toolchain