I specialize in the research, optimization, and deployment of deep neural networks on resource-constrained embedded platforms — focusing on quantization (PTQ/QAT), sparsity, and efficient inference on edge hardware.
Languages: Python · C/C++ · VHDL
Frameworks: PyTorch · TensorFlow/Keras · ONNX · TFLite · OpenCV
Optimization: Quantization (PTQ/QAT) · Pruning · Sparsity · SIMD/NEON/AVX · TensorRT
Hardware: Qualcomm Snapdragon NPU · NVIDIA Jetson · Arm Cortex-A · RISC-V · Intel Cyclone V FPGA
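As a generic illustration of the post-training quantization (PTQ) listed above, the sketch below shows per-tensor symmetric INT8 quantization, one common PTQ baseline. It is not the pipeline used in any of the projects below; the function names are hypothetical, and real toolchains (TFLite, TensorRT, ONNX) add calibration, per-channel scales, and zero points on top of this idea.

```python
def quantize_symmetric_int8(values):
    """Map floats to INT8 with a single symmetric scale (hypothetical helper)."""
    if not values:
        return [], 1.0
    max_abs = max(abs(v) for v in values)
    # The largest magnitude maps to 127; the range [-127, 127] keeps it symmetric.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Python's round() uses round-half-to-even; clamp guards against overflow.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from INT8 codes."""
    return [qi * scale for qi in q]


weights = [0.5, -1.27, 0.0, 0.9]
q, scale = quantize_symmetric_int8(weights)
print(q, scale, dequantize(q, scale))
```

Because no value is clipped here, the reconstruction error per element is at most half of `scale`.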
| Project | Description |
|---|---|
| VAR-Compressor | Compression framework for Visual Autoregressive (VAR) generative models, validated on NVIDIA Jetson Orin. Developed during a visiting period at ETH Zurich. |
| Mosaic-SR | Patent-pending super-resolution algorithm with custom NEON/AVX kernels that extends industrial scanner range by 30%. Published at IEEE ICIP 2025. |
| BarBeR & BaFaLo | CNN architectures and a benchmarking repository for barcode localization, optimized for Arm CPU edge platforms. Published at ICPR 2024 and in Engineering Applications of Artificial Intelligence. |
| FOOT DaQ | VHDL data acquisition system for nuclear physics experiments on Intel Cyclone V FPGA. |
| Im2Col_SIMD | SIMD-optimized im2col for 2D tensors (AVX2 and NEON), developed as a core component of Mosaic-SR for accelerated patch extraction on Arm CPUs. Includes a Python ctypes wrapper. |
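For reference, im2col unrolls every kernel-sized patch of the input into a column so that convolution becomes a single matrix product. The scalar sketch below assumes a single-channel 2D input, stride support, no padding, and a patch-per-column layout; it is only an illustration of the technique, not the Im2Col_SIMD code, which vectorizes the inner copy with AVX2/NEON and may use a different layout. The function names are hypothetical.

```python
def im2col_2d(image, kh, kw, stride=1):
    """Return a (kh*kw) x (out_h*out_w) matrix; each column is one flattened patch."""
    h, w = len(image), len(image[0])
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = [[0] * (out_h * out_w) for _ in range(kh * kw)]
    for oy in range(out_h):
        for ox in range(out_w):
            col = oy * out_w + ox  # column index of this output position
            for ky in range(kh):
                for kx in range(kw):
                    cols[ky * kw + kx][col] = image[oy * stride + ky][ox * stride + kx]
    return cols


def conv2d_via_im2col(image, kernel, stride=1):
    """Cross-correlation (the deep-learning 'convolution') as a row vector x im2col matrix."""
    kh, kw = len(kernel), len(kernel[0])
    cols = im2col_2d(image, kh, kw, stride)
    flat_k = [v for row in kernel for v in row]
    return [sum(flat_k[i] * cols[i][j] for i in range(len(flat_k)))
            for j in range(len(cols[0]))]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(conv2d_via_im2col(image, [[1, 0], [0, 1]]))
```

The patch copy in the inner loops is the memory-bound part, which is why it is the natural target for SIMD.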
