Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
Runtime patches for vLLM — Qwen3.6 (27B int4 / 35B-A3B FP8) on consumer Ampere. 50+ patches: TurboQuant KV, MTP / DFlash / ngram spec-decode, FULL cudagraph, 256K-320K context. v7.64: P67 non-pow-2 GQA, Cliff 1 fix, 6 docs (FAQ/HARDWARE/CONFIGS/CLIFFS), Genesis Compat Layer.
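For orientation, here is a minimal sketch of how a quantized, long-context Qwen checkpoint might be loaded with stock vLLM's offline Python API. The model id, quantization setting, and context length are illustrative assumptions only; the patched fork described above may expose different or additional knobs for its KV quantization and speculative decoding features.

```python
# Minimal stock-vLLM offline inference sketch (illustrative; not this fork's specific API).
from vllm import LLM, SamplingParams

# Assumed settings: an AWQ int4 Qwen checkpoint with a long context window.
llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct-AWQ",  # placeholder checkpoint, not Qwen3.6
    quantization="awq",                     # int4 AWQ weights
    max_model_len=262144,                   # 256K context, if the model and GPU memory allow it
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize speculative decoding in two sentences."], params)
print(outputs[0].outputs[0].text)
```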
vLLM Qwen 3.6-27B (AWQ-INT4) + DFlash speculative decoding on AMD Strix Halo (gfx1151 iGPU, 128 GB UMA, ROCm 7.13). 24.8 t/s single-stream, vision, tool calling, 256K context, OpenAI-compatible, Docker. Matches DGX Spark FP8+DFlash+MTP at a third of the cost. No CUDA.
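Since the server above is described as OpenAI-compatible, a client request could look roughly like the following; the base URL, port, API key, and model name are assumptions for illustration and should be replaced with whatever the running server reports.

```python
# Illustrative client call against an OpenAI-compatible vLLM server (assumed at localhost:8000).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed host/port for the local server
    api_key="not-needed-locally",         # vLLM typically ignores the key unless one is configured
)

resp = client.chat.completions.create(
    model="qwen3.6-27b-awq",              # placeholder model name; query /v1/models for the real one
    messages=[{"role": "user", "content": "What hardware am I running on?"}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```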
Local AI workstation: discover, run, chat with, benchmark, and generate images from open-weight models. DFlash/DDTree speculative decoding, five cache-compression strategies (including RotorQuant, TriAttention, TurboQuant, and ChaosEngine), and MLX + llama.cpp + vLLM backends.
llama.cpp fork optimized for NVIDIA DGX Spark / GB10 (Blackwell, SM 12.1) — TurboQuant weights + KV, NVFP4, DFlash MTP