meta-nvidia: bump NVIDIA stack to 580.95.05 (Blackwell CC support) #56
Required for RTX PRO 6000 Blackwell Server Edition (10de:2bb5) Confidential Compute support: the 570.172.08 NVIDIA Open Kernel Module predates Pro 6000 SE and refuses to attach with "GPU confidential compute capability is not enabled" inside a TDX guest. 580.95.05 recognizes GB202 and brings the full TDX guest CC stack online (verified end-to-end: nvidia-smi conf-compute reports CC State=ON, GPU CC Capabilities=CC Capable, Protected memory=99461312 KiB; PyTorch CUDA workloads run inside the TD). The nscq archive at 580 no longer ships bin/nscq-cli, so the new recipe drops it from do_install and FILES.
Pull request overview
Adds Yocto recipes for the NVIDIA 580.95.05 driver stack to enable confidential compute support on newer Blackwell-era GPUs (e.g., RTX PRO 6000 Blackwell Server Edition) while keeping existing 570.x recipes in place.
Changes:
- Add `nvidia_580.95.05.bb` with updated 580.95.05 installer checksums/URL.
- Add `nvidia-fabricmanager_580.95.05.bb` to package Fabric Manager 580.95.05.
- Add `libnvidia-nscq_580.95.05.bb`, updated for the 580 archive layout (no `nscq-cli`).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| meta-nvidia/recipes-graphics/nvidia/nvidia_580.95.05.bb | New 580.95.05 driver recipe (runfile extraction + includes). |
| meta-nvidia/recipes-graphics/nvidia/nvidia-fabricmanager_580.95.05.bb | New 580.95.05 Fabric Manager recipe and install manifest. |
| meta-nvidia/recipes-graphics/nvidia/libnvidia-nscq_580.95.05.bb | New 580.95.05 NSCQ library recipe; install/package list adjusted to archive contents. |
- nvidia-fabricmanager: install all files from share/nvidia/nvswitch/ via
glob loop instead of an explicit list. The 580.95.05 archive ships new
GB300 topologies (gb300_nvl72r{1,2}_c2g4_topology) that the old
hard-coded list missed; future archives stay covered automatically.
(Addresses Copilot review feedback on PR #56.)
- dstack.conf: introduce NVIDIA_VERSION = "580.95.05" and pin
PREFERRED_VERSION_{nvidia,nvidia-fabricmanager,libnvidia-nscq} from it
so the kernel module ABI and userspace libs always move together.
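The two changes described in this commit message can be sketched roughly as follows. This is an illustrative fragment, not the committed code: the destination directory `${datadir}/nvidia/nvswitch`, the use of `${S}` as the extracted archive root, and the `do_install:append` form are assumptions made for the example.

```bitbake
# nvidia-fabricmanager_580.95.05.bb (sketch; destination path is an assumption)
do_install:append() {
    install -d ${D}${datadir}/nvidia/nvswitch
    # Install every regular file the archive ships under share/nvidia/nvswitch/,
    # so new topology files are picked up without editing the recipe.
    for f in ${S}/share/nvidia/nvswitch/*; do
        [ -f "$f" ] || continue
        install -m 0644 "$f" ${D}${datadir}/nvidia/nvswitch/
    done
}
FILES:${PN} += "${datadir}/nvidia/nvswitch"
```

```bitbake
# dstack.conf: one variable drives all three preferred versions
NVIDIA_VERSION = "580.95.05"
PREFERRED_VERSION_nvidia = "${NVIDIA_VERSION}"
PREFERRED_VERSION_nvidia-fabricmanager = "${NVIDIA_VERSION}"
PREFERRED_VERSION_libnvidia-nscq = "${NVIDIA_VERSION}"
```

With the pin in place, rolling back to 570.172.08 would only require changing `NVIDIA_VERSION`, provided the matching 570 recipes are still present in the layer.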
Addressed @copilot-pull-request-reviewer feedback in 85e7e17 + added version pinning:
Both changes are already in place at commit
No further changes needed.
Summary
Adds NVIDIA driver 580.95.05 recipes for `nvidia`, `nvidia-fabricmanager`, and `libnvidia-nscq`. Required for RTX PRO 6000 Blackwell Server Edition (10de:2bb5) confidential compute under Intel TDX: the existing 570.172.08 driver predates Pro 6000 SE and bails out with `GPU confidential compute capability is not enabled` when loaded inside a TDX guest.
Yocto picks the highest available `PV` by default, so adding these recipes is enough to make builds use 580.95.05; the 570 recipes are kept for now.
What's in each recipe
- `nvidia_580.95.05.bb`: same structure as the 570 recipe; only `SRC_URI[md5sum]`/`SRC_URI[sha256sum]` updated.
- `nvidia-fabricmanager_580.95.05.bb`: same as 570. The 580 archive ships extra `gb300_*` topology files; the current `do_install` lists files explicitly, so the new ones aren't picked up. Harmless for non-GB300 hosts; could be turned into a glob in a follow-up.
- `libnvidia-nscq_580.95.05.bb`: the 580 archive no longer ships `bin/nscq-cli`, so `do_install` and `FILES:${PN}` were trimmed accordingly.
Verification
Built `dstack-nvidia-dev-rootfs` with the new stack and deployed to a TDX guest with one RTX PRO 6000 Blackwell Server Edition passthrough on Ubuntu intel-tdx kernel `6.14.0-1008-intel` + tdx-patched QEMU. Inside the guest:
Real workload: PyTorch 2.11+cu128 sees `sm_120`, runs cuBLAS GEMM (~78 TFLOP/s FP32, ~115 TFLOP/s BF16 on a single Pro 6000) entirely inside the TDX guest with CC ON.
Test plan
- `bitbake nvidia` succeeds
- `bitbake nvidia-fabricmanager libnvidia-nscq` succeeds
- `bitbake dstack-nvidia-dev-rootfs` succeeds; rootfs contains `nvidia.ko` @ 580.95.05, `nv-fabricmanager` reporting 580.95.05, `libnvidia-nscq.so.580.95.05`
- `nvidia-smi conf-compute -q` shows `CC State: ON`