meta-nvidia: bump NVIDIA stack to 580.95.05 (Blackwell CC support) by kvinwang · Pull Request #56 · Dstack-TEE/meta-dstack

kvinwang · 2026-04-21T02:19:20Z

Summary

Adds NVIDIA driver 580.95.05 recipes for nvidia, nvidia-fabricmanager, and libnvidia-nscq. Required for RTX PRO 6000 Blackwell Server Edition (10de:2bb5) confidential compute under Intel TDX: the existing 570.172.08 driver predates the Pro 6000 SE and bails out with "GPU confidential compute capability is not enabled" when loaded inside a TDX guest.

Yocto picks the highest available PV by default, so adding these recipes is enough to make builds use 580.95.05; the 570 recipes are kept for now.

What's in each recipe

nvidia_580.95.05.bb: same structure as the 570 recipe; only SRC_URI[md5sum] / SRC_URI[sha256sum] are updated.
nvidia-fabricmanager_580.95.05.bb: same as 570. The 580 archive ships extra gb300_* topology files; the current do_install lists files explicitly, so the new ones aren't picked up. Harmless for non-GB300 hosts; could be turned into a glob in a follow-up.
libnvidia-nscq_580.95.05.bb: the 580 archive no longer ships bin/nscq-cli, so do_install and FILES:${PN} were trimmed accordingly.

Verification

Built dstack-nvidia-dev-rootfs with the new stack and deployed to a TDX guest with one RTX PRO 6000 Blackwell Server Edition passthrough on Ubuntu intel-tdx kernel 6.14.0-1008-intel + tdx-patched QEMU. Inside the guest:

NVRM version: NVIDIA UNIX Open Kernel Module 580.95.05
GPU 0: NVIDIA RTX PRO 6000 Blackwell Server Edition

==============NVSMI CONF-COMPUTE LOG==============
    CC State                   : ON
    Multi-GPU Mode             : None
    CPU CC Capabilities        : INTEL TDX
    GPU CC Capabilities        : CC Capable
    CC GPUs Ready State        : Ready
    Protected memory size      : 99461312 KiB
    Unprotected memory size    : 0 KiB
    Environment                : PRODUCTION

Fabric Manager version is : 580.95.05
/usr/lib/libnvidia-nscq.so.580.95.05

Real workload: PyTorch 2.11+cu128 sees sm_120, runs cuBLAS GEMM (~78 TFLOP/s FP32, ~115 TFLOP/s BF16 on a single Pro 6000) entirely inside the TDX guest with CC ON.

Test plan

bitbake nvidia succeeds
bitbake nvidia-fabricmanager libnvidia-nscq succeeds
bitbake dstack-nvidia-dev-rootfs succeeds; rootfs contains nvidia.ko@580.95.05, nv-fabricmanager reporting 580.95.05, libnvidia-nscq.so.580.95.05
TDX guest boots, NVIDIA module loads on RTX PRO 6000 Blackwell SE
nvidia-smi conf-compute -q shows CC State: ON
PyTorch CUDA workload runs end-to-end
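
The CC State check in the test plan could be scripted along these lines. This is a minimal sketch, not part of the PR: the helper name cc_state is hypothetical, and the sample text is copied from the verification log above rather than read from a live GPU.

```shell
#!/bin/sh
# Sketch: extract the "CC State" value from conf-compute query output on stdin.
# cc_state is a hypothetical helper name, not something shipped by NVIDIA.
cc_state() {
    # Match only the "CC State" line (not "CC GPUs Ready State"), then print
    # the text after the colon with surrounding whitespace trimmed.
    awk -F: '/^[ \t]*CC State[ \t]*:/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2; exit
    }'
}

# Sample lines taken from the verification log in this PR.
sample='==============NVSMI CONF-COMPUTE LOG==============
    CC State                   : ON
    Multi-GPU Mode             : None
    CC GPUs Ready State        : Ready'

state=$(printf '%s\n' "$sample" | cc_state)
echo "CC State: $state"
```

Inside the guest, the same helper would presumably be fed from the real command, for example `nvidia-smi conf-compute -q | cc_state`, with the result compared against ON.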

Required for RTX PRO 6000 Blackwell Server Edition (10de:2bb5) Confidential Compute support: the 570.172.08 NVIDIA Open Kernel Module predates Pro 6000 SE and refuses to attach with "GPU confidential compute capability is not enabled" inside a TDX guest. 580.95.05 recognizes GB202 and brings the full TDX guest CC stack online (verified end-to-end: nvidia-smi conf-compute reports CC State=ON, GPU CC Capabilities=CC Capable, Protected memory=99461312 KiB; PyTorch CUDA workloads run inside the TD). The nscq archive at 580 no longer ships bin/nscq-cli, so the new recipe drops it from do_install and FILES.

Copilot

Pull request overview

Adds Yocto recipes for the NVIDIA 580.95.05 driver stack to enable confidential compute support on newer Blackwell-era GPUs (e.g., RTX PRO 6000 Blackwell Server Edition) while keeping existing 570.x recipes in place.

Changes:

Add nvidia_580.95.05.bb with updated 580.95.05 installer checksums/URL.
Add nvidia-fabricmanager_580.95.05.bb to package Fabric Manager 580.95.05.
Add libnvidia-nscq_580.95.05.bb, updated for the 580 archive layout (no nscq-cli).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
meta-nvidia/recipes-graphics/nvidia/nvidia_580.95.05.bb	New 580.95.05 driver recipe (runfile extraction + includes).
meta-nvidia/recipes-graphics/nvidia/nvidia-fabricmanager_580.95.05.bb	New 580.95.05 Fabric Manager recipe and install manifest.
meta-nvidia/recipes-graphics/nvidia/libnvidia-nscq_580.95.05.bb	New 580.95.05 NSCQ library recipe; install/package list adjusted to archive contents.

- nvidia-fabricmanager: install all files from share/nvidia/nvswitch/ via a glob loop instead of an explicit list. The 580.95.05 archive ships new GB300 topologies (gb300_nvl72r{1,2}_c2g4_topology) that the old hard-coded list missed; future archives stay covered automatically. (Addresses Copilot review feedback on PR #56.)
- dstack.conf: introduce NVIDIA_VERSION = "580.95.05" and pin PREFERRED_VERSION_{nvidia,nvidia-fabricmanager,libnvidia-nscq} from it so the kernel module ABI and userspace libs always move together.
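
To illustrate why a glob loop picks up files that an explicit list misses, here is a standalone sketch. It is not the recipe's actual do_install: the function name install_all, the 0644 mode, and the placeholder file names are all assumptions, and in a recipe the source would sit under ${S} and the destination under ${D}.

```shell
#!/bin/sh
# Sketch of a glob-based install loop. install_all, the directory arguments,
# and the 0644 mode are hypothetical choices made for this illustration.
install_all() {
    src_dir=$1
    dest_dir=$2
    install -d "$dest_dir"
    for f in "$src_dir"/*; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$f" ] || continue
        install -m 0644 "$f" "$dest_dir/"
    done
}

# Demo with placeholder files standing in for the archive contents.
work=$(mktemp -d)
mkdir -p "$work/src"
touch "$work/src/topology_a" "$work/src/topology_b" "$work/src/topology_new"
install_all "$work/src" "$work/dest"
count=$(ls "$work/dest" | wc -l | tr -d ' ')
echo "installed $count files"
rm -rf "$work"
```

Any file added to the source directory later is installed without touching the loop, which is the property the commit message relies on for future archives.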

kvinwang · 2026-04-21T02:30:51Z

Addressed @copilot-pull-request-reviewer feedback in 85e7e17 + added version pinning:

nvidia-fabricmanager_580.95.05.bb — replaced the explicit topology file list with a for f in ${S}/share/nvidia/nvswitch/*; do install …; done loop. Verified build picks up the new gb300_nvl72r{1,2}_c2g4_topology files that the old list was missing (29 topology files installed total).
Version pin (meta-dstack/conf/distro/dstack.conf):
```
NVIDIA_VERSION = "580.95.05"
PREFERRED_VERSION_nvidia                = "${NVIDIA_VERSION}"
PREFERRED_VERSION_nvidia-fabricmanager  = "${NVIDIA_VERSION}"
PREFERRED_VERSION_libnvidia-nscq        = "${NVIDIA_VERSION}"
```
So the three packages always bump together — kernel module ABI and userspace libs are paired. bitbake-getvar --value confirms all four resolve to 580.95.05. Future bumps only need to change NVIDIA_VERSION.
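
The four-variable check mentioned above could be wrapped in a small helper like the one below. This is a sketch under assumptions: check_pins and GETVAR_CMD are hypothetical names, the lookup defaults to bitbake-getvar with its --value option, and it only works from an initialized build environment.

```shell
#!/bin/sh
# Sketch: confirm that every pinned variable resolves to the expected version.
# check_pins and GETVAR_CMD are hypothetical names; GETVAR_CMD defaults to
# bitbake-getvar and exists only so the lookup command can be swapped out.
GETVAR_CMD=${GETVAR_CMD:-bitbake-getvar}

check_pins() {
    expected=$1
    status=0
    for var in NVIDIA_VERSION \
               PREFERRED_VERSION_nvidia \
               PREFERRED_VERSION_nvidia-fabricmanager \
               PREFERRED_VERSION_libnvidia-nscq; do
        actual=$("$GETVAR_CMD" --value "$var" 2>/dev/null)
        if [ "$actual" = "$expected" ]; then
            echo "ok       $var = $actual"
        else
            echo "MISMATCH $var = '$actual' (expected $expected)"
            status=1
        fi
    done
    # Non-zero when at least one variable did not match.
    return "$status"
}
```

From a build shell, `check_pins "580.95.05"` would then report one line per variable and return non-zero if any of them drifted.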

Copilot · 2026-04-21T02:31:43Z

Both changes are already in place at commit 85e7e17:

nvidia-fabricmanager_580.95.05.bb uses the glob loop (for f in ${S}/share/nvidia/nvswitch/*) instead of the explicit file list.
meta-dstack/conf/distro/dstack.conf has NVIDIA_VERSION = "580.95.05" with the three PREFERRED_VERSION_* variables pinning all packages together.

No further changes needed.

Copilot AI review requested due to automatic review settings April 21, 2026 02:19

Copilot started reviewing on behalf of kvinwang April 21, 2026 02:19

Copilot AI reviewed Apr 21, 2026

Comment thread meta-nvidia/recipes-graphics/nvidia/nvidia-fabricmanager_580.95.05.bb Outdated

Copilot started work on behalf of kvinwang April 21, 2026 02:31

Copilot finished work on behalf of kvinwang April 21, 2026 02:32

kvinwang merged commit be6feef into main Apr 21, 2026
4 checks passed

kvinwang merged 2 commits into main from feat/nvidia-580.95.05
