Add SLES 16.0 support and extend SLES 15 SP7 driver versions #562
Open
Priyankasaggu11929 wants to merge 6 commits into
… match selector NoMatchingNodes condition self-heals, once nodes appear and a reconcile succeeds, the existing `DeleteErrorCondition + SetReady=True` path clears it.
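The self-healing behaviour described in this commit message can be sketched roughly as follows. This is a toy model, not the operator's code: the `Condition` and `Status` types, the `set`/`remove`/`has` helpers, and the `reconcile` function are all hypothetical stand-ins for the operator's condition handling (including its `DeleteErrorCondition` and `SetReady` path), and setting `Ready=False` while no nodes match is an assumption.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a Kubernetes status condition.
type Condition struct {
	Type    string
	Status  string
	Message string
}

// Status is a toy DeviceConfig status holding a list of conditions.
type Status struct {
	Conditions []Condition
}

// set inserts the condition, or overwrites an existing one of the same type.
func (s *Status) set(c Condition) {
	for i := range s.Conditions {
		if s.Conditions[i].Type == c.Type {
			s.Conditions[i] = c
			return
		}
	}
	s.Conditions = append(s.Conditions, c)
}

// remove drops every condition of the given type.
func (s *Status) remove(condType string) {
	kept := s.Conditions[:0]
	for _, c := range s.Conditions {
		if c.Type != condType {
			kept = append(kept, c)
		}
	}
	s.Conditions = kept
}

// has reports whether a condition of the given type is present.
func (s *Status) has(condType string) bool {
	for _, c := range s.Conditions {
		if c.Type == condType {
			return true
		}
	}
	return false
}

// reconcile models one reconcile pass (hypothetical): with no matching nodes
// it surfaces NoMatchingNodes; once nodes appear, the success path removes
// the condition and marks the object Ready.
func reconcile(s *Status, matchingNodes int) {
	if matchingNodes == 0 {
		s.set(Condition{
			Type:    "NoMatchingNodes",
			Status:  "True",
			Message: "no cluster nodes match the DeviceConfig node selector",
		})
		s.set(Condition{Type: "Ready", Status: "False"}) // assumption
		return
	}
	s.remove("NoMatchingNodes")
	s.set(Condition{Type: "Ready", Status: "True"})
}

func main() {
	var s Status
	reconcile(&s, 0)
	fmt.Println("after empty match:", s.has("NoMatchingNodes"))
	reconcile(&s, 2)
	fmt.Println("after nodes appear:", s.has("NoMatchingNodes"))
}
```

The point of the sketch is that no dedicated cleanup step is needed: the condition disappears as a side effect of the first successful reconcile.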
Contributor
Author
cc: @yansun1996 for your review. Please note, these changes are validated on an MI355X system with SLES 16.0 (provided by AMD) across all amdgpu driver versions included in the PR.
Member
Hi @Priyankasaggu11929, thanks for the PR. We're running CI tests against your change and will let you know if anything requires further changes.
Contributor
Author
Ack, thank you @yansun1996!
yansun1996
reviewed
Jun 1, 2026
Member
Hi @Priyankasaggu11929, the basic CI tests passed. It would be better to address the review comments above before we merge this PR, thanks!
Motivation
Follow-up to #365 to extend SLES support to SLES 16.0, plus some controller validation improvements.
Technical Details
The PR makes the following changes:
- extend SLES support to include the `SLES 16.0` OS (along with the existing `SLES 15 SP7`)
  - `31.10`, `31.20`, `31.30`
  - `30.20.1`, `30.30.3`, `31.10`, `31.20`, `31.30`
  - `internal/utils` package (so both the KMM module builder and the spec validator use the same meta)
- add reconcile-time validation to reject unsupported driver versions for SLES nodes, with a clear status condition on the DeviceConfig object
- also add a new `NoMatchingNodes` condition in the DeviceConfig status, to surface an error when no cluster nodes match the node selector defined in the DeviceConfig manifest. This condition clears once the node selector is satisfied.

In my local testing, when the NFD labels (for allowing the GPU device ID) are absent on the node, the DeviceConfig is admitted silently, with no error or feedback in its status, while the operator loops indefinitely with reconciler errors in the logs (and these errors easily go unnoticed).
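As a rough illustration of the reconcile-time driver validation described above, here is a minimal sketch of a shared version table and a validator. It is not the code from this PR: `supportedSLESDrivers`, `slesCodestream`, and `validateSLESDriverVersion` are hypothetical names, the OS image strings are assumed formats, and the assignment of each version list to a codestream (the shorter list to SLES 16.0, the longer to SLES 15 SP7) is an assumption read from the order of the description.

```go
package main

import (
	"fmt"
	"strings"
)

// supportedSLESDrivers is a hypothetical shared table that both an image
// builder and a spec validator could read. Which list belongs to which
// codestream is an assumption, not taken from the PR diff.
var supportedSLESDrivers = map[string][]string{
	"16.0":   {"31.10", "31.20", "31.30"},
	"15 SP7": {"30.20.1", "30.30.3", "31.10", "31.20", "31.30"},
}

// slesCodestream maps a node OS image string to a codestream key.
// It returns "" for non-SLES images and for SLES releases not in the table.
// The matched substrings are assumed formats of the OS image string.
func slesCodestream(osImage string) string {
	img := strings.ToLower(osImage)
	if !strings.Contains(img, "suse linux enterprise server") {
		return ""
	}
	switch {
	case strings.Contains(img, "15 sp7"):
		return "15 SP7"
	case strings.Contains(img, "16.0"):
		return "16.0"
	}
	return ""
}

// validateSLESDriverVersion rejects driver versions that are not listed for
// the node's SLES codestream. Anything without a known codestream passes
// through unchanged (a simplification in this sketch).
func validateSLESDriverVersion(osImage, version string) error {
	codestream := slesCodestream(osImage)
	if codestream == "" {
		return nil
	}
	supported := supportedSLESDrivers[codestream]
	for _, v := range supported {
		if v == version {
			return nil
		}
	}
	return fmt.Errorf("driver version %q is not supported on SLES %s (supported: %s)",
		version, codestream, strings.Join(supported, ", "))
}

func main() {
	fmt.Println(validateSLESDriverVersion("SUSE Linux Enterprise Server 16.0", "31.20"))
	fmt.Println(validateSLESDriverVersion("SUSE Linux Enterprise Server 16.0", "30.20.1"))
	fmt.Println(validateSLESDriverVersion("Ubuntu 22.04.5 LTS", "30.20.1"))
}
```

In a controller, the returned error message would be a natural candidate for the status condition text, so the user sees the supported versions without digging through operator logs.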
Test Plan
Please note, these changes are validated on an MI355X system with SLES 16.0 (provided by AMD) across all amdgpu driver versions included in the PR.
Test Result
Added unit tests for
- `SLESDefaultDriverVersionsMapper` (covering SP7 and 16.0 OS image parsing)
- `ValidateSLESDriverVersion` (to cover valid/invalid driver versions for both codestreams and non-SLES passthrough)
- `resolveDockerfile` (covering SLES 16.0 prebuilt image tag resolution in both default and custom registry scenarios)

truncated output of `make unit-test`
Submission Checklist