[CP 1471] feat(rbac): grant metrics-exporter events:create for profiler-disable…#561

Merged
sajmera-pensando merged 1 commit into ROCm:main from ci-penbot-01:CP.O2O.pensando.gpu-operator.1471.rocm.gpu-operator.main
May 28, 2026

@ci-penbot-01

Cherry-pick of pensando/gpu-operator#1471.


Source PR Description (pensando/gpu-operator#1471):

…d K8s warnings

Add events:create to the metrics-exporter ClusterRole so that DME (device-metrics-exporter) can emit Kubernetes Warning events when the GPU profiler is auto-disabled (KUBE-16).
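
For readers unfamiliar with the shape of such a grant, the following is a minimal sketch of a ClusterRole rule that allows creating events. It is not copied from this PR's diff. The role name is a placeholder, and the use of the core API group (`""`) is an assumption; if the exporter emits events through `events.k8s.io`, that group would need to be listed instead or as well.

```yaml
# Illustrative sketch only; the actual rule in the repository may differ.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metrics-exporter-example   # placeholder, not the chart's real role name
rules:
  - apiGroups: [""]                # assumption: events are written to the core API group
    resources: ["events"]
    verbs: ["create"]              # the only verb this PR describes adding
```

Granting only `create` keeps the exporter's permissions narrow: it can post new events but cannot update, list, or delete existing ones.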

Mirrors DME PR pensando/device-metrics-exporter#1306.

$ sudo k3s kubectl get pod default-metrics-exporter-gqpb9 -n kube-amd-gpu \
    -o jsonpath='{.spec.serviceAccountName}'
amd-gpu-operator-metrics-exporter
$ sudo k3s kubectl get events -n kube-amd-gpu --field-selector reason=ProfilerDisabled
LAST SEEN   TYPE      REASON             OBJECT                               MESSAGE
4m27s       Warning   ProfilerDisabled   pod/default-metrics-exporter-gqpb9   GPU profiler metrics (gpu_prof_*) disabled: rocpctl process core dumped/aborted. Restart the pod to re-enable.
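
As a rough illustration of what the row above corresponds to, here is a sketch of a core `v1` Event object carrying the same type, reason, involved object, and message. It is reconstructed from the table columns, not captured from the cluster. The `metadata.name` is a placeholder, and fields such as timestamps, source, and counts are omitted because the output above does not show them.

```yaml
# Illustrative reconstruction; not actual `kubectl get events -o yaml` output.
apiVersion: v1
kind: Event
metadata:
  name: default-metrics-exporter-gqpb9.example   # placeholder name
  namespace: kube-amd-gpu
type: Warning
reason: ProfilerDisabled
message: "GPU profiler metrics (gpu_prof_*) disabled: rocpctl process core dumped/aborted. Restart the pod to re-enable."
involvedObject:
  kind: Pod
  name: default-metrics-exporter-gqpb9
  namespace: kube-amd-gpu
```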

Cherrypick triggered by: ACP-Automation

…d K8s warnings (#1471)

Add events:create to the metrics-exporter ClusterRole across all install
paths so DME can emit K8s Warning events when the GPU profiler is
auto-disabled (KUBE-16).

Updated locations:
- helm-charts-k8s/templates/metrics-exporter-rbac.yaml (Helm/K8s)
- hack/k8s-patch/template-patch/metrics-exporter-rbac.yaml (Helm patch source)
- config/rbac/metrics_exporter_cluster_role.yaml (kustomize/OLM source)
- bundle/manifests/amd-gpu-operator.clusterserviceversion.yaml (OLM CSV)

Mirrors DME PR pensando/device-metrics-exporter#1306.

(cherry picked from commit f1575d751b7675360c9728e44b86d3ec0a8aec55)

@spraveenio spraveenio left a comment

LGTM

@sajmera-pensando sajmera-pensando merged commit cbe6ed9 into ROCm:main May 28, 2026
3 checks passed
4 participants