OIDC/Keycloak authentication with RBAC and scope-based permissions

### Problem Statement

OpenShell currently authenticates CLI-to-gateway communication via mTLS client certificates or Cloudflare Access JWTs. It does not support standard enterprise identity providers (Keycloak, Entra ID, Okta) or fine-grained, per-method access control. As a result, organizations currently cannot:

- Authenticate users against their existing identity provider
- Restrict automation tokens to specific operations (e.g., sandbox-only CI bots)
- Enforce admin vs. user role separation on provider and config management


### Proposed Design

Add OAuth2/OIDC as a third authentication mode alongside mTLS and Cloudflare Access, with two-layer authorization: role-based access control (RBAC) and opt-in scope-based fine-grained permissions.

**Authentication (server-side):**
- JWT validation against a configurable OIDC issuer's JWKS endpoint (`--oidc-issuer`)
- JWKS key caching with TTL and key rotation handling
- Method classification: unauthenticated (health/reflection), sandbox-secret (supervisor RPCs), dual-auth (UpdateConfig, GetSandboxConfig), Bearer-authenticated (everything else)
- Anti-spoofing: internal auth-source headers stripped from inbound requests
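To make the JWKS caching bullet concrete, here is a minimal sketch of a TTL cache that also refreshes on an unknown `kid` (the key-rotation case). It is illustrative only and not the proposed implementation: `JwksCache`, `key_for`, and `min_refresh` are hypothetical names, key material is an opaque string, and `fetch` stands in for the HTTP call to the issuer's JWKS endpoint. Holding the lock across the fetch is one simple way to keep concurrent callers from each hitting the issuer.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Cached key set plus the time it was last fetched.
struct State {
    keys: HashMap<String, String>, // kid -> key material (opaque in this sketch)
    fetched_at: Option<Instant>,
}

/// Hypothetical cache; `fetch` stands in for an HTTP GET of the JWKS document.
struct JwksCache<F: Fn() -> HashMap<String, String>> {
    fetch: F,
    ttl: Duration,         // how long a fetched key set is trusted
    min_refresh: Duration, // cooldown between refreshes triggered by unknown kids
    state: Mutex<State>,
}

impl<F: Fn() -> HashMap<String, String>> JwksCache<F> {
    fn new(fetch: F, ttl: Duration, min_refresh: Duration) -> Self {
        let state = Mutex::new(State { keys: HashMap::new(), fetched_at: None });
        JwksCache { fetch, ttl, min_refresh, state }
    }

    /// Look up a key by `kid`, refreshing when the cache is expired or when
    /// the kid is unknown and the cooldown has passed (key rotation).
    fn key_for(&self, kid: &str) -> Option<String> {
        // The lock is held across the fetch, so concurrent callers wait for
        // one refresh and then read the updated state.
        let mut st = self.state.lock().unwrap();
        let age = st.fetched_at.map(|t| t.elapsed());
        let expired = age.map_or(true, |a| a >= self.ttl);
        let cooled_down = age.map_or(true, |a| a >= self.min_refresh);
        let unknown = !st.keys.contains_key(kid);
        if expired || (unknown && cooled_down) {
            st.keys = (self.fetch)();
            st.fetched_at = Some(Instant::now());
        }
        st.keys.get(kid).cloned()
    }
}

fn main() {
    let cache = JwksCache::new(
        || HashMap::from([("kid-1".to_string(), "key-material".to_string())]),
        Duration::from_secs(300),
        Duration::from_secs(30),
    );
    println!("{:?}", cache.key_for("kid-1"));
    println!("{:?}", cache.key_for("rotated-kid"));
}
```

The cooldown keeps tokens with bogus `kid` values from forcing a fetch on every request; the actual TTL and cooldown values are left open here.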

**Authorization — RBAC:**
- Configurable admin/user role names extracted from a configurable JWT claim path (`--oidc-roles-claim`)
- Admin methods: provider CRUD, config mutations, draft policy approvals, inference write
- User methods: sandbox lifecycle, provider read, policy/settings read
- Auth-only mode: empty role names skip RBAC (for providers like GitHub Actions that don't emit roles)
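The RBAC rules above could be sketched roughly as follows. This is an assumption-laden illustration, not the proposed code: the `Claim` enum stands in for a parsed JSON claims object, the dotted path syntax (`realm_access.roles`, which is where Keycloak places realm roles) is an assumed format for `--oidc-roles-claim`, the role names are placeholders, and treating admin as a superset of user is an assumption about the intended hierarchy.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a parsed JSON claims document.
#[derive(Debug, Clone)]
enum Claim {
    Text(String),
    List(Vec<Claim>),
    Object(HashMap<String, Claim>),
}

/// Walk a dotted claim path (assumed syntax) and collect role strings.
fn roles_at<'a>(claims: &'a Claim, path: &str) -> Vec<&'a str> {
    let mut node = claims;
    for key in path.split('.') {
        match node {
            Claim::Object(map) => match map.get(key) {
                Some(next) => node = next,
                None => return Vec::new(),
            },
            _ => return Vec::new(),
        }
    }
    match node {
        Claim::List(items) => items
            .iter()
            .filter_map(|c| match c {
                Claim::Text(s) => Some(s.as_str()),
                _ => None,
            })
            .collect(),
        Claim::Text(s) => vec![s.as_str()],
        Claim::Object(_) => Vec::new(),
    }
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Required {
    User,
    Admin,
}

/// Configured role names; both empty means auth-only mode.
struct RbacConfig {
    admin_role: String,
    user_role: String,
}

fn authorize(cfg: &RbacConfig, roles: &[&str], required: Required) -> bool {
    // Auth-only mode: a valid token is enough, RBAC is skipped.
    if cfg.admin_role.is_empty() && cfg.user_role.is_empty() {
        return true;
    }
    let is_admin = !cfg.admin_role.is_empty() && roles.contains(&cfg.admin_role.as_str());
    let is_user = !cfg.user_role.is_empty() && roles.contains(&cfg.user_role.as_str());
    match required {
        Required::Admin => is_admin,
        // Assumption: admins may also call user-level methods.
        Required::User => is_admin || is_user,
    }
}

fn main() {
    let realm = HashMap::from([(
        "roles".to_string(),
        Claim::List(vec![Claim::Text("openshell-user".to_string())]),
    )]);
    let claims = Claim::Object(HashMap::from([(
        "realm_access".to_string(),
        Claim::Object(realm),
    )]));
    let cfg = RbacConfig {
        admin_role: "openshell-admin".to_string(), // placeholder role names
        user_role: "openshell-user".to_string(),
    };
    let roles = roles_at(&claims, "realm_access.roles");
    println!("user method allowed: {}", authorize(&cfg, &roles, Required::User));
    println!("admin method allowed: {}", authorize(&cfg, &roles, Required::Admin));
}
```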

**Authorization — Scopes (opt-in):**
- When `--oidc-scopes-claim` is set, the server enforces per-method scopes on top of roles
- Eight resource scopes (`sandbox:read`, `sandbox:write`, `provider:read`, `provider:write`, `config:read`, `config:write`, `inference:read`, `inference:write`) plus an `openshell:all` wildcard
- Exhaustive scope map — unlisted methods require `openshell:all`
- Scopes cannot escalate past role gates
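A rough sketch of how the scope layer might compose with the role gate is shown below. The RPC names in the map are hypothetical (the real map would list every method), and `role_ok` stands for the outcome of the RBAC check. The two properties it tries to illustrate are the fallback to `openshell:all` for unlisted methods and the rule that a scope can never grant what the role gate denied.

```rust
/// Scope required by each method. Method names are hypothetical examples;
/// anything not listed falls through to the wildcard scope.
fn required_scope(method: &str) -> &'static str {
    match method {
        "ListSandboxes" | "GetSandbox" => "sandbox:read",
        "CreateSandbox" | "DeleteSandbox" => "sandbox:write",
        "ListProviders" => "provider:read",
        "CreateProvider" | "DeleteProvider" => "provider:write",
        _ => "openshell:all",
    }
}

/// True when the granted scopes cover the method, either directly or via
/// the `openshell:all` wildcard.
fn scope_allows(granted: &[&str], method: &str) -> bool {
    let needed = required_scope(method);
    granted.iter().any(|s| *s == "openshell:all" || *s == needed)
}

/// Final decision: the role gate must pass first; scopes only narrow access.
/// `scopes_enforced` models whether `--oidc-scopes-claim` is set.
fn allowed(role_ok: bool, scopes_enforced: bool, granted: &[&str], method: &str) -> bool {
    role_ok && (!scopes_enforced || scope_allows(granted, method))
}

fn main() {
    let ci_bot = ["sandbox:read", "sandbox:write"];
    for method in ["CreateSandbox", "CreateProvider", "SomeUnlistedMethod"] {
        println!("{method}: {}", allowed(true, true, &ci_bot, method));
    }
    // A wildcard scope does not help a caller whose role check failed.
    println!("wildcard without role: {}", allowed(false, true, &["openshell:all"], "CreateProvider"));
}
```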

**CLI:**
- Browser-based Authorization Code + PKCE flow (`gateway login`)
- Client Credentials flow for CI/automation (`OPENSHELL_OIDC_CLIENT_SECRET`)
- Token storage with automatic refresh, logout command
- `--oidc-scopes` flag to request specific scopes during login
- OIDC bearer token injected over mTLS transport for local K3s gateways
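The CLI-side decisions could look something like the sketch below. It is a guess at the control flow, not the proposed implementation: it assumes that a non-empty `OPENSHELL_OIDC_CLIENT_SECRET` selects the Client Credentials flow, and `StoredToken`, `Action`, and the expiry skew are hypothetical. The actual token exchange (for example via the `oauth2` crate) is out of scope here.

```rust
use std::time::{Duration, SystemTime};

#[derive(Debug, PartialEq, Eq)]
enum Flow {
    ClientCredentials,
    AuthorizationCodePkce,
}

/// Assumption: a non-empty client secret means a CI/automation caller.
fn choose_flow(client_secret: Option<&str>) -> Flow {
    match client_secret {
        Some(s) if !s.is_empty() => Flow::ClientCredentials,
        _ => Flow::AuthorizationCodePkce,
    }
}

/// Hypothetical shape of what the CLI persists after login.
struct StoredToken {
    expires_at: SystemTime,
    has_refresh_token: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum Action {
    UseStored,
    Refresh,
    Login,
}

/// Decide what to do before a request. `skew` refreshes slightly early so a
/// token does not expire while the request is in flight.
fn next_action(token: Option<&StoredToken>, now: SystemTime, skew: Duration) -> Action {
    match token {
        None => Action::Login,
        Some(t) if now + skew < t.expires_at => Action::UseStored,
        Some(t) if t.has_refresh_token => Action::Refresh,
        Some(_) => Action::Login,
    }
}

fn main() {
    let secret = std::env::var("OPENSHELL_OIDC_CLIENT_SECRET").ok();
    println!("flow: {:?}", choose_flow(secret.as_deref()));

    let now = SystemTime::now();
    let token = StoredToken {
        expires_at: now + Duration::from_secs(10),
        has_refresh_token: true,
    };
    println!("action: {:?}", next_action(Some(&token), now, Duration::from_secs(30)));
}
```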

**Sandbox supervisor auth:**
- Sandbox-to-server RPCs authenticated via `x-sandbox-secret` header (not OIDC)
- Sandbox-secret callers restricted to sandbox-scoped policy sync on UpdateConfig
- GetInferenceBundle classified as sandbox-secret for credential isolation
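Putting the method classification from the Authentication section together with the sandbox-secret rules above, the dispatch decision might be sketched as follows. The service path prefix `/openshell.v1.OpenShell/` is invented for illustration, only the method names mentioned in this proposal are classified, and `is_sandbox_policy_sync` is a hypothetical flag for "this UpdateConfig call is a sandbox-scoped policy sync".

```rust
/// How a gRPC method must be authenticated before dispatch.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum AuthClass {
    Unauthenticated,
    SandboxSecret,
    DualAuth,
    Bearer,
}

/// Which credential the request actually carried.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Caller {
    Anonymous,
    Bearer,
    SandboxSecret,
}

fn classify(path: &str) -> AuthClass {
    // gRPC health checking and server reflection stay open.
    if path.starts_with("/grpc.health.v1.Health/") || path.starts_with("/grpc.reflection.") {
        return AuthClass::Unauthenticated;
    }
    let method = path.rsplit('/').next().unwrap_or("");
    match method {
        "UpdateConfig" | "GetSandboxConfig" => AuthClass::DualAuth,
        "GetInferenceBundle" => AuthClass::SandboxSecret,
        // Everything else requires an OIDC bearer token.
        _ => AuthClass::Bearer,
    }
}

fn may_dispatch(class: AuthClass, caller: Caller) -> bool {
    match (class, caller) {
        (AuthClass::Unauthenticated, _) => true,
        (AuthClass::SandboxSecret, Caller::SandboxSecret) => true,
        (AuthClass::DualAuth, Caller::Bearer | Caller::SandboxSecret) => true,
        (AuthClass::Bearer, Caller::Bearer) => true,
        _ => false,
    }
}

/// Extra restriction on the dual-auth UpdateConfig method.
fn update_config_allowed(caller: Caller, is_sandbox_policy_sync: bool) -> bool {
    match caller {
        // Bearer callers are still subject to RBAC and scopes elsewhere.
        Caller::Bearer => true,
        Caller::SandboxSecret => is_sandbox_policy_sync,
        Caller::Anonymous => false,
    }
}

fn main() {
    let path = "/openshell.v1.OpenShell/GetInferenceBundle"; // hypothetical path
    let class = classify(path);
    println!("{path} -> {class:?}");
    println!("bearer caller allowed: {}", may_dispatch(class, Caller::Bearer));
    println!("sandbox caller allowed: {}", may_dispatch(class, Caller::SandboxSecret));
}
```

Classifying `GetInferenceBundle` as sandbox-secret means a user's bearer token alone cannot fetch inference credentials, which is the isolation property the last bullet describes.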

**Deployment:**
- Full plumbing through DeployOptions, Docker env vars, Helm values/templates, HelmChart manifest, cluster-entrypoint.sh, and bootstrap scripts
- `OPENSHELL_OIDC_*` env vars for all configuration
- Gateway metadata preserves OIDC config across stop/start cycles

**Keycloak dev environment:**
- `mise run keycloak` starts a pre-configured Keycloak with test realm, users (admin@test, user@test), roles, PKCE client, CI client, and OpenShell client scopes
- Built-in OIDC scopes (openid, profile, email, roles) configured as defaults so role and profile claims are always present

**Provider compatibility:**
- Tested with Keycloak. Configurable roles claim path and role names support Entra ID, Okta, and GitHub Actions.
- Scope enforcement supports space-delimited strings (Keycloak, Entra) and JSON arrays (Okta)
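Handling both scope encodings could be as small as the sketch below. `ScopeClaim` is a hypothetical stand-in for the already-parsed JSON value of whatever claim `--oidc-scopes-claim` names; a real implementation would deserialize it from the token payload.

```rust
/// The two shapes a scope claim can take after JSON parsing.
enum ScopeClaim {
    /// Space-delimited string, e.g. "sandbox:read sandbox:write".
    Text(String),
    /// JSON array of strings, e.g. ["sandbox:read", "sandbox:write"].
    List(Vec<String>),
}

/// Normalize either shape into a flat list of scope names.
fn parse_scopes(claim: &ScopeClaim) -> Vec<String> {
    match claim {
        ScopeClaim::Text(s) => s.split_whitespace().map(str::to_owned).collect(),
        ScopeClaim::List(items) => items
            .iter()
            .map(|s| s.trim().to_owned())
            .filter(|s| !s.is_empty())
            .collect(),
    }
}

fn main() {
    let as_string = ScopeClaim::Text("openid sandbox:read sandbox:write".to_string());
    let as_array = ScopeClaim::List(vec!["sandbox:read".to_string(), "sandbox:write".to_string()]);
    println!("{:?}", parse_scopes(&as_string));
    println!("{:?}", parse_scopes(&as_array));
}
```

Normalizing once at the edge keeps the per-method scope check independent of which provider issued the token.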


### Alternatives Considered

**mTLS-only with cert-based roles:** Would require a custom CA that encodes roles in certificate OUs. Complex PKI management, no standard IdP integration, no token expiry or refresh.

**API keys / static tokens:** Simple but no identity, no role hierarchy, no integration with enterprise identity providers, no audit trail of who did what.

**Full openidconnect crate:** Evaluated the openidconnect Rust crate for both server and CLI. Recommend the lighter `oauth2` crate for CLI flows (PKCE, token exchange, refresh). Server-side JWT validation stays with `jsonwebtoken` because the openidconnect crate's `IdTokenVerifier` is designed for ID tokens, not access tokens, and the JWKS cache with thundering-herd protection needs to remain custom.


### Agent Investigation


- Explored the existing auth boundary in `multiplex.rs` — the `AuthGrpcRouter` tower middleware was the natural insertion point for JWT validation before request dispatch
- Reviewed the Cloudflare Access auth flow in `auth.rs` and `tls.rs` to understand the existing `EdgeAuthInterceptor` pattern, which was extended for OIDC bearer token injection
- Reviewed the gateway metadata and token storage patterns in `openshell-bootstrap` (`edge_token.rs`, `metadata.rs`) to model OIDC token persistence consistently
- Reviewed the Keycloak realm export format to build the pre-configured dev realm with proper client scope assignments and built-in OIDC scope preservation
- Reviewed the `oauth2` crate API (v5) for type-state safety and `RequestBody` vs `BasicAuth` client authentication semantics


### Checklist

- [x] I've reviewed existing issues and the architecture docs
- [x] This is a design proposal, not a "please build this" request

OIDC/Keycloak authentication with RBAC and scope-based permissions #930
