feat(auth): refresh access token proactively before expiry#361
Open
brycelelbach wants to merge 2 commits intobrevdev:mainfrom
Open
feat(auth): refresh access token proactively before expiry#361brycelelbach wants to merge 2 commits intobrevdev:mainfrom
brycelelbach wants to merge 2 commits intobrevdev:mainfrom
Conversation
Users report being logged out of brev-cli multiple times per hour, forcing repeated `brev login --token`. Two bugs in the auth flow caused this: 1. `LoginWithToken` stored the literal string "auto-login" as the RefreshToken whenever it was handed a valid JWT access token (pkg/auth/auth.go). When that short-lived access token later expired, the refresh path in `GetFreshAccessTokenOrNil` tried to exchange "auto-login" with the IdP, which always failed, so the user was prompted to re-login every time the access token aged out — roughly hourly with typical NVIDIA KAS token lifetimes. 2. The auth-failure retry handler in pkg/store/http.go only fired on HTTP 403 Forbidden. APIs returning 401 Unauthorized (the more standard response for expired credentials) bypassed the refresh hook entirely and surfaced as a hard failure on the first such response, even when the refresh token was still good. This change: - Stores an empty RefreshToken when LoginWithToken receives a JWT, so the refresh path correctly recognizes there is nothing to refresh and falls through without a failed IdP round-trip. Emits a one-line notice on stderr so the user knows this session cannot be auto-renewed. - Normalizes the legacy "auto-login" sentinel to "" on read, so users with existing credentials.json files from older CLI versions benefit from the fix without re-running `brev login --token`. - Changes `GetFreshAccessTokenOrNil` to return "" (rather than an expired AccessToken) when there is no way to refresh, so callers prompt for re-login cleanly instead of shipping an expired JWT to the API. - Extends the refresh-and-retry handler in AuthHTTPStore to fire on both 401 and 403. Longer-term the website at brev.nvidia.com/settings/cli should hand out a real refresh token (or a short-lived exchange code) rather than a raw access JWT so `brev login --token` can produce a refreshable session. This PR is the CLI-only stop-the-bleeding fix. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous behavior waited until an API call returned 401 to trigger
the refresh path. That keeps one unnecessary round-trip on the critical
path of every call that races the token's exp boundary, and it surfaces
transient auth-provider network failures as hard failures rather than
retries.
This change lets the CLI refresh ahead of expiry:
- `AuthTokens` gains optional `AccessTokenExp` and `IssuedAt` fields
(omitempty). They are populated from the access JWT's `exp` / `iat`
claims at save time, and fall back to live JWT parsing for credential
files written by older CLI versions. No migration needed.
- `GetFreshAccessTokenOrNil` refreshes when the access token is invalid
OR when it is valid but within `refreshBeforeExpiry` (5 min) of exp.
The proactive branch is tolerant of refresh failure: if the IdP is
briefly unreachable, the CLI keeps using the still-valid access token
and retries on the next call rather than logging the user out.
- `getNewTokensWithRefreshOrNil` classifies refresh errors. Network-
level failures (timeouts, connection refused, DNS) are wrapped with a
clearer message; authoritative IdP rejection produces a one-line
stderr prompt ("re-run `brev login`") rather than a stack-trace dump.
Builds on the reactive 401/403 retry from the prior auth fix.
There was a problem hiding this comment.
Pull request overview
Adds proactive access-token refresh support to the CLI auth layer, reducing 401/retry round-trips near token expiry and improving resilience to transient refresh failures by persisting JWT timing metadata in saved credentials.
Changes:
- Persist access-token
exp/iattimestamps inAuthTokens(AccessTokenExp,IssuedAt) for faster expiry checks and UX surfacing. - Proactively refresh access tokens within a 5-minute pre-expiry window, with special handling to tolerate refresh failures when the current access token is still valid.
- Treat both HTTP 401 and 403 as auth failures for refresh-and-retry behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pkg/store/http.go | Generalizes refresh-and-retry trigger to include 401 via isAuthFailure. |
| pkg/entity/entity.go | Extends persisted auth token model with optional expiry/issued-at timestamps. |
| pkg/auth/auth.go | Implements proactive refresh window, JWT claim extraction for exp/iat, and refresh error classification behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+128
to
+134
| // isAuthFailure reports whether an HTTP status code indicates the caller's | ||
| // credentials are missing, invalid, or expired. Both 401 Unauthorized and | ||
| // 403 Forbidden can signal an expired access token from Brev's APIs, so we | ||
| // treat both as triggers for the refresh-and-retry path. | ||
| func isAuthFailure(code int) bool { | ||
| return code == http.StatusUnauthorized || code == http.StatusForbidden | ||
| } |
| } | ||
| return "", breverrors.WrapAndTrace(refreshErr) | ||
| } | ||
| if newTokens == nil { |
Comment on lines
+248
to
+252
| // populateTokenTimestamps fills in AccessTokenExp and IssuedAt from the | ||
| // access JWT when they are not already set. Safe to call on any AuthTokens | ||
| // value; missing or non-JWT access tokens leave the fields nil. | ||
| func populateTokenTimestamps(tokens *entity.AuthTokens) { | ||
| if tokens == nil || tokens.AccessToken == "" { |
Comment on lines
+174
to
+180
| // Trigger a refresh when the token is invalid OR when it is still valid | ||
| // but close enough to expiry that the next API call is likely to race | ||
| // the exp boundary. The proactive branch is tolerant of refresh failure: | ||
| // if the IdP is briefly unreachable, fall back to the (still-valid) | ||
| // current access token rather than logging the user out. | ||
| expiringSoon := isAccessTokenValid && tokens.RefreshToken != "" && accessTokenExpiresSoon(tokens) | ||
| if !isAccessTokenValid || expiringSoon { |
This was referenced Apr 18, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked on #359
This PR is stacked on #359 (Step 1: the reactive refresh +
"auto-login"sentinel cleanup). Because #359 is not merged yet, the diff for this PR includes its two commits too. Once #359 lands, the base of this PR will be rebased ontomainand the diff will shrink to just the Step 3 commit.Relates to #360 (Step 2, server-side).
Problem
After #359, the CLI no longer silently wedges itself with
"auto-login"as a refresh token, and it retries on 401. But the refresh is still reactive — it only fires after an API call returns 401. Two consequences:expboundary pays a round-trip. The doomed 401 request, the refresh, then the retry. For short-TTL KAS access tokens that's common.api.ngc.nvidia.comis momentarily slow or unreachable during the refresh call, the user is pushed through an error path as if their refresh credential had been rejected, even though the still-valid access token in hand would have answered the next call just fine.Fix
pkg/entity/entity.goAuthTokensgains optionalAccessTokenExp *time.TimeandIssuedAt *time.Timefields (bothomitempty). These are populated from the access JWT'sexpandiatclaims at save time, so the CLI doesn't have to re-parse the JWT on every call to check expiry.pkg/auth/auth.gorefreshBeforeExpiry = 5 * time.Minute. If the access token is valid and expires within this window, the CLI refreshes before making the next call. Five minutes comfortably exceeds typical request RTTs (so we don't race) without refreshing so aggressively that we burn the IdP's refresh budget.GetFreshAccessTokenOrNilnow handles three cases:getNewTokensWithRefreshOrNilnow classifies refresh errors. A smallisTransientRefreshErrorhelper useserrors.Asto identifyurl.Error/net.Errortimeouts andnet.OpErrorconnection failures. Transient failures are wrapped with a plain-language message; authoritative rejections from the IdP produce a single stderr line ("re-runbrev login") rather than a stack trace, which is the experience the user who filed the original report was asking for.populateTokenTimestampsis called on both theLoginWithTokensave path and the refresh save path, so the new fields are always kept in sync with the current access token.What this does and does not fix
Fixes: the extra-round-trip cost on near-exp access tokens, and the UX regression where momentary network failures force re-login. With this PR the CLI will only prompt for
brev loginwhen the IdP has actually rejected the refresh credential.Does not fix: the
brev login --tokenpath still produces a non-refreshable session, because the/settings/cliwebpage hands out a raw JWT rather than a refresh credential. That is the server-side change tracked in #360.Test plan
go build ./...go vet ./pkg/auth/... ./pkg/store/... ./pkg/entity/...go test ./pkg/auth/...api.ngc.nvidia.comfor ~1 minute mid-session; confirm the CLI continues answering calls with the current access token and resumes refresh on the next call, rather than prompting for re-login.🤖 Generated with Claude Code