Skip to content

feat(auth): refresh access token proactively before expiry#361

Open
brycelelbach wants to merge 2 commits intobrevdev:mainfrom
brycelelbach:fix/auth-token-proactive-refresh
Open

feat(auth): refresh access token proactively before expiry#361
brycelelbach wants to merge 2 commits intobrevdev:mainfrom
brycelelbach:fix/auth-token-proactive-refresh

Conversation

@brycelelbach
Copy link
Copy Markdown

Stacked on #359

This PR is stacked on #359 (Step 1: the reactive refresh + "auto-login" sentinel cleanup). Because #359 is not merged yet, the diff for this PR includes its two commits too. Once #359 lands, the base of this PR will be rebased onto main and the diff will shrink to just the Step 3 commit.

Relates to #360 (Step 2, server-side).

Problem

After #359, the CLI no longer silently wedges itself with "auto-login" as a refresh token, and it retries on 401. But the refresh is still reactive — it only fires after an API call returns 401. Two consequences:

  1. Every call that races the exp boundary pays a round-trip. The doomed 401 request, the refresh, then the retry. For short-TTL KAS access tokens that's common.
  2. Transient network failures look like hard auth failures. If api.ngc.nvidia.com is momentarily slow or unreachable during the refresh call, the user is pushed through an error path as if their refresh credential had been rejected, even though the still-valid access token in hand would have answered the next call just fine.

Fix

pkg/entity/entity.go

  • AuthTokens gains optional AccessTokenExp *time.Time and IssuedAt *time.Time fields (both omitempty). These are populated from the access JWT's exp and iat claims at save time, so the CLI doesn't have to re-parse the JWT on every call to check expiry.
  • Backward-compatible: credentials files written by older CLI versions simply don't have these fields; the CLI falls back to parsing the JWT live for those files.

pkg/auth/auth.go

  • New const refreshBeforeExpiry = 5 * time.Minute. If the access token is valid and expires within this window, the CLI refreshes before making the next call. Five minutes comfortably exceeds typical request RTTs (so we don't race) without refreshing so aggressively that we burn the IdP's refresh budget.
  • GetFreshAccessTokenOrNil now handles three cases:
    • access token valid and not expiring soon → return it
    • access token invalid → refresh; propagate errors (today's behavior)
    • access token valid but expiring soon → attempt refresh; on error, fall back to the still-valid access token. The user isn't logged out because of a 200ms network blip.
  • getNewTokensWithRefreshOrNil now classifies refresh errors. A small isTransientRefreshError helper uses errors.As to identify url.Error / net.Error timeouts and net.OpError connection failures. Transient failures are wrapped with a plain-language message; authoritative rejections from the IdP produce a single stderr line ("re-run brev login") rather than a stack trace, which is the experience the user who filed the original report was asking for.
  • populateTokenTimestamps is called on both the LoginWithToken save path and the refresh save path, so the new fields are always kept in sync with the current access token.

What this does and does not fix

Fixes: the extra-round-trip cost on near-exp access tokens, and the UX regression where momentary network failures force re-login. With this PR the CLI will only prompt for brev login when the IdP has actually rejected the refresh credential.

Does not fix: the brev login --token path still produces a non-refreshable session, because the /settings/cli webpage hands out a raw JWT rather than a refresh credential. That is the server-side change tracked in #360.

Test plan

  • go build ./...
  • go vet ./pkg/auth/... ./pkg/store/... ./pkg/entity/...
  • go test ./pkg/auth/...
  • Manual: set a very short access-token TTL on a test account, use the CLI for a few minutes, confirm refresh happens ~5 min before expiry with no 401 round-trip.
  • Manual: block outbound to api.ngc.nvidia.com for ~1 minute mid-session; confirm the CLI continues answering calls with the current access token and resumes refresh on the next call, rather than prompting for re-login.
  • Manual: revoke a refresh token server-side; confirm the CLI prints the single-line stderr prompt and exits cleanly.

🤖 Generated with Claude Code

brycelelbach and others added 2 commits April 18, 2026 07:17
Users report being logged out of brev-cli multiple times per hour,
forcing repeated `brev login --token`. Two bugs in the auth flow
caused this:

1. `LoginWithToken` stored the literal string "auto-login" as the
   RefreshToken whenever it was handed a valid JWT access token
   (pkg/auth/auth.go). When that short-lived access token later
   expired, the refresh path in `GetFreshAccessTokenOrNil` tried to
   exchange "auto-login" with the IdP, which always failed, so the
   user was prompted to re-login every time the access token aged
   out — roughly hourly with typical NVIDIA KAS token lifetimes.

2. The auth-failure retry handler in pkg/store/http.go only fired on
   HTTP 403 Forbidden. APIs returning 401 Unauthorized (the more
   standard response for expired credentials) bypassed the refresh
   hook entirely and surfaced as a hard failure on the first such
   response, even when the refresh token was still good.

This change:

- Stores an empty RefreshToken when LoginWithToken receives a JWT,
  so the refresh path correctly recognizes there is nothing to
  refresh and falls through without a failed IdP round-trip. Emits a
  one-line notice on stderr so the user knows this session cannot
  be auto-renewed.
- Normalizes the legacy "auto-login" sentinel to "" on read, so
  users with existing credentials.json files from older CLI versions
  benefit from the fix without re-running `brev login --token`.
- Changes `GetFreshAccessTokenOrNil` to return "" (rather than an
  expired AccessToken) when there is no way to refresh, so callers
  prompt for re-login cleanly instead of shipping an expired JWT to
  the API.
- Extends the refresh-and-retry handler in AuthHTTPStore to fire on
  both 401 and 403.

Longer-term the website at brev.nvidia.com/settings/cli should hand
out a real refresh token (or a short-lived exchange code) rather
than a raw access JWT so `brev login --token` can produce a
refreshable session. This PR is the CLI-only stop-the-bleeding fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous behavior waited until an API call returned 401 to trigger
the refresh path. That keeps one unnecessary round-trip on the critical
path of every call that races the token's exp boundary, and it surfaces
transient auth-provider network failures as hard failures rather than
retries.

This change lets the CLI refresh ahead of expiry:

- `AuthTokens` gains optional `AccessTokenExp` and `IssuedAt` fields
  (omitempty). They are populated from the access JWT's `exp` / `iat`
  claims at save time, and fall back to live JWT parsing for credential
  files written by older CLI versions. No migration needed.

- `GetFreshAccessTokenOrNil` refreshes when the access token is invalid
  OR when it is valid but within `refreshBeforeExpiry` (5 min) of exp.
  The proactive branch is tolerant of refresh failure: if the IdP is
  briefly unreachable, the CLI keeps using the still-valid access token
  and retries on the next call rather than logging the user out.

- `getNewTokensWithRefreshOrNil` classifies refresh errors. Network-
  level failures (timeouts, connection refused, DNS) are wrapped with a
  clearer message; authoritative IdP rejection produces a one-line
  stderr prompt ("re-run `brev login`") rather than a stack-trace dump.

Builds on the reactive 401/403 retry from the prior auth fix.
Copilot AI review requested due to automatic review settings April 18, 2026 07:31
@brycelelbach brycelelbach requested a review from a team as a code owner April 18, 2026 07:31
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds proactive access-token refresh support to the CLI auth layer, reducing 401/retry round-trips near token expiry and improving resilience to transient refresh failures by persisting JWT timing metadata in saved credentials.

Changes:

  • Persist access-token exp/iat timestamps in AuthTokens (AccessTokenExp, IssuedAt) for faster expiry checks and UX surfacing.
  • Proactively refresh access tokens within a 5-minute pre-expiry window, with special handling to tolerate refresh failures when the current access token is still valid.
  • Treat both HTTP 401 and 403 as auth failures for refresh-and-retry behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
pkg/store/http.go Generalizes refresh-and-retry trigger to include 401 via isAuthFailure.
pkg/entity/entity.go Extends persisted auth token model with optional expiry/issued-at timestamps.
pkg/auth/auth.go Implements proactive refresh window, JWT claim extraction for exp/iat, and refresh error classification behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/store/http.go
Comment on lines +128 to +134
// isAuthFailure reports whether an HTTP status code indicates the caller's
// credentials are missing, invalid, or expired. Both 401 Unauthorized and
// 403 Forbidden can signal an expired access token from Brev's APIs, so we
// treat both as triggers for the refresh-and-retry path.
func isAuthFailure(code int) bool {
return code == http.StatusUnauthorized || code == http.StatusForbidden
}
Comment thread pkg/auth/auth.go
}
return "", breverrors.WrapAndTrace(refreshErr)
}
if newTokens == nil {
Comment thread pkg/auth/auth.go
Comment on lines +248 to +252
// populateTokenTimestamps fills in AccessTokenExp and IssuedAt from the
// access JWT when they are not already set. Safe to call on any AuthTokens
// value; missing or non-JWT access tokens leave the fields nil.
func populateTokenTimestamps(tokens *entity.AuthTokens) {
if tokens == nil || tokens.AccessToken == "" {
Comment thread pkg/auth/auth.go
Comment on lines +174 to +180
// Trigger a refresh when the token is invalid OR when it is still valid
// but close enough to expiry that the next API call is likely to race
// the exp boundary. The proactive branch is tolerant of refresh failure:
// if the IdP is briefly unreachable, fall back to the (still-valid)
// current access token rather than logging the user out.
expiringSoon := isAccessTokenValid && tokens.RefreshToken != "" && accessTokenExpiresSoon(tokens)
if !isAccessTokenValid || expiringSoon {
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants