Fix Gemini and Vibe one-shot rates #352
Closed
ozymandiashh wants to merge 1 commit into
This was referenced May 18, 2026
Summary
Addresses #351 by fixing how non-Claude provider calls are cached for one-shot/retry classification.
The core issue was not cache hits. CodeBurn's one-shot rate is based on edit turns with zero detected retries/self-corrections. For Gemini and Mistral Vibe, the cached turn shape did not give the classifier enough structure to see retries inside a single user request.
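To make that definition concrete, here is a minimal sketch of the rate calculation. It assumes each cached turn has already been reduced to a hypothetical `TurnSummary` (does it contain an edit, and how many retries were detected). It is not CodeBurn's code; it only shows why splitting one request into per-call turns can push the rate to 100%.

```typescript
// Hypothetical per-turn summary; field names are illustrative, not CodeBurn's types.
interface TurnSummary {
  hasEdit: boolean; // did the turn contain at least one edit tool call?
  retries: number; // retries/self-corrections detected inside the turn
}

// One-shot rate = edit turns with zero retries / all edit turns.
// Returns null when there are no edit turns, since the rate is undefined then.
function oneShotRate(turns: TurnSummary[]): number | null {
  const editTurns = turns.filter((t) => t.hasEdit);
  if (editTurns.length === 0) return null;
  const oneShots = editTurns.filter((t) => t.retries === 0).length;
  return oneShots / editTurns.length;
}

// Edit -> Bash -> Edit cached one call per turn: two "clean" edit turns.
const perCall: TurnSummary[] = [
  { hasEdit: true, retries: 0 },
  { hasEdit: false, retries: 0 },
  { hasEdit: true, retries: 0 },
];

// The same calls grouped under one user turn: one edit turn with one retry.
const grouped: TurnSummary[] = [{ hasEdit: true, retries: 1 }];

console.log(oneShotRate(perCall)); // 1
console.log(oneShotRate(grouped)); // 0
```

The per-call shape reports a perfect rate even though the assistant had to correct itself; only the grouped shape exposes the retry.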
This PR adds provider-level turn grouping so related assistant calls are cached under the same user turn. That lets the existing classifier see multi-message flows like `Edit -> Bash -> Edit` and count the follow-up edit as a retry instead of reporting each message as an independent one-shot turn.
Closes #351.
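A minimal sketch of the grouping rule, assuming a hypothetical `ProviderCall` shape: only `sessionId` and `turnId` mirror names used in this PR, while `callId`, `tool`, and `groupIntoTurns` are illustrative. Calls that share `(sessionId, turnId)` are cached as one turn; calls without a `turnId` fall back to one turn per call, as before.

```typescript
// Hypothetical provider call shape; not the real ParsedProviderCall type.
interface ProviderCall {
  sessionId: string;
  callId: string;
  turnId?: string; // set by providers that know which user turn a call belongs to
  tool: string;
}

// Calls sharing (sessionId, turnId) land in one turn; calls without a
// turnId keep the old behavior of one turn per call.
function groupIntoTurns(calls: ProviderCall[]): ProviderCall[][] {
  const turns = new Map<string, ProviderCall[]>();
  for (const call of calls) {
    // JSON.stringify of a tuple avoids collisions a naive "a:b" join could cause.
    const key =
      call.turnId !== undefined
        ? JSON.stringify(["turn", call.sessionId, call.turnId])
        : JSON.stringify(["call", call.sessionId, call.callId]);
    const bucket = turns.get(key);
    if (bucket) bucket.push(call);
    else turns.set(key, [call]);
  }
  // Map iterates in insertion order, so turns come out in first-seen order.
  return Array.from(turns.values());
}

const calls: ProviderCall[] = [
  { sessionId: "s1", callId: "g1", turnId: "t1", tool: "edit_file" },
  { sessionId: "s1", callId: "g2", turnId: "t1", tool: "run_command" },
  { sessionId: "s1", callId: "g3", turnId: "t1", tool: "edit_file" },
  { sessionId: "s1", callId: "c4", tool: "edit_file" }, // no turnId: its own turn
];

const turns = groupIntoTurns(calls);
console.log(turns.map((t) => t.map((c) => c.callId).join(",")).join(" | ")); // g1,g2,g3 | c4
```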
Root Cause
The classifier already knows how to detect retries when a turn contains multiple assistant calls. For example, a single user turn containing `Edit -> Bash -> Edit` should count as one edit turn with one retry, because the assistant edited, ran a command, then edited again.
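For illustration, here is a minimal sketch of one way to count retries inside a single turn. The heuristic (an edit that follows an earlier edit with a command run in between) and the tool-name mapping are assumptions chosen to match the `Edit -> Bash -> Edit` example; CodeBurn's real classifier may use different or additional signals.

```typescript
// Hypothetical tool categories; real provider tool names vary.
type ToolKind = "edit" | "bash" | "other";

// Illustrative mapping from tool names to kinds; an assumption, not CodeBurn's table.
function kindOf(tool: string): ToolKind {
  if (tool === "Edit" || tool === "edit_file") return "edit";
  if (tool === "Bash" || tool === "run_command") return "bash";
  return "other";
}

// Heuristic: an edit that follows an earlier edit with at least one command
// run in between (edit, run something, edit again) counts as one retry.
function countRetries(tools: string[]): number {
  let sawEdit = false;
  let ranCommandSinceEdit = false;
  let retries = 0;
  for (const tool of tools) {
    const kind = kindOf(tool);
    if (kind === "edit") {
      if (sawEdit && ranCommandSinceEdit) retries++;
      sawEdit = true;
      ranCommandSinceEdit = false;
    } else if (kind === "bash" && sawEdit) {
      ranCommandSinceEdit = true;
    }
  }
  return retries;
}

console.log(countRetries(["Edit", "Bash", "Edit"])); // 1: one edit turn with one retry
console.log(countRetries(["Edit"])); // 0: a one-shot edit
console.log(countRetries(["Bash", "Edit"])); // 0: the command ran before the first edit
```

The heuristic only works if all three calls reach it as one list, which is exactly what per-call caching prevented.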
Before this PR, provider calls emitted through `ParsedProviderCall` were converted into cached turns one call at a time, so each Gemini assistant message was cached as its own turn. The classifier never saw the full `Edit -> Bash -> Edit` sequence in one turn, so the one-shot rate could look artificially perfect.
Example: Gemini
Gemini already exposes per-assistant-message token/tool data, so no private user logs are needed to validate the fix.
Synthetic fixture shape:
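```json
[
  { "type": "user", "content": "implement parser update in src/parser.ts" },
  { "type": "gemini", "id": "g1", "toolCalls": [{ "name": "edit_file" }] },
  { "type": "gemini", "id": "g2", "toolCalls": [{ "name": "run_command", "args": { "command": "npm test" } }] },
  { "type": "gemini", "id": "g3", "toolCalls": [{ "name": "edit_file" }] }
]
```
Before: `g1`, `g2`, and `g3` were cached as three separate turns, so each edit message was scored as its own one-shot candidate.
After: `g1`, `g2`, and `g3` are cached under the same user turn, which the classifier counts as one edit turn with one retry.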
The new regression test asserts exactly that.
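The fixture's ordering is enough to derive a shared turn id. A minimal sketch, assuming a user message opens a new turn and every following `gemini` message belongs to it until the next user message; `tagTurns`, `TaggedCall`, and the `turn-N` id format are hypothetical and not taken from the Gemini provider.

```typescript
// Message shape taken from the synthetic Gemini fixture.
interface GeminiFixtureMessage {
  type: "user" | "gemini";
  id?: string;
  content?: string;
  toolCalls?: { name: string; args?: Record<string, unknown> }[];
}

// Hypothetical output: one record per assistant message, tagged with a turnId.
interface TaggedCall {
  id: string;
  turnId: string;
  tools: string[];
}

// Each user message opens a new turn; gemini messages inherit the current
// turn id. Assistant messages before any user message fall into "turn-0".
function tagTurns(messages: GeminiFixtureMessage[]): TaggedCall[] {
  const out: TaggedCall[] = [];
  let turnIndex = 0;
  for (const msg of messages) {
    if (msg.type === "user") {
      turnIndex++;
      continue;
    }
    out.push({
      id: msg.id ?? `anon-${out.length}`,
      turnId: `turn-${turnIndex}`, // illustrative id format
      tools: (msg.toolCalls ?? []).map((t) => t.name),
    });
  }
  return out;
}

const fixture: GeminiFixtureMessage[] = [
  { type: "user", content: "implement parser update in src/parser.ts" },
  { type: "gemini", id: "g1", toolCalls: [{ name: "edit_file" }] },
  { type: "gemini", id: "g2", toolCalls: [{ name: "run_command", args: { command: "npm test" } }] },
  { type: "gemini", id: "g3", toolCalls: [{ name: "edit_file" }] },
];

const tagged = tagTurns(fixture);
console.log(tagged.map((c) => `${c.id}:${c.turnId}`).join(" ")); // g1:turn-1 g2:turn-1 g3:turn-1
```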
Example: Mistral Vibe
Mistral Vibe's local logs are different from Gemini:
- `meta.json.stats` has cumulative session totals such as `session_prompt_tokens`, `session_completion_tokens`, and `session_cost`.
- `messages.jsonl` has user/assistant/tool message structure and assistant `tool_calls`.
Neither file records cache token counts, so this PR does not invent them: CodeBurn still reports Vibe cache token counts as `0` until Vibe persists those fields locally.
What this PR does instead:
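- Tags assistant calls that belong to the same user request with a shared `turnId`.
- Uses `meta.json.stats.session_cost` when present, because that is the best local cost signal available and may already reflect Vibe-side accounting better than CodeBurn's price-derived estimate.
- Caches the provider-calculated cost so `session_cost` is not lost during cached reads.
Synthetic Vibe fixture shape: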
After parsing:
What Changed
- Adds `ParsedProviderCall.turnId` so providers can mark calls that belong to the same user turn.
- Groups cached turns by `(sessionId, turnId)` when `turnId` is present.
- Uses `meta.json.stats.session_cost` for Mistral Vibe cost when present.
- Caches `costUSD` for Vibe calls so provider-calculated Vibe cost survives cache round-trips.
- Adds regression tests for Gemini/Vibe turn grouping and Vibe `session_cost` behavior.
Validation
Behavior proof
The main proof is `tests/provider-turn-grouping.test.ts`, which exercises the exact suspicious one-shot shape rather than only checking that the suite is green.
Gemini fixture:
This proves the classifier can now see the full `Edit -> Bash -> Edit` sequence for Gemini instead of treating each assistant message as a separate one-shot candidate.
Mistral Vibe fixture:
This proves both sides of the Vibe fix: retry detection sees the multi-message edit flow, and Vibe's own `session_cost` survives the parser/cache round-trip instead of being replaced by a price-derived estimate.
`tests/providers/mistral-vibe.test.ts` also includes a direct cost regression: the fixture sets an intentionally large token price but `session_cost = 0.381681`, and the provider returns exactly `0.381681`, proving that `session_cost` takes precedence over estimated pricing.
Command results
- `./node_modules/.bin/tsc --noEmit --pretty false`
- `npx vitest run tests/provider-turn-grouping.test.ts tests/providers/gemini.test.ts tests/providers/mistral-vibe.test.ts tests/parser-gemini-cache.test.ts tests/session-cache.test.ts` - 68 tests passed
- `npm run build`
- `npm test -- --run` - 63 files / 874 tests passed
- `git diff --check`
- `check`, `assess`, and `semgrep` pass
Notes
Validation uses synthetic fixtures only. No private prompts, project names, session IDs, or local user logs were used.
This PR does not claim exact Vibe cache token accounting. Current Vibe local logs do not include cache token fields, so exact cache read/write token reporting requires an upstream Vibe logging change.
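As a sketch of how these limits might translate into reported usage, assuming session-level totals and hypothetical `Pricing`/`VibeUsage` types and a hypothetical `toVibeUsage` helper: cache token fields stay at `0` because the local logs lack them, and `session_cost` wins over the price-derived estimate whenever it is present, mirroring the `0.381681` regression. The PR does not describe how session totals are attributed to individual calls, so this sketch stays at the session level.

```typescript
// Field names mirror Vibe's meta.json.stats as described above.
interface VibeSessionStats {
  session_prompt_tokens?: number;
  session_completion_tokens?: number;
  session_cost?: number;
}

// Hypothetical pricing and output types, for illustration only.
interface Pricing {
  inputUSDPerToken: number;
  outputUSDPerToken: number;
}

interface VibeUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number; // always 0: local Vibe logs have no cache token fields
  cacheWriteTokens: number; // always 0 for the same reason
  costUSD: number;
  costSource: "session_cost" | "estimate";
}

function toVibeUsage(stats: VibeSessionStats, pricing: Pricing): VibeUsage {
  const inputTokens = stats.session_prompt_tokens ?? 0;
  const outputTokens = stats.session_completion_tokens ?? 0;
  const sessionCost = stats.session_cost;
  let costUSD: number;
  let costSource: VibeUsage["costSource"];
  if (sessionCost !== undefined && Number.isFinite(sessionCost)) {
    // Prefer Vibe's own cost whenever it is present.
    costUSD = sessionCost;
    costSource = "session_cost";
  } else {
    // Fall back to a price-derived estimate.
    costUSD = inputTokens * pricing.inputUSDPerToken + outputTokens * pricing.outputUSDPerToken;
    costSource = "estimate";
  }
  return { inputTokens, outputTokens, cacheReadTokens: 0, cacheWriteTokens: 0, costUSD, costSource };
}

const hugePrice: Pricing = { inputUSDPerToken: 1, outputUSDPerToken: 1 }; // intentionally large
const withSessionCost = toVibeUsage(
  { session_prompt_tokens: 120000, session_completion_tokens: 3000, session_cost: 0.381681 },
  hugePrice,
);
console.log(withSessionCost.costUSD, withSessionCost.costSource); // 0.381681 session_cost

const estimated = toVibeUsage({ session_prompt_tokens: 10, session_completion_tokens: 5 }, hugePrice);
console.log(estimated.costUSD, estimated.costSource); // 15 estimate
```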