Add MCP and skill ROI optimize insights by ozymandiashh · Pull Request #354 · getagentseal/codeburn

ozymandiashh · 2026-05-18T22:12:57Z

Summary

Adds two new optimize findings for capability-level MCP/skill analysis:
- low edit ROI for invoked MCP servers and skills in implementation-like turns
- retry impact for capabilities whose edit turns retry materially more than same-category baseline turns
Keeps the findings diagnostic: CodeBurn asks users to inspect sessions and narrow/remove capabilities only after confirming the sessions justify it.
Caps shared-turn token/cost/retry impact once when multiple candidate capabilities appear in the same turn, so co-occurring MCP + skill usage cannot inflate ranking.

Why

optimize already reports unused MCP inventory and ghost skills, but it cannot yet answer two higher-level questions:

"This MCP server/skill is being used, but is it helping implementation work?"
"When this capability is involved, does the agent need more retry loops than comparable work?"

Those are different from ghost/unused checks: a capability can be invoked frequently and still be low-signal for edit-producing work, or correlate with repeated fix/test loops. The new findings surface that as a review signal without pretending correlation is causation.

What changed

Adds turn-keyed capability aggregation over parsed session summaries:
- MCP capabilities are grouped by server from canonical mcp__<server>__<tool> names.
- Skill capabilities come from parsed call.skills only; generic turn labels such as subCategory are not treated as skills.
- Capabilities are counted once per turn even if the same MCP server/tool appears multiple times.
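
As a rough illustration of the three rules above, here is a minimal sketch, not the code in this PR. The `Turn` and `Call` shapes, the `tools` field, and the helper names are assumptions made for the example. Only the mcp__<server>__<tool> naming, call.skills, and the fact that subCategory is ignored come from the description.

```typescript
// Hypothetical shapes for illustration; the real parsed-session types may differ.
interface Call {
  tools: string[];    // tool names such as "mcp__docs__search" or "Edit" (assumed field)
  skills?: string[];  // parsed skill names, if any
}

interface Turn {
  category: string;
  subCategory?: string; // deliberately never read below
  calls: Call[];
}

// Extracts the server from a canonical mcp__<server>__<tool> name, or null otherwise.
function mcpServer(toolName: string): string | null {
  const match = /^mcp__(.+?)__(.+)$/.exec(toolName);
  return match?.[1] ?? null;
}

// Distinct capability keys for one turn, e.g. "mcp:docs" or "skill:planner".
function capabilitiesForTurn(turn: Turn): string[] {
  const keys = new Set<string>();
  for (const call of turn.calls) {
    for (const tool of call.tools) {
      const server = mcpServer(tool);
      if (server !== null) keys.add(`mcp:${server}`);
    }
    for (const skill of call.skills ?? []) {
      keys.add(`skill:${skill}`);
    }
  }
  return Array.from(keys);
}

// Number of turns in which each capability appeared at least once.
function countTurnsByCapability(turns: Turn[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const turn of turns) {
    for (const key of capabilitiesForTurn(turn)) {
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  return counts;
}

const exampleTurns: Turn[] = [
  {
    category: "coding",
    subCategory: "frontend",
    calls: [{ tools: ["mcp__docs__search", "mcp__docs__fetch"], skills: ["planner"] }],
  },
  { category: "coding", subCategory: "frontend", calls: [{ tools: ["Edit"] }] },
];
const turnCounts = countTurnsByCapability(exampleTurns);
```

In this example the two docs tools in the first turn collapse into a single "mcp:docs" count, and the frontend subCategory never becomes a skill key.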
Adds a low-edit-ROI detector for implementation-like categories (coding, debugging, feature, refactoring, testing).
- Example output shape:
  - MCP docs: 1/4 implementation turns produced edits (25% edit rate), $2.00 touched
  - skill api-review: 0/3 implementation turns produced edits (0% edit rate), $1.50 touched
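
The edit-rate figures in the example output can be sketched as follows. This is an assumption-laden illustration: `CapabilityTurn`, `editRoi`, `formatRoi`, and the `minTurns` parameter are hypothetical names, and treating "touched" as the summed cost of the capability's implementation turns is a guess at the metric rather than something the description confirms.

```typescript
// Hypothetical per-turn record for one capability; field names are illustrative.
interface CapabilityTurn {
  category: string;
  producedEdit: boolean;
  costUsd: number;
}

// The implementation-like categories named in the PR description.
const IMPLEMENTATION_CATEGORIES: ReadonlySet<string> = new Set([
  "coding", "debugging", "feature", "refactoring", "testing",
]);

interface EditRoi {
  implementationTurns: number;
  editTurns: number;
  editRate: number;
  costTouchedUsd: number;
}

// Returns null when there are too few implementation turns to say anything.
// minTurns is a caller-supplied sample threshold (hypothetical parameter).
function editRoi(turns: CapabilityTurn[], minTurns: number): EditRoi | null {
  const relevant = turns.filter((t) => IMPLEMENTATION_CATEGORIES.has(t.category));
  if (relevant.length === 0 || relevant.length < minTurns) return null;
  const editTurns = relevant.filter((t) => t.producedEdit).length;
  const costTouchedUsd = relevant.reduce((sum, t) => sum + t.costUsd, 0);
  return {
    implementationTurns: relevant.length,
    editTurns,
    editRate: editTurns / relevant.length,
    costTouchedUsd,
  };
}

function formatRoi(label: string, roi: EditRoi): string {
  const pct = Math.round(roi.editRate * 100);
  return `${label}: ${roi.editTurns}/${roi.implementationTurns} implementation turns ` +
    `produced edits (${pct}% edit rate), $${roi.costTouchedUsd.toFixed(2)} touched`;
}

const docsTurns: CapabilityTurn[] = [
  { category: "coding", producedEdit: true, costUsd: 0.5 },
  { category: "coding", producedEdit: false, costUsd: 0.5 },
  { category: "coding", producedEdit: false, costUsd: 0.5 },
  { category: "coding", producedEdit: false, costUsd: 0.5 },
  { category: "exploration", producedEdit: false, costUsd: 9 }, // not implementation-like
];
const docsRoi = editRoi(docsTurns, 3); // 3 is an illustrative threshold only
const docsLine = docsRoi ? formatRoi("MCP docs", docsRoi) : "no finding";
```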
Adds a retry-impact detector using same-category edit baselines.
- Example output shape:
  - skill planner: 2.0 retries/edit turn vs 0.0 baseline in the same task categories (3 edit turns, baseline 3)
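
A same-category baseline comparison could look roughly like the sketch below. The `EditTurn` shape and `retryImpact` helper are hypothetical, and defining the baseline as edit turns in the same categories that did not use the capability is an assumption; the PR only states that the baseline is same-category.

```typescript
// Hypothetical edit-turn record; names are illustrative only.
interface EditTurn {
  category: string;
  retries: number;
  capabilities: string[]; // e.g. ["skill:planner"]
}

function mean(values: number[]): number {
  if (values.length === 0) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

interface RetryImpact {
  capabilityRetries: number;
  baselineRetries: number;
  editTurns: number;
  baselineTurns: number;
}

// Compares a capability's edit turns with other edit turns in the same categories.
function retryImpact(turns: EditTurn[], capability: string): RetryImpact {
  const withCap = turns.filter((t) => t.capabilities.includes(capability));
  const categories = new Set(withCap.map((t) => t.category));
  // Assumed baseline: same-category edit turns that did not use the capability.
  const baseline = turns.filter(
    (t) => !t.capabilities.includes(capability) && categories.has(t.category),
  );
  return {
    capabilityRetries: mean(withCap.map((t) => t.retries)),
    baselineRetries: mean(baseline.map((t) => t.retries)),
    editTurns: withCap.length,
    baselineTurns: baseline.length,
  };
}

const editTurnsFixture: EditTurn[] = [
  { category: "coding", retries: 2, capabilities: ["skill:planner"] },
  { category: "coding", retries: 2, capabilities: ["skill:planner"] },
  { category: "coding", retries: 2, capabilities: ["skill:planner"] },
  { category: "coding", retries: 0, capabilities: [] },
  { category: "coding", retries: 0, capabilities: [] },
  { category: "coding", retries: 0, capabilities: [] },
  // High-retry debugging turns must not leak into the coding baseline.
  { category: "debugging", retries: 5, capabilities: [] },
  { category: "debugging", retries: 5, capabilities: [] },
];
const plannerImpact = retryImpact(editTurnsFixture, "skill:planner");
const plannerLine =
  `skill planner: ${plannerImpact.capabilityRetries.toFixed(1)} retries/edit turn vs ` +
  `${plannerImpact.baselineRetries.toFixed(1)} baseline in the same task categories ` +
  `(${plannerImpact.editTurns} edit turns, baseline ${plannerImpact.baselineTurns})`;
```

Restricting the baseline to the categories the capability was used in keeps inherently retry-heavy work, such as debugging, from masking or exaggerating the difference.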
Reuses a single capability aggregation pass inside scanAndDetect, alongside existing MCP coverage aggregation.
Adds regression tests for:
- ROI sample thresholds
- MCP/skill findings together
- per-turn MCP deduplication
- shared-turn savings caps for co-occurring MCP + skill candidates
- no subCategory false skill labels
- same-category retry baselines
- retry sample thresholds

Validation

I validated the behavior by running the real src/optimize.ts detector exports against controlled ProjectSummary fixtures via npx tsx --eval. This goes beyond a green test run: the harness constructs the exact edge cases this PR is meant to handle and prints the detector output.

{
  "roi_shared_turn_cap": {
    "title": "2 MCP/skill capabilities with low edit ROI",
    "tokensSaved": 1650,
    "expectedTokensSaved": 1650,
    "containsMcpCombo": true,
    "containsSkillCombo": true,
    "proof": "3 shared non-edit turns * 2,200 effective tokens * 25%; not doubled to 3,300 when MCP and skill co-occur"
  },
  "retry_same_category_baseline_and_shared_turn_cap": {
    "title": "2 MCP/skill capabilities correlated with high retries",
    "tokensSaved": 3300,
    "expectedTokensSaved": 3300,
    "containsSameCategoryBaseline": true,
    "ignoresHighRetryDebuggingBaseline": true,
    "proof": "3 shared coding edit turns * 2,200 effective tokens * 50%; debugging retry turns do not contaminate coding baseline"
  },
  "subCategory_false_skill_guard": {
    "finding": null,
    "proof": "turn.subCategory=frontend with no call.skills and no MCP tools emits no capability finding"
  }
}

What this proves:

ROI emits both the MCP and skill candidates, but shared non-edit turns are capped once: 1,650 tokens instead of the buggy 3,300 double count.
Retry impact emits both the MCP and skill candidates, but shared edit turns are capped once: 3,300 tokens instead of the buggy 6,600 double count.
Retry baseline is same-category: high-retry debugging edit turns are present in the fixture, but they do not contaminate the coding baseline.
subCategory alone does not create a fake skill frontend finding.
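
The cap arithmetic in the two proof strings can be reproduced with a small sketch. The `Candidate` shape, the turn IDs, and both helper names are hypothetical; only the numbers (3 shared turns, 2,200 effective tokens, 25% and 50%) come from the harness output above.

```typescript
// Hypothetical candidate shape: each capability lists the turns it touched.
interface CandidateTurn {
  turnId: string;
  effectiveTokens: number;
}

interface Candidate {
  capability: string;
  turns: CandidateTurn[];
}

// Counts each turn once, no matter how many candidates share it.
function cappedTokensSaved(candidates: Candidate[], savingsFraction: number): number {
  const tokensByTurn = new Map<string, number>();
  for (const candidate of candidates) {
    for (const turn of candidate.turns) {
      tokensByTurn.set(turn.turnId, turn.effectiveTokens);
    }
  }
  let total = 0;
  tokensByTurn.forEach((tokens) => {
    total += tokens;
  });
  return Math.round(total * savingsFraction);
}

// The double-counting variant, shown only for contrast.
function naiveTokensSaved(candidates: Candidate[], savingsFraction: number): number {
  let total = 0;
  for (const candidate of candidates) {
    for (const turn of candidate.turns) {
      total += turn.effectiveTokens;
    }
  }
  return Math.round(total * savingsFraction);
}

const sharedTurns: CandidateTurn[] = [
  { turnId: "t1", effectiveTokens: 2200 },
  { turnId: "t2", effectiveTokens: 2200 },
  { turnId: "t3", effectiveTokens: 2200 },
];
const coOccurring: Candidate[] = [
  { capability: "mcp:combo", turns: sharedTurns },
  { capability: "skill:combo", turns: sharedTurns },
];

const roiSaved = cappedTokensSaved(coOccurring, 0.25);   // 3 * 2,200 * 25%
const retrySaved = cappedTokensSaved(coOccurring, 0.5);  // 3 * 2,200 * 50%
const roiDoubled = naiveTokensSaved(coOccurring, 0.25);  // the uncapped figure
```

Keying the total by turn rather than by capability is what keeps a co-occurring MCP server and skill from each claiming the same tokens.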

Supporting checks:

./node_modules/.bin/tsc --noEmit --pretty false
npx vitest run tests/optimize.test.ts — 87 tests passed
npm run build
npm test -- --run — 62 files / 882 tests passed
git diff --check origin/main...HEAD
GitHub checks on this PR: assess, check, semgrep all passed
Claude Opus 4.7, effort max review:
- first pass found shared-turn double-counting and subCategory skill-label risks
- after fixing both, final review returned PASS
Gemini 3.1 Pro Preview review returned PASS on the amended diff

Notes

This intentionally does not auto-edit MCP or skill config. The findings are audit prompts because high retries or low edit rate can also reflect hard tasks, read-only investigation, or intentionally exploratory work.
Low MCP tool coverage still owns the "large server inventory is mostly unused" case; the ROI detector suppresses MCP servers already flagged there to avoid duplicate findings.

Add MCP and skill ROI optimize insights

b583aa8

ozymandiashh marked this pull request as ready for review May 18, 2026 22:27

This was referenced May 18, 2026

Add MCP project profile advisor #356 (Open)

Add MCP and skill reliability report #357 (Open)


ozymandiashh wants to merge 1 commit into getagentseal:main from ozymandiashh:codex/mcp-skill-roi
