Skip to content

Add pr-context-synthesis skill#4374

Open
AryanGodara wants to merge 4 commits intomainfrom
aryan/pr-context-synthesis-skill
Open

Add pr-context-synthesis skill#4374
AryanGodara wants to merge 4 commits intomainfrom
aryan/pr-context-synthesis-skill

Conversation

@AryanGodara
Copy link
Copy Markdown
Member

@AryanGodara AryanGodara commented May 1, 2026

What this is

A primitive for producing a tight 1–3 paragraph what / why / how synthesis of a single PR (or PR-shaped change). Stays anchored to the actual diff and refuses vague verbs.

Ships with /pr-synthesis <N|owner/repo#N|url>, a slash command that fetches PR metadata, linked issue, and diff and prints the synthesis verbatim.

Why split out

Extracted from #4351 because the synthesis primitive is reusable beyond a single review report — review and incident-investigation flows both want the same per-PR summary shape. Splitting keeps the contract small.

Files

  • docs/skills/pr-context-synthesis.md — rules + output shape.
  • .claude/commands/pr-synthesis.md — slash command wrapping the procedure.

Changes after Jose's review

  • Drop the verbose tail from the slash-command frontmatter; accept the #4267 form alongside 4267.
  • Reorder Inputs in the skill doc so <diff_summary> (the ground truth) is first, with <pr_text> and <linked_issue> interpreted relative to it.
  • Reorder the procedure: fetch PR metadata + closing-issue references, then fetch each linked issue, then the diff. Lets the synthesis read the diff with the author's intent already in mind.
  • Drop the pr-blame-walk reference from the skill body (forward-looking dep that doesn't belong here).
  • Reword the <pr_text> input description to be accurate when there's no PR yet (local-diff mode).
  • Diff-size preflight. gh pr diff on a 376k-line PR will silently corrupt the context window. The procedure now gates the full-diff fetch behind additions + deletions <= 2000. Above that, it falls back to the per-file scope from the paginated REST endpoint and bucket-summarises lockfiles, generated bindings, and codegen output.

While testing this on PR #4217, I caught a separate silent-truncation bug in the procedure I was about to ship: gh pr view --json files caps at 100 files, so a 384-file PR would have silently underreported its scope. The fallback path now uses gh api --paginate repos/<o>/<r>/pulls/<N>/files instead.

Worked examples

Two real PRs, both above the 2k full-diff gate, so both exercise the per-file-bucket fallback. (The original example in the skill doc — PR #4371, ~60 lines — exercises the small-diff branch.)

Example 1 — PR #4243 (4,432 lines, 24 files)

Boundary case: just over the gate. Generated bindings dominate the line count; the substantive change is a 275-line estimator + a 236-line e2e test.

Adds a new Eip4626 native price estimator to the price-estimation stack at crates/price-estimation/src/native/eip4626.rs. The estimator unwraps an EIP-4626 vault token by calling asset() and convertToAssets(10^decimals) on-chain, then delegates pricing of the underlying token to the next estimator in the configured stage. New contract bindings (IERC4626, MockERC4626Wrapper) ship as pre-generated alloy crates; an e2e forked-mainnet test covers a single vault and a recursive vault chain.

Vault tokens like sDAI often lack direct DEX liquidity, which makes the existing native-price estimators fall over. This change lets the orderbook quote vault tokens by reducing them to their underlying. (No linked issue; intent inferred from the PR body and the new test names.)

Mechanically: a Mutex<HashSet<Address>> negative cache short-circuits subsequent requests for tokens whose asset() reverts (cleared on restart); each on-chain RPC is bounded by tokio::time::timeout, with the leftover budget forwarded to the inner estimator so total wall-clock stays inside the caller's timeout — important for nested vaults. A configurable depth parameter (default 1) controls recursion, wired in factory.rs::create_native_estimator. Config deserialisation rejects stages where Eip4626 is the last entry, since it must be followed by another estimator to price the underlying.

Diff summarised at file-scope only — total +4,411 / −21 across 24 files exceeds the 2k full-diff threshold. Generated bindings (ierc4626/src/lib.rs +648, mockerc4626wrapper/src/lib.rs +2,768) and the MockERC4626Wrapper.json artifact (+240) account for ~83% of additions; the substantive Rust changes are the 275-line estimator, 236-line e2e test, and 14 small wiring edits.

Example 2 — PR #4217 (376,829 lines, 384 files)

Stress case: two orders of magnitude over the gate. Demonstrates that the per-file bucket summary is genuinely more useful than the full diff would be — 96% of the lines are mechanical codegen output, and the substantive change hides in a 116-file +213/−1,531 workspace bucket of import-path edits.

Moves the contracts crate from crates/contracts to a standalone contracts/ directory at repo root, and replaces its build.rs runtime code-generation with 148 pre-generated per-contract Alloy binding crates under contracts/generated/contracts-generated/. A new contracts-facade crate re-exports the bindings so consumers see a single import surface. All 116 workspace crates that previously depended on crates/contracts are rewired to import from the new generated crates. Codegen still exists, but is now a separate cargo run -p contracts step rather than an implicit cost on every workspace build.

Driver: build.rs was the longest critical-path step in the workspace build. The PR description's flame graphs claim ~15 s shaved off cold builds (from ~60 s to ~45 s) and visibly higher CPU utilisation since the build no longer serialises behind a single codegen blocker. Pre-generated bindings are also independently version-controllable and reviewable — historically the only way to inspect a binding diff was to rerun the build.

Diff summarised at file-scope only — +375,175 / −1,654 across 384 files is two orders of magnitude over the 2k full-diff threshold. Bucketed: generated bindings 148 files / +292,618 (mechanical, alloy-codegen output); contract artifacts 74 files / +7,278 (JSON ABI moves); workspace crates 116 files / +213 / −1,531 (overwhelmingly import-path edits and removal of the old crates/contracts/); Cargo.lock 3 files / +7,747; contracts tooling 38 files / +1,192. Almost everything except the 116 workspace-crate imports is mechanically generated or moved.

Reviewing this in isolation

The skill body is self-contained. The intro mentions consumers (/review-pr, ad-hoc summaries, future incident walks) but doesn't link to anything that doesn't exist on main. Rules, Shape, and Example stand on their own.

How to try it

/pr-synthesis 4371
/pr-synthesis 4243
/pr-synthesis 4217

@AryanGodara AryanGodara changed the title add pr-context-synthesis skill Add pr-context-synthesis skill May 1, 2026
@AryanGodara AryanGodara self-assigned this May 1, 2026
@AryanGodara AryanGodara changed the title Add pr-context-synthesis skill Add pr-blame-walk skill May 1, 2026
@AryanGodara AryanGodara changed the title Add pr-blame-walk skill Add pr-context-synthesis skill May 5, 2026
@AryanGodara AryanGodara force-pushed the aryan/pr-context-synthesis-skill branch from 97e8570 to 59935ce Compare May 5, 2026 13:45
@AryanGodara AryanGodara marked this pull request as ready for review May 5, 2026 13:48
@AryanGodara AryanGodara requested a review from a team as a code owner May 5, 2026 13:48
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new Claude command for PR synthesis and a corresponding skill definition. The changes establish a structured process for generating concise summaries of pull requests by fetching data from GitHub. Feedback highlights a brittle approach to identifying linked issues, suggesting the use of GitHub's native API fields instead. Additionally, a contradiction was identified between the command's behavior and the skill's rules regarding how to handle drift between the PR description and the actual diff.

Comment thread .claude/commands/pr-synthesis.md Outdated
Comment on lines +36 to +40
Parse the PR body for `Fixes #N` / `Closes #N` / `Resolves #N` (case-insensitive) and fetch each linked issue:

```bash
gh issue view <N> -R <owner>/<repo> --json title,body,labels,state
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Manually parsing the PR body for linked issues using keywords like Fixes #N is brittle. A more robust approach is to leverage GitHub's built-in issue linking by using the closingIssuesReferences field from the gh pr view API. This ensures you are using the same mechanism that GitHub uses to link PRs and issues.

To implement this, you'll need to add closingIssuesReferences to the --json fields in the gh pr view call on line 32. Then, you can iterate over the structured data to get issue details.

Suggested change
Parse the PR body for `Fixes #N` / `Closes #N` / `Resolves #N` (case-insensitive) and fetch each linked issue:
```bash
gh issue view <N> -R <owner>/<repo> --json title,body,labels,state
```
gh issue view <issue.number> -R <issue.repository.owner>/<issue.repository.name> --json title,body,labels,state

Comment thread docs/skills/pr-context-synthesis.md Outdated
## Rules

1. **Synthesize, don't copy-paste.** If `<pr_text>` is five words, say so plainly: *"description is minimal; intent inferred from diff"*. Don't pad to look thorough.
2. **Watch for description-vs-diff drift.** `<pr_text>` must describe `<diff_summary>`'s *current* state, not the author's iteration history. If a claim is no longer true of the diff, raise a finding with `Action: update the PR description to match the current diff`. Do **not** flag the absence of a changelog of removed/superseded behaviour — that belongs in commit history, not the description.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a contradiction between this rule and the implementation described in .claude/commands/pr-synthesis.md.

This file says to raise a finding with Action: update the PR description....
However, .claude/commands/pr-synthesis.md (line 57) states that in case of a drift, the skill should note it in the synthesis as "description claims X; diff shows Y".

The latter seems more appropriate for a synthesis skill, which should report facts rather than take actions. The consumer of the synthesis can then decide whether to raise a finding. I recommend aligning this rule with the implementation.

Suggested change
2. **Watch for description-vs-diff drift.** `<pr_text>` must describe `<diff_summary>`'s *current* state, not the author's iteration history. If a claim is no longer true of the diff, raise a finding with `Action: update the PR description to match the current diff`. Do **not** flag the absence of a changelog of removed/superseded behaviour — that belongs in commit history, not the description.
2. Watch for description-vs-diff drift. <pr_text> must describe <diff_summary>'s current state, not the author's iteration history. If a claim is no longer true of the diff, note it in the synthesis (e.g., "description claims X; diff shows Y"). Do not flag the absence of a changelog of removed/superseded behaviour — that belongs in commit history, not the description.

@AryanGodara AryanGodara force-pushed the aryan/pr-context-synthesis-skill branch from 87b5610 to 8af2427 Compare May 5, 2026 14:27
Copy link
Copy Markdown
Contributor

@jmg-duarte jmg-duarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you share examples in the description?

I also suggest you try these:

Comment thread .claude/commands/pr-synthesis.md Outdated
@@ -0,0 +1,57 @@
---
description: Produce a tight 1–3 paragraph what / why / how synthesis of a single PR. Fetches title, body, linked issue, and diff scope; outputs per `docs/skills/pr-context-synthesis.md`. Useful as a standalone summary or as a building block for review / incident-investigation flows. Read-only.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this bit is necessary

Suggested change
description: Produce a tight 1–3 paragraph what / why / how synthesis of a single PR. Fetches title, body, linked issue, and diff scope; outputs per `docs/skills/pr-context-synthesis.md`. Useful as a standalone summary or as a building block for review / incident-investigation flows. Read-only.
description: Produce a tight 1–3 paragraph what / why / how synthesis of a single PR. Fetches title, body, linked issue, and diff scope; outputs per `docs/skills/pr-context-synthesis.md`. Read-only.

Comment thread .claude/commands/pr-synthesis.md Outdated

Accept any of:

- A PR number: `4267`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- A PR number: `4267`
- A PR number: `4267`/`#4267`

Comment thread .claude/commands/pr-synthesis.md Outdated

```bash
gh pr view <N> -R <owner>/<repo> --json title,body,files,labels,baseRefName,headRefName,closingIssuesReferences
gh pr diff <N> -R <owner>/<repo>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diff can potentially be huge and break the context window.
This should first get a summary of the diff size and then decide whether to chunk it or not.

Comment thread .claude/commands/pr-synthesis.md Outdated
Comment on lines +36 to +42
For each entry in `closingIssuesReferences` (GitHub's own parsing of `Fixes #N` / `Closes #N` / `Resolves #N`, more reliable than regexing the body), fetch the issue:

```bash
gh issue view <issue.number> -R <issue.repository.owner>/<issue.repository.name> --json title,body,labels,state
```

If `closingIssuesReferences` is empty, proceed without a linked issue — never invent one.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should honestly be done even before the diff, it should help setup context for the diff and also it would allow Claude to high mismatches

Comment thread docs/skills/pr-context-synthesis.md Outdated

Two ways:

- **Procedurally** — follow the rules below when you need a tight 1–3 paragraph synthesis (CONTEXT block of `/review-pr`, per-candidate block in `pr-blame-walk`, ad-hoc *"summarise this PR for me"*).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean that this PR is blocked by the one for pr-blame-walk?

Comment thread docs/skills/pr-context-synthesis.md Outdated

## Inputs

- `<pr_text>` — PR title and body. If there is no PR yet (e.g. local-diff mode), substitute the branch name plus the relevant commit messages.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit more accurate (?)

Suggested change
- `<pr_text>` — PR title and body. If there is no PR yet (e.g. local-diff mode), substitute the branch name plus the relevant commit messages.
- `<pr_text>` — PR title and body. If there is no PR yet (e.g. local-diff mode), use the current branch name and the relevant commit messages (i.e. diff against `main`).

Comment thread docs/skills/pr-context-synthesis.md Outdated

- `<pr_text>` — PR title and body. If there is no PR yet (e.g. local-diff mode), substitute the branch name plus the relevant commit messages.
- `<linked_issue>` — title and body of any issue referenced via `Fixes #N` / `Closes #N` / `Resolves #N`. May be empty.
- `<diff_summary>` — file scope plus a codemap or per-file note of the actual change. The ground truth the synthesis must stay anchored to.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the ground truth, should it be first in the list so the follow up items are interpreted with this in context?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants