
unifyTypes: pointer-eq fast-path (-7.5% full build) #17

Open

kozak wants to merge 2 commits into type-hash from ptr-eq-unify-ship

Conversation


kozak commented Apr 28, 2026

Summary

One-line change in unifyTypes: when both arguments refer to the same heap object, return immediately — bypassing substituteType, the hint-stack push, and the cache lookup. Sound regardless of the result; the fall-through path is unchanged.

unifyTypes t1 t2
  | isTrue# (reallyUnsafePtrEquality# t1 t2) = pure ()
  | otherwise = do
      sub <- gets checkSubstitution
      withErrorMessageHint (ErrorUnifyingTypes t1 t2) $
        unifyTypes'' (substituteType sub t1) (substituteType sub t2)
      ...

Why

The unify-callsite-survey experiment found 75% of unification-cache hits at 3 sites in Types.hs are on the trivial pair (tyFunction, tyFunction). Even on a cache hit, every call pays the wrapper overhead: gets checkSubstitution, substituteType walking closed types, withErrorMessageHint pushing/popping the hint stack, and the HashSet lookup. For tyFunction propagated from Environment (a single shared constant), the two arguments are usually the same heap object — one reallyUnsafePtrEquality# short-circuits all of that.
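To illustrate the primitive in play, here is a minimal standalone sketch (not the compiler's code; `Ty` and `tyFunctionDemo` are hypothetical stand-ins for the real type AST and Environment's shared constant): `reallyUnsafePtrEquality#` answers `True` only when both arguments are the same heap object, which is exactly the situation for a shared top-level constant.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (isTrue#, reallyUnsafePtrEquality#)

-- Hypothetical stand-in for the compiler's type AST.
data Ty = TyCon String [Ty] deriving (Eq, Show)

-- A single shared constant, standing in for Environment's tyFunction.
tyFunctionDemo :: Ty
tyFunctionDemo = TyCon "Function" []

-- One machine compare; never traverses the structure.
ptrEq :: a -> a -> Bool
ptrEq x y = isTrue# (reallyUnsafePtrEquality# x y)

main :: IO ()
main = do
  -- Two references to the same top-level closure: pointer-equal.
  print (ptrEq tyFunctionDemo tyFunctionDemo)
  -- Distinct allocations can be structurally equal; the fast path makes
  -- no promise about them, and ptr-eq is allowed to report False here.
  print (TyCon "Int" [] == TyCon "Int" [])
```

Note that `ptrEq` returning `False` for structurally equal but separately allocated values is expected; only the shared-constant case is guaranteed to hit the fast path.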

Results vs 5713e832 baseline (type-hash + lint fix)

Scenario   Δ
full       -7.5% (53.1s → 49.1s)
nochange   -3.6%
prelude    +4.8%
leaf       +3.4%

Tests pass (1340/1340). Binary +4 KB (no inlining shift).

The trade-off

This is a partial win. The full build wins clearly. Cascade rebuild scenarios (prelude touches Prelude → 1342 deps, leaf touches a single module) regress modestly. We tried adding a phase 2 that kept the substituteType/hint-stack short-circuit but still inserted into the cache; the prelude regression was unchanged (+4.7%), so the cause isn't missed cache writes.

A cost-centre profile of the prelude scenario shows unifyTypes/substituteType/withErrorMessageHint all at 0.0% inherited time despite ~188k entries — the regression is below cost-centre granularity, likely a code-gen / branch-prediction / icache effect of the kind LESSONS warns about for any Unify.hs modification.

The full-build win is significantly larger than the rebuild-scenario regressions, so net across realistic workload mixes is positive. Full experiment writeup: experiments/ptr-eq-unify/.

Soundness

reallyUnsafePtrEquality# returning True implies the two arguments share a heap object → structurally equal → unifyTypes is a no-op (no fresh TUnknowns to solve, no errors). Returning False is allowed even on structurally equal inputs; the existing path handles that case unchanged.
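The argument can be modeled with a toy guarded unifier (hypothetical `unifyToy`, not the real one): a `True` from the primop means same heap object, hence structural equality, hence a no-op result is correct; a `False` loses nothing because control falls through to the structural path.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (isTrue#, reallyUnsafePtrEquality#)

-- Hypothetical stand-in for the compiler's type AST.
data Ty = TyCon String [Ty] deriving (Eq, Show)

unifyToy :: Ty -> Ty -> Either String ()
unifyToy t1 t2
  | isTrue# (reallyUnsafePtrEquality# t1 t2) = Right ()  -- fast path: same object
  | t1 == t2  = Right ()   -- slow path still accepts structural equality
  | otherwise = Left "cannot unify"

main :: IO ()
main = do
  let shared = TyCon "Function" []
  print (unifyToy shared shared)                      -- fast path fires
  print (unifyToy (TyCon "Int" []) (TyCon "Int" []))  -- ptr-eq may miss; slow path succeeds
  print (unifyToy (TyCon "Int" []) (TyCon "Char" [])) -- genuine mismatch
```

The key invariant is one-directional: the fast path can only accept, never reject, so every input the old code handled is still handled identically when ptr-eq misses.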

Stacked on

This builds on type-hash (which adds TypeFlags + HashSet-keyed unification cache; PR pending). The base for this PR is type-hash. The change is a few lines that should rebase cleanly onto master once type-hash lands.

Test plan

  • stack build — clean compile
  • stack test --fast — 1340/1340 pass
  • All four scenarios benchmarked on pr-admin (median of 4, warm-up discarded)
  • Binary size delta verified (+4 KB — no inlining shift)

🤖 Generated with Claude Code

kozak and others added 2 commits April 28, 2026 07:50
- Unify.hs: when (not ...) -> unless
- Types.hs: rename ctorSalt_X -> ctorSaltX (camelCase)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When t1 and t2 refer to the same heap object, unifyTypes is a no-op
(structurally equal => no unknowns to solve, no errors). One machine
compare via reallyUnsafePtrEquality# bypasses substituteType (which
walks closed types unnecessarily), the hint-stack push, and the cache
lookup. Sound regardless of result: ptr-eq False just falls through to
the existing path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
zyla (Collaborator) commented Apr 29, 2026

> Cascade rebuild scenarios (prelude touches Prelude → 1342 deps, leaf touches a single module) regress modestly

Doesn't make sense - the added operation is so cheap it shouldn't ever cause a regression. So this is probably just measurement noise.

And this makes me wonder: is the improvement also just noise? Maybe we need a more rigorous measurement protocol?
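One cheap way to separate noise from signal (a sketch, not the repo's benchmark tooling; all names here are hypothetical) is to summarize repeated wall-clock runs by median and median absolute deviation, and only call a delta real when it clears a few MADs of pooled spread:

```haskell
import Data.List (sort)

-- Median of a non-empty sample.
median :: [Double] -> Double
median xs
  | odd n     = s !! mid
  | otherwise = (s !! (mid - 1) + s !! mid) / 2
  where s = sort xs; n = length s; mid = n `div` 2

-- Median absolute deviation: a robust spread estimate.
mad :: [Double] -> Double
mad xs = median [abs (x - m) | x <- xs] where m = median xs

-- A delta counts as real only if it exceeds k times the pooled spread.
significant :: Double -> [Double] -> [Double] -> Bool
significant k base new =
  abs (median new - median base) > k * (mad base + mad new)

main :: IO ()
main = do
  -- Large shift, tight spread: clearly significant.
  print (significant 3 [53.0, 53.2, 53.1, 53.3] [49.0, 49.2, 49.1, 49.1])
  -- Tiny shift within run-to-run jitter: not significant.
  print (significant 3 [10.0, 10.4, 9.8, 10.2] [10.1, 10.3, 9.9, 10.2])
```

Under such a protocol, a regression that vanishes into the spread (as the leaf/prelude deltas might) would be reported as inconclusive rather than as a number.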
