type-hash: HashSet for unificationCache (-15.4% full build on pr-admin) #16

Open
kozak wants to merge 3 commits into restaumatic from type-hash

Conversation


kozak commented Apr 25, 2026

Summary

  • Step 1: Cache a structural Int hash on every Type node alongside the existing TypeFlags, computed at construction by combining children's cached hashes with a per-constructor salt (a layout sketch follows the implementation notes below). Adds Hashable (Type a) (O(1) via the cached field) and Hashable (Constraint a).
  • Step 3: Switch the unificationCache in Unify.hs:121–123 from Set (Type, Type) to HashSet (Type, Type). This drops compareType from the unification cache hot path entirely.
  • Measured -15.4% on full builds of pr-admin (median of 4, 47515-49155 ms; tight). Neutral on the three incremental scenarios.
  • Ord Type is deliberately unchanged — see the rust-interning Phase 2 trap in LESSONS.md.

compareType was the new top hotspot at 7.7% post-merges (after synonym-opt + skip-redundant-entailment-unify shipped). The unification cache is the largest single source of those compareType calls.

Final results (vs baseline 799e8208)

Scenario   Base (s)  Head (s)  Δ        Notes
full       57.0      48.2      -15.4%   tight, 47515-49155 ms
nochange   0.6       0.6       -1.9%    tight, 564-594 ms
prelude    4.0       4.0       -0.3%    within noise
leaf       1.6       1.6       -1.1%    very noisy (1550-8576 ms)

Two non-obvious implementation details

Both look like minor codegen hints, but each one is roughly a 2× swing on full builds when missed. Documented in experiments/LESSONS.md and sketched after this list:

  1. {-# UNPACK #-} on the !TypeFlags field of every Type constructor. TypeFlags went from newtype (zero-cost wrapper) to a 2-field record; without UNPACK every Type allocation indirects through an extra boxed object — measured +107% on full. With UNPACK, the regression collapses to ~+1.8%.
  2. {-# INLINE #-} on the Hashable Type instance methods. Without it GHC dispatches through the class dictionary at every hash call, allocates thunks, and the supposed HashSet win measures worse than Set (+102% on full). With INLINE the dictionary call collapses to a direct field read and the headline win appears.
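
For concreteness, here is a rough sketch of the layout and of where the two pragmas sit, with illustrative constructor and field names. The real Types.hs hides its underscore-suffixed constructors behind pattern synonyms and has many more constructors and fields, so treat this as a shape, not the actual definitions:

```haskell
import Data.Hashable (Hashable (..))
import Data.Text (Text)
import Data.Word (Word8)

-- TypeFlags is now a two-field record: the pre-existing flag bits plus
-- the structural hash cached when the node is built.
data TypeFlags = TypeFlags
  { tfBits :: {-# UNPACK #-} !Word8
  , tfHash :: {-# UNPACK #-} !Int
  }

-- Detail 1: the TypeFlags field of every constructor is strict and
-- UNPACKed, so both fields live inline in the node instead of behind an
-- extra boxed record.
data Type a
  = TypeVar_ {-# UNPACK #-} !TypeFlags a Text
  | TypeApp_ {-# UNPACK #-} !TypeFlags a (Type a) (Type a)
  -- ... remaining constructors, each carrying an unpacked TypeFlags

-- Reads the cached hash straight out of a node.
typeHash :: Type a -> Int
typeHash (TypeVar_ (TypeFlags _ h) _ _)   = h
typeHash (TypeApp_ (TypeFlags _ h) _ _ _) = h
{-# INLINE typeHash #-}

-- Detail 2: hashing is a field read; the INLINE pragma lets the
-- dictionary specialise away at HashSet call sites.
instance Hashable (Type a) where
  hashWithSalt s t = hashWithSalt s (typeHash t)
  {-# INLINE hashWithSalt #-}
```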

An eqType hash short-circuit (step 2) was tried and reverted — the hot callers (Entailment.hs:295 guard) compare equal types most of the time, so the hash check just adds work. See LESSONS.md.
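
For reference, the reverted check had roughly this shape, reusing the illustrative types from the layout sketch above (eqTypeStructural is a stand-in name for the existing structural walk):

```haskell
-- Structural equality, abbreviated to the two illustrative constructors.
eqTypeStructural :: Type a -> Type a -> Bool
eqTypeStructural (TypeVar_ _ _ n1)    (TypeVar_ _ _ n2)    = n1 == n2
eqTypeStructural (TypeApp_ _ _ l1 r1) (TypeApp_ _ _ l2 r2) =
  eqTypeStructural l1 l2 && eqTypeStructural r1 r2
eqTypeStructural _ _ = False

-- Reverted step-2 idea: bail out when the cached hashes already disagree.
-- When the hot callers mostly compare *equal* types, the hash test always
-- passes, so it is pure extra work on top of the full walk.
eqTypeShortCircuit :: Type a -> Type a -> Bool
eqTypeShortCircuit t1 t2 =
  typeHash t1 == typeHash t2 && eqTypeStructural t1 t2
```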

Test plan

  • stack test --fast: 1340 examples, 0 failures
  • All four scenarios on pr-admin: full / nochange / prelude / leaf
  • No Ord Type changes — iteration order preserved (rust-interning Phase 2 trap avoided)
  • Reviewer eyeballs the new combineNodeFlags-style helpers and per-constructor salts in Types.hs

🤖 Generated with Claude Code

kozak and others added 3 commits April 25, 2026 10:56
Adds an Int hash field to TypeFlags alongside the existing Word8 bits,
computed at construction by combining children's cached hashes with a
constructor-tag salt. Pattern synonym builders use new helpers
(binaryNodeFlags, leafFlags1, etc.); each constructor has a distinct
salt so e.g. TypeVar "x" and TypeLevelString "x" hash differently.
Hashable instances added for the leaf payload types (PSString, Label,
Names, WildcardData, ...). Hashable (Type a) and Hashable (Constraint a)
expose the cached hash with O(1) cost.
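
As a sketch of the builder side, reusing the illustrative Type / TypeFlags / typeHash shape from the PR description above (binaryNodeFlags follows the helper named here; the salt value is a placeholder):

```haskell
{-# LANGUAGE PatternSynonyms #-}

import Data.Hashable (hashWithSalt)

-- Combine the children's already-cached hashes under a per-constructor
-- salt; the flag bits of a freshly built node start at zero.
binaryNodeFlags :: Int -> Type a -> Type a -> TypeFlags
binaryNodeFlags salt l r =
  TypeFlags 0 (salt `hashWithSalt` typeHash l `hashWithSalt` typeHash r)

-- The exported builder is a pattern synonym, so callers never touch the
-- underscore constructor and the cached hash stays consistent.
pattern TypeApp :: a -> Type a -> Type a -> Type a
pattern TypeApp ann l r <- TypeApp_ _ ann l r
  where
    TypeApp ann l r = TypeApp_ (binaryNodeFlags 7 l r) ann l r
    -- 7 is a placeholder; each constructor gets a distinct salt
```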

The UNPACK pragma on the strict TypeFlags field of every Type constructor
(and on its inner Word8/Int) is essential — without it, Type allocations
go through an extra boxed object and the build time more than doubles.
With UNPACK + INLINE on the helpers, all four scenarios on pr-admin are
neutral or near-neutral vs baseline 799e820:

  full     +1.8%  (58231-58906 ms; tight)
  nochange -2.3%  (552-596 ms; tight)
  prelude  +0.3%  (3956-5477 ms; ~noise)
  leaf     +3.2%  (1612-8728 ms; very noisy)

No behaviour change yet — eqType still structural, no HashSet
conversion. The hash field is the foundation for steps 2 & 3.

Ord Type is deliberately untouched (hash-based ordering would change
observable iteration order — see LESSONS.md rust-interning Phase 2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Step 3 caps off type-hash by switching unificationCache from
Set (Type, Type) to HashSet (Type, Type), using the Hashable Type
instance that step 1 added. This drops compareType from the
unification cache hot path: HS.member/HS.insert run in O(1) hash +
O(1) eqType (collision-free) vs the old O(log n) compareType.
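
Written generically, the cache check has roughly this shape (in the real Unify.hs the key is the (Type, Type) pair using the step-1 Hashable instance and the set is threaded through the unifier's state; the function name here is illustrative):

```haskell
import Data.Hashable (Hashable)
import qualified Data.HashSet as HS

-- One hash of the pair, plus equality only against colliding entries,
-- instead of O(log n) compareType calls down an ordered Set.
checkUnifyCache :: (Eq k, Hashable k) => k -> HS.HashSet k -> (Bool, HS.HashSet k)
checkUnifyCache pair cache
  | pair `HS.member` cache = (True, cache)                  -- already unified before
  | otherwise              = (False, HS.insert pair cache)  -- remember and do the work
```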

The {-# INLINE #-} pragmas on the Hashable Type instance methods are
load-bearing — without them GHC dispatches through the class
dictionary at every hash call and HashSet measures *worse* than Set
(+102% on full). Once specialized, the win is clear.

Also picked up in this commit (originally from step 2 work):
- typeHash and typeBits direct accessors that pattern-match Type and
  return the unboxed Int/Word8 directly, avoiding the TypeFlags box
  the UNPACK pragma forces at every typeFlags call site (see the
  accessor sketch after this list).
- {-# INLINE #-} on typeFlags itself.
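
A sketch of why the direct accessors matter, again reusing the illustrative layout from the PR description; typeFlags here stands in for the record-returning accessor:

```haskell
-- Because the TypeFlags field is UNPACKed into each constructor, an
-- accessor that returns the whole record has to rebuild the TypeFlags box
-- before the caller can read a single field out of it; INLINE lets that
-- box cancel out when the accessor inlines at the call site.
typeFlags :: Type a -> TypeFlags
typeFlags (TypeVar_ fl _ _)   = fl
typeFlags (TypeApp_ fl _ _ _) = fl
{-# INLINE typeFlags #-}

-- The direct accessors pattern-match through TypeFlags and hand back the
-- unboxed field, so the record never needs to be rebuilt (typeHash in the
-- earlier sketch is written the same way).
typeBits :: Type a -> Word8
typeBits (TypeVar_ (TypeFlags b _) _ _)   = b
typeBits (TypeApp_ (TypeFlags b _) _ _ _) = b
{-# INLINE typeBits #-}
```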

Final results on pr-admin (median of 4, vs baseline 799e820):

  full     -15.4%  (47515-49155 ms; tight)
  nochange  -1.9%  (564-594 ms; tight)
  prelude   -0.3%  (~noise)
  leaf      -1.1%  (very noisy, within range)

Step 2 (eqType hash short-circuit) was tried and reverted: most hot
eqType callers (e.g. the Entailment.hs:295 guard) compare equal types,
so the hash check just adds work. The bigger win was always going to
be eliminating the Ord-Type code path entirely via HashSet.

Ord Type is deliberately unchanged — see EXPERIMENT.md scope: a hash-
based ordering would change observable iteration order and risk the
rust-interning Phase 2 trap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Tighten the Types.hs API so the per-node hash invariant (constructed
via pattern synonyms, mutated only via flag-preserving helpers) is
enforced by the module boundary instead of by convention.

- Add `modifyFlags :: (TypeFlags -> TypeFlags) -> Type a -> Type a` for
  callers that need to set a processing flag bit without recomputing
  the cached hash (a sketch follows this list). Marked INLINE so
  case-of-known-constructor fuses pattern-synonym construction + flag
  mutation into a single allocation.
- Refactor `replaceAllTypeSynonyms'` in Synonyms.hs to use pattern
  synonyms for matching/construction and `modifyFlags` for the
  synonym-free marker. No longer references the underscore-suffixed
  data constructors directly.
- Drop the `module Language.PureScript.Types` self-export and switch
  to an explicit list. The underscore-suffixed constructors
  (`TUnknown_`, `TypeVar_`, …) are no longer exported, so it's
  structurally impossible for downstream code to construct a Type with
  a wrong cached hash.
- Simplify the node-flag helpers (rconsNodeFlags, binaryNodeFlags,
  ternaryNodeFlags, unaryNodeFlags, constraintNodeFlags) to use uniform
  `hashWithSalt` chains. `mixHash` removed.
- Drop now-dead helpers (`noFlags`, `combineFlags`, `maskStructural`,
  `typeBits`, `leafFlags2`).

stack test --fast: 1340 examples, 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

zyla commented Apr 27, 2026

without UNPACK every Type allocation indirects through an extra boxed object — measured +107% on full.

Wow. This suggests there are probably more gains to be had by removing other indirections from Type. Or optimizing the layout in some other way. Pattern synonyms let us play some tricks.

For example (not suggesting we do that, it's crazy, but cool): in Zig they looked at which type constructor patterns are most frequent and promoted them to top-level constructors - kinda like Huffman compression but for trees.

https://youtu.be/IroPQ150F6c?si=04SHZMYFiye-B7je&t=1567 - "encoding" approach. https://youtu.be/IroPQ150F6c?si=zufBypPxj1mwR1jy&t=2200 (same talk but later) - applied to Zig compiler AST


zyla commented Apr 27, 2026

btw, now I wonder: do we still need the unification cache with skip-redundant-entailment-unify? With it, unification might be close to eqType in performance for common cases (note: I haven't measured)


kozak commented Apr 28, 2026

Claude checked something similar to what you're mentioning: https://github.com/restaumatic/purescript/blob/restaumatic/experiments/LESSONS.md#the-unification-cache-is-a-hash-equal-memoizer-unify-pattern-survey. TLDR: it seems we do.

unify-cache Phase 2 (43f6b613, post-type-hash HashSet cache) — one-line drop (unifyTypes'' = unifyTypes'):
  - Full build: 51.8s → 64.3s (+24%)
  - Hit rate 39.3% (412k / 1.05M calls) — the cache saves ~12s for the ~10s of Hashable overhead it adds. Net positive ~12s.

unify-pattern-survey Phase 2 — replaced cache with typeHash + eqType short-circuit upstream (different mechanism, but also "no cache lookup"):
  - full -2.8%, prelude +7.1%, others mixed.
  - Same shape as skip-redundant-funapp-unify Phase 2 (prelude +6.4%).

So on the post-type-hash branch:
  - Drop entirely: +24% full.
  - Replace with eqType upstream guards: regresses prelude.
  - Keep as-is: optimal across all four scenarios.

The cache pays for itself on full (39% hit rate amortises Hashable cost) and especially on prelude (the cascade re-uses the same trivial pairs across 1342 modules — a single HashSet bucket walk vs. re-walking via eqType).

Pre-type-hash (799e8208, S.Set ordered cache) we never measured a clean drop, but the lookups are already O(1) on tiny constants so there's nothing to win there either — skip-redundant-funapp-unify Phase 1 was neutral.

Conclusion captured in LESSONS.md: the cache-plus-Hashable infrastructure is essentially optimal for what it does; the path forward is elsewhere (e.g. compareType at 7.7%, attacked in type-hash).
