type-hash: HashSet for unificationCache (-15.4% full build on pr-admin) #16

Open
kozak wants to merge 3 commits into restaumatic from type-hash

Conversation


kozak commented Apr 25, 2026

Summary

  • Step 1: Cache a structural Int hash on every Type node alongside the existing TypeFlags, computed at construction by combining children's cached hashes with a per-constructor salt (a layout sketch follows the implementation notes below). Adds Hashable (Type a) (O(1) via the cached field) and Hashable (Constraint a).
  • Step 3: Switch the unificationCache in Unify.hs:121–123 from Set (Type, Type) to HashSet (Type, Type). This drops compareType from the unification cache hot path entirely.
  • Measured -15.4% on full builds of pr-admin (median of 4, 47515-49155 ms; tight). Neutral on the three incremental scenarios.
  • Ord Type is deliberately unchanged — see the rust-interning Phase 2 trap in LESSONS.md.

compareType was the new top hotspot at 7.7% post-merges (after synonym-opt + skip-redundant-entailment-unify shipped). The unification cache is the largest single source of those compareType calls.

Final results (vs baseline 799e8208)

Scenario   Base (s)  Head (s)  Δ        Notes
full       57.0      48.2      -15.4%   tight, 47515-49155 ms
nochange   0.6       0.6       -1.9%    tight, 564-594 ms
prelude    4.0       4.0       -0.3%    within noise
leaf       1.6       1.6       -1.1%    very noisy (1550-8576 ms)

Two non-obvious implementation details

Both look like minor codegen hints, but each one is roughly a 2× swing on full builds when missed. Documented in experiments/LESSONS.md and sketched after this list:

  1. {-# UNPACK #-} on the !TypeFlags field of every Type constructor. TypeFlags went from newtype (zero-cost wrapper) to a 2-field record; without UNPACK every Type allocation indirects through an extra boxed object — measured +107% on full. With UNPACK, the regression collapses to ~+1.8%.
  2. {-# INLINE #-} on the Hashable Type instance methods. Without it GHC dispatches through the class dictionary at every hash call, allocates thunks, and the supposed HashSet win measures worse than Set (+102% on full). With INLINE the dictionary call collapses to a direct field read and the headline win appears.
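
For concreteness, here is a rough sketch of the layout and of where the two pragmas sit, with illustrative constructor and field names. The real Types.hs hides its underscore-suffixed constructors behind pattern synonyms and has many more constructors and fields, so treat this as a shape, not the actual definitions:

```haskell
import Data.Hashable (Hashable (..))
import Data.Text (Text)
import Data.Word (Word8)

-- TypeFlags is now a two-field record: the pre-existing flag bits plus
-- the structural hash cached when the node is built.
data TypeFlags = TypeFlags
  { tfBits :: {-# UNPACK #-} !Word8
  , tfHash :: {-# UNPACK #-} !Int
  }

-- Detail 1: the TypeFlags field of every constructor is strict and
-- UNPACKed, so both fields live inline in the node instead of behind an
-- extra boxed record.
data Type a
  = TypeVar_ {-# UNPACK #-} !TypeFlags a Text
  | TypeApp_ {-# UNPACK #-} !TypeFlags a (Type a) (Type a)
  -- ... remaining constructors, each carrying an unpacked TypeFlags

-- Reads the cached hash straight out of a node.
typeHash :: Type a -> Int
typeHash (TypeVar_ (TypeFlags _ h) _ _)   = h
typeHash (TypeApp_ (TypeFlags _ h) _ _ _) = h
{-# INLINE typeHash #-}

-- Detail 2: hashing is a field read; the INLINE pragma lets the
-- dictionary specialise away at HashSet call sites.
instance Hashable (Type a) where
  hashWithSalt s t = hashWithSalt s (typeHash t)
  {-# INLINE hashWithSalt #-}
```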

An eqType hash short-circuit (step 2) was tried and reverted — the hot callers (Entailment.hs:295 guard) compare equal types most of the time, so the hash check just adds work. See LESSONS.md.
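
For reference, the reverted check had roughly this shape, reusing the illustrative types from the layout sketch above (eqTypeStructural is a stand-in name for the existing structural walk):

```haskell
-- Structural equality, abbreviated to the two illustrative constructors.
eqTypeStructural :: Type a -> Type a -> Bool
eqTypeStructural (TypeVar_ _ _ n1)    (TypeVar_ _ _ n2)    = n1 == n2
eqTypeStructural (TypeApp_ _ _ l1 r1) (TypeApp_ _ _ l2 r2) =
  eqTypeStructural l1 l2 && eqTypeStructural r1 r2
eqTypeStructural _ _ = False

-- Reverted step-2 idea: bail out when the cached hashes already disagree.
-- When the hot callers mostly compare *equal* types, the hash test always
-- passes, so it is pure extra work on top of the full walk.
eqTypeShortCircuit :: Type a -> Type a -> Bool
eqTypeShortCircuit t1 t2 =
  typeHash t1 == typeHash t2 && eqTypeStructural t1 t2
```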

Test plan

  • stack test --fast: 1340 examples, 0 failures
  • All four scenarios on pr-admin: full / nochange / prelude / leaf
  • No Ord Type changes — iteration order preserved (rust-interning Phase 2 trap avoided)
  • Reviewer eyeballs the new combineNodeFlags-style helpers and per-constructor salts in Types.hs

🤖 Generated with Claude Code

kozak and others added 3 commits April 25, 2026 10:56
Adds an Int hash field to TypeFlags alongside the existing Word8 bits,
computed at construction by combining children's cached hashes with a
constructor-tag salt. Pattern synonym builders use new helpers
(binaryNodeFlags, leafFlags1, etc.); each constructor has a distinct
salt so e.g. TypeVar "x" and TypeLevelString "x" hash differently.
Hashable instances added for the leaf payload types (PSString, Label,
Names, WildcardData, ...). Hashable (Type a) and Hashable (Constraint a)
expose the cached hash with O(1) cost.
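
As a sketch of the builder side, reusing the illustrative Type / TypeFlags / typeHash shape from the PR description above (binaryNodeFlags follows the helper named here; the salt value is a placeholder):

```haskell
{-# LANGUAGE PatternSynonyms #-}

import Data.Hashable (hashWithSalt)

-- Combine the children's already-cached hashes under a per-constructor
-- salt; the flag bits of a freshly built node start at zero.
binaryNodeFlags :: Int -> Type a -> Type a -> TypeFlags
binaryNodeFlags salt l r =
  TypeFlags 0 (salt `hashWithSalt` typeHash l `hashWithSalt` typeHash r)

-- The exported builder is a pattern synonym, so callers never touch the
-- underscore constructor and the cached hash stays consistent.
pattern TypeApp :: a -> Type a -> Type a -> Type a
pattern TypeApp ann l r <- TypeApp_ _ ann l r
  where
    TypeApp ann l r = TypeApp_ (binaryNodeFlags 7 l r) ann l r
    -- 7 is a placeholder; each constructor gets a distinct salt
```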

The UNPACK pragma on the strict TypeFlags field of every Type constructor
(and on its inner Word8/Int) is essential — without it, Type allocations
go through an extra boxed object and the build time more than doubles.
With UNPACK + INLINE on the helpers, all four scenarios on pr-admin are
neutral or near-neutral vs baseline 799e820:

  full     +1.8%  (58231-58906 ms; tight)
  nochange -2.3%  (552-596 ms; tight)
  prelude  +0.3%  (3956-5477 ms; ~noise)
  leaf     +3.2%  (1612-8728 ms; very noisy)

No behaviour change yet — eqType still structural, no HashSet
conversion. The hash field is the foundation for steps 2 & 3.

Ord Type is deliberately untouched (hash-based ordering would change
observable iteration order — see LESSONS.md rust-interning Phase 2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Step 3 caps off type-hash by switching unificationCache from
Set (Type, Type) to HashSet (Type, Type), using the Hashable Type
instance that step 1 added. This drops compareType from the
unification cache hot path: HS.member/HS.insert run in O(1) hash +
O(1) eqType (collision-free) vs the old O(log n) compareType.
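
Written generically, the cache check has roughly this shape (in the real Unify.hs the key is the (Type, Type) pair using the step-1 Hashable instance and the set is threaded through the unifier's state; the function name here is illustrative):

```haskell
import Data.Hashable (Hashable)
import qualified Data.HashSet as HS

-- One hash of the pair, plus equality only against colliding entries,
-- instead of O(log n) compareType calls down an ordered Set.
checkUnifyCache :: (Eq k, Hashable k) => k -> HS.HashSet k -> (Bool, HS.HashSet k)
checkUnifyCache pair cache
  | pair `HS.member` cache = (True, cache)                  -- already unified before
  | otherwise              = (False, HS.insert pair cache)  -- remember and do the work
```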

The {-# INLINE #-} pragmas on the Hashable Type instance methods are
load-bearing — without them GHC dispatches through the class
dictionary at every hash call and HashSet measures *worse* than Set
(+102% on full). Once specialized, the win is clear.

Also picked up in this commit (originally from step 2 work):
- typeHash and typeBits direct accessors that pattern-match Type and
  return the unboxed Int/Word8 directly, avoiding the TypeFlags box
  the UNPACK pragma forces at every typeFlags call site (see the
  accessor sketch after this list).
- {-# INLINE #-} on typeFlags itself.
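
A sketch of why the direct accessors matter, again reusing the illustrative layout from the PR description; typeFlags here stands in for the record-returning accessor:

```haskell
-- Because the TypeFlags field is UNPACKed into each constructor, an
-- accessor that returns the whole record has to rebuild the TypeFlags box
-- before the caller can read a single field out of it; INLINE lets that
-- box cancel out when the accessor inlines at the call site.
typeFlags :: Type a -> TypeFlags
typeFlags (TypeVar_ fl _ _)   = fl
typeFlags (TypeApp_ fl _ _ _) = fl
{-# INLINE typeFlags #-}

-- The direct accessors pattern-match through TypeFlags and hand back the
-- unboxed field, so the record never needs to be rebuilt (typeHash in the
-- earlier sketch is written the same way).
typeBits :: Type a -> Word8
typeBits (TypeVar_ (TypeFlags b _) _ _)   = b
typeBits (TypeApp_ (TypeFlags b _) _ _ _) = b
{-# INLINE typeBits #-}
```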

Final results on pr-admin (median of 4, vs baseline 799e820):

  full     -15.4%  (47515-49155 ms; tight)
  nochange  -1.9%  (564-594 ms; tight)
  prelude   -0.3%  (~noise)
  leaf      -1.1%  (very noisy, within range)

Step 2 (eqType hash short-circuit) was tried and reverted: most hot
eqType callers (e.g. the Entailment.hs:295 guard) compare equal types,
so the hash check just adds work. The bigger win was always going to
be eliminating the Ord-Type code path entirely via HashSet.

Ord Type is deliberately unchanged — see EXPERIMENT.md scope: a hash-
based ordering would change observable iteration order and risk the
rust-interning Phase 2 trap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Tighten the Types.hs API so the per-node hash invariant (constructed
via pattern synonyms, mutated only via flag-preserving helpers) is
enforced by the module boundary instead of by convention.

- Add `modifyFlags :: (TypeFlags -> TypeFlags) -> Type a -> Type a` for
  callers that need to set a processing flag bit without recomputing
  the cached hash (a sketch follows this list). Marked INLINE so
  case-of-known-constructor fuses pattern-synonym construction + flag
  mutation into a single allocation.
- Refactor `replaceAllTypeSynonyms'` in Synonyms.hs to use pattern
  synonyms for matching/construction and `modifyFlags` for the
  synonym-free marker. No longer references the underscore-suffixed
  data constructors directly.
- Drop the `module Language.PureScript.Types` self-export and switch
  to an explicit list. The underscore-suffixed constructors
  (`TUnknown_`, `TypeVar_`, …) are no longer exported, so it's
  structurally impossible for downstream code to construct a Type with
  a wrong cached hash.
- Simplify the node-flag helpers (rconsNodeFlags, binaryNodeFlags,
  ternaryNodeFlags, unaryNodeFlags, constraintNodeFlags) to use uniform
  `hashWithSalt` chains. `mixHash` removed.
- Drop now-dead helpers (`noFlags`, `combineFlags`, `maskStructural`,
  `typeBits`, `leafFlags2`).

stack test --fast: 1340 examples, 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

zyla commented Apr 27, 2026

without UNPACK every Type allocation indirects through an extra boxed object — measured +107% on full.

Wow. This suggests there are probably more gains to be had by removing other indirections from Type. Or optimizing the layout in some other way. Pattern synonyms let us play some tricks.

For example (not suggesting we do that, it's crazy, but cool): in Zig they looked at which type constructor patterns are most frequent and promoted them to top-level constructors - kinda like Huffman compression but for trees.

https://youtu.be/IroPQ150F6c?si=04SHZMYFiye-B7je&t=1567 - "encoding" approach. https://youtu.be/IroPQ150F6c?si=zufBypPxj1mwR1jy&t=2200 (same talk but later) - applied to Zig compiler AST


zyla commented Apr 27, 2026

btw, now I wonder: do we still need the unification cache with skip-redundant-entailment-unify? With it, unification might be close to eqType in performance for common cases (note: I haven't measured)


kozak commented Apr 28, 2026

Claude checked something similar to what you're mentioning: https://github.com/restaumatic/purescript/blob/restaumatic/experiments/LESSONS.md#the-unification-cache-is-a-hash-equal-memoizer-unify-pattern-survey. TLDR: it seems we do.

unify-cache Phase 2 (43f6b613, post-type-hash HashSet cache) — one-line drop (unifyTypes'' = unifyTypes'):
  - Full build: 51.8s → 64.3s (+24%)
  - Hit rate 39.3% (412k / 1.05M calls) — the cache saves ~12s for the ~10s of Hashable overhead it adds. Net positive ~12s.

unify-pattern-survey Phase 2 — replaced cache with typeHash + eqType short-circuit upstream (different mechanism, but also "no cache lookup"):
  - full -2.8%, prelude +7.1%, others mixed.
  - Same shape as skip-redundant-funapp-unify Phase 2 (prelude +6.4%).

So on the post-type-hash branch:
  - Drop entirely: +24% full.
  - Replace with eqType upstream guards: regresses prelude.
  - Keep as-is: optimal across all four scenarios.

The cache pays for itself on full (39% hit rate amortises Hashable cost) and especially on prelude (the cascade re-uses the same trivial pairs across 1342 modules — a single HashSet bucket walk vs. re-walking via eqType).

Pre-type-hash (799e8208, S.Set ordered cache) we never measured a clean drop, but the lookups are already O(1) on tiny constants so there's nothing to win there either — skip-redundant-funapp-unify Phase 1 was neutral.

Conclusion captured in LESSONS.md: the cache-plus-Hashable infrastructure is essentially optimal for what it does; the path forward is elsewhere (e.g. compareType at 7.7%, attacked in type-hash).
