type-hash: HashSet for unificationCache (-15.4% full build on pr-admin) #16
kozak wants to merge 3 commits into
Conversation
Adds an Int hash field to TypeFlags alongside the existing Word8 bits, computed at construction by combining children's cached hashes with a constructor-tag salt. Pattern synonym builders use new helpers (binaryNodeFlags, leafFlags1, etc.); each constructor has a distinct salt, so e.g. TypeVar "x" and TypeLevelString "x" hash differently. Hashable instances are added for the leaf payload types (PSString, Label, Names, WildcardData, ...). Hashable (Type a) and Hashable (Constraint a) expose the cached hash at O(1) cost.

The UNPACK pragma on the strict TypeFlags field of every Type constructor (and on its inner Word8/Int) is essential — without it, every Type allocation goes through an extra boxed object and the build time more than doubles. With UNPACK + INLINE on the helpers, all four scenarios on pr-admin are neutral or near-neutral vs baseline 799e820:

full     +1.8% (58231-58906 ms; tight)
nochange -2.3% (552-596 ms; tight)
prelude  +0.3% (3956-5477 ms; ~noise)
leaf     +3.2% (1612-8728 ms; very noisy)

No behaviour change yet — eqType is still structural, and there is no HashSet conversion. The hash field is the foundation for steps 2 and 3. Ord Type is deliberately untouched (hash-based ordering would change observable iteration order — see LESSONS.md, rust-interning Phase 2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
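A minimal sketch of the construction-time hashing scheme described above. All names here (TypeFlags fields, Ty constructors, mix, the salt values) are hypothetical stand-ins, not the real Types.hs API; mix fills in for hashWithSalt from the hashable package. The point it illustrates: each node stores its hash next to the flag bits, interior builders combine the children's cached hashes rather than re-traversing, and distinct per-constructor salts keep e.g. a variable and a type-level string with the same text from colliding.

```haskell
import Data.Bits (xor)
import Data.List (foldl')
import Data.Word (Word8)

-- Flag bits plus the cached structural hash, as in step 1.
data TypeFlags = TypeFlags
  { tfBits :: {-# UNPACK #-} !Word8
  , tfHash :: {-# UNPACK #-} !Int
  }

-- Stand-in for hashWithSalt from the hashable package.
mix :: Int -> Int -> Int
mix salt h = (salt * 16777619) `xor` h

data Ty
  = TyVar    {-# UNPACK #-} !TypeFlags String
  | TyString {-# UNPACK #-} !TypeFlags String
  | TyApp    {-# UNPACK #-} !TypeFlags Ty Ty

strHash :: String -> Int
strHash = foldl' (\h c -> mix h (fromEnum c)) 0

-- Distinct per-constructor salts: tyVar "x" and tyString "x" hash apart.
tyVar, tyString :: String -> Ty
tyVar    s = TyVar    (TypeFlags 0 (mix 1 (strHash s))) s
tyString s = TyString (TypeFlags 0 (mix 2 (strHash s))) s

-- Interior nodes combine the children's cached hashes; no re-traversal.
tyApp :: Ty -> Ty -> Ty
tyApp a b = TyApp (TypeFlags 0 (mix 3 (mix (tyHash a) (tyHash b)))) a b

-- O(1): the hash is read off the node, never recomputed.
tyHash :: Ty -> Int
tyHash (TyVar    f _)   = tfHash f
tyHash (TyString f _)   = tfHash f
tyHash (TyApp    f _ _) = tfHash f

main :: IO ()
main = print ( tyHash (tyVar "x") /= tyHash (tyString "x")
             , tyHash (tyApp (tyVar "x") (tyVar "y"))
                 == tyHash (tyApp (tyVar "x") (tyVar "y")) )
```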
Step 3 caps off type-hash by switching unificationCache from
Set (Type, Type) to HashSet (Type, Type), using the Hashable Type
instance that step 1 enabled. This drops compareType from the unification
cache hot path: HS.member/HS.insert run in O(1) hash + O(1) eqType
(collision-free) vs the old O(log n) compareType.
The {-# INLINE #-} pragmas on the Hashable Type instance methods are
load-bearing — without them GHC dispatches through the class
dictionary at every hash call and HashSet measures *worse* than Set
(+102% on full). Once specialized, the win is clear.
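A sketch of why the pragma is load-bearing, using a minimal local class as a stand-in for Data.Hashable (the toy Ty node and mkTy are hypothetical). The instance body is just a field read plus one mix; INLINE lets GHC specialize the dictionary call at each HashSet probe down to exactly that, instead of an opaque dictionary dispatch.

```haskell
import Data.Bits (xor)
import Data.List (foldl')

-- Minimal stand-in for Data.Hashable's class.
class Hashable a where
  hashWithSalt :: Int -> a -> Int

-- Toy node that caches its hash at construction (names hypothetical).
data Ty = Ty { cachedHash :: !Int, payload :: String }

mkTy :: String -> Ty
mkTy s = Ty (foldl' (\h c -> (h * 16777619) `xor` fromEnum c) 0 s) s

-- The load-bearing pragma: without INLINE, HS.member/HS.insert go through
-- the class dictionary on every probe; with it, GHC specializes the call
-- down to a field read plus one mix.
instance Hashable Ty where
  {-# INLINE hashWithSalt #-}
  hashWithSalt salt t = (salt * 16777619) `xor` cachedHash t

main :: IO ()
main = print (hashWithSalt 0 (mkTy "x") == hashWithSalt 0 (mkTy "x"))
```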
Also picked up in this commit (originally from step 2 work):
- typeHash and typeBits direct accessors that pattern-match Type and
return the unboxed Int/Word8 directly, avoiding the fresh TypeFlags box
that the UNPACK pragma forces at every typeFlags call site.
- {-# INLINE #-} on typeFlags itself.
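A sketch of the re-boxing issue behind those accessors (names hypothetical, simplified to one constructor). With UNPACK, the TypeFlags fields live flat inside each Type constructor, so an accessor that returns the whole TypeFlags must allocate a fresh box; pattern-matching the Int out directly allocates nothing.

```haskell
import Data.Word (Word8)

-- UNPACKed: both fields are stored flat inside TyLeaf.
data TypeFlags = TypeFlags {-# UNPACK #-} !Word8 {-# UNPACK #-} !Int

data Ty = TyLeaf {-# UNPACK #-} !TypeFlags String

-- Re-boxes: must build a TypeFlags value on every call.
{-# INLINE typeFlags #-}
typeFlags :: Ty -> TypeFlags
typeFlags (TyLeaf f _) = f

-- No re-boxing: returns the flat Int field directly.
typeHash :: Ty -> Int
typeHash (TyLeaf (TypeFlags _ h) _) = h

typeBits :: Ty -> Word8
typeBits (TyLeaf (TypeFlags b _) _) = b

main :: IO ()
main = do
  let t = TyLeaf (TypeFlags 0 42) "x"
  print (typeHash t)
  print (typeBits t)
```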
Final results on pr-admin (median of 4, vs baseline 799e820):
full -15.4% (47515-49155 ms; tight)
nochange -1.9% (564-594 ms; tight)
prelude -0.3% (~noise)
leaf -1.1% (very noisy, within range)
Step 2 (eqType hash short-circuit) was tried and reverted: most hot
eqType callers (e.g. the Entailment.hs:295 guard) compare equal types,
so the hash check just adds work. The bigger win was always going to
be eliminating the Ord-Type code path entirely via HashSet.
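A sketch of the shape of the reverted step-2 variant (hypothetical, simplified; hashes are supplied by hand here rather than computed). Comparing cached hashes first skips the structural walk when they differ, but on the hot paths described above the types are usually equal, so the check always passes and only adds work on top of the full walk.

```haskell
-- Toy type carrying a cached hash in each node.
data Ty = TyVar !Int String | TyApp !Int Ty Ty

tyHash :: Ty -> Int
tyHash (TyVar h _)   = h
tyHash (TyApp h _ _) = h

-- Plain structural equality (what shipped).
eqTy :: Ty -> Ty -> Bool
eqTy (TyVar _ a)   (TyVar _ b)   = a == b
eqTy (TyApp _ a b) (TyApp _ c d) = eqTy a c && eqTy b d
eqTy _             _             = False

-- The reverted short-circuit: one extra comparison on the equal path.
eqTyFast :: Ty -> Ty -> Bool
eqTyFast a b = tyHash a == tyHash b && eqTy a b

main :: IO ()
main = do
  let x = TyApp 3 (TyVar 1 "a") (TyVar 2 "b")
  print (eqTyFast x x)              -- equal path: hash check is pure overhead
  print (eqTyFast x (TyVar 9 "a"))  -- unequal path: hashes differ, walk skipped
```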
Ord Type is deliberately unchanged — see EXPERIMENT.md scope: a hash-
based ordering would change observable iteration order and risk the
rust-interning Phase 2 trap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tighten the Types.hs API so the per-node hash invariant (constructed via pattern synonyms, mutated only via flag-preserving helpers) is enforced by the module boundary instead of by convention. - Add `modifyFlags :: (TypeFlags -> TypeFlags) -> Type a -> Type a` for callers that need to set a processing flag bit without recomputing the cached hash. Marked INLINE so case-of-known-constructor fuses pattern-synonym construction + flag mutation into a single allocation. - Refactor `replaceAllTypeSynonyms'` in Synonyms.hs to use pattern synonyms for matching/construction and `modifyFlags` for the synonym-free marker. No longer references the underscore-suffixed data constructors directly. - Drop the `module Language.PureScript.Types` self-export and switch to an explicit list. The underscore-suffixed constructors (`TUnknown_`, `TypeVar_`, …) are no longer exported, so it's structurally impossible for downstream code to construct a Type with a wrong cached hash. - Simplify the node-flag helpers (rconsNodeFlags, binaryNodeFlags, ternaryNodeFlags, unaryNodeFlags, constraintNodeFlags) to use uniform `hashWithSalt` chains. `mixHash` removed. - Drop now-dead helpers (`noFlags`, `combineFlags`, `maskStructural`, `typeBits`, `leafFlags2`). stack test --fast: 1340 examples, 0 failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
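A heavily simplified sketch of the boundary being described (all names hypothetical; the real module exports pattern synonyms, not a single smart constructor). The raw underscore constructor stays private, so the only ways to obtain a value are the smart constructor, which computes the hash, and modifyFlags, which touches only the flag bits and carries the hash over unchanged.

```haskell
module Main (Ty, mkVar, modifyFlags, tyBits, tyHash, main) where

import Data.Bits (setBit, xor)
import Data.List (foldl')
import Data.Word (Word8)

-- Ty_ is NOT exported: no way to build a node with a wrong cached hash.
data Ty = Ty_ { tyBits :: !Word8, tyHash :: !Int, tyName :: String }

-- Smart constructor: the only exported way to create a node.
mkVar :: String -> Ty
mkVar s = Ty_ 0 (foldl' (\h c -> (h * 16777619) `xor` fromEnum c) 1 s) s

-- Flag-preserving mutation: the cached hash is never recomputed or lost.
{-# INLINE modifyFlags #-}
modifyFlags :: (Word8 -> Word8) -> Ty -> Ty
modifyFlags f (Ty_ bits h s) = Ty_ (f bits) h s

main :: IO ()
main = do
  let t  = mkVar "x"
      t' = modifyFlags (`setBit` 0) t
  print (tyHash t' == tyHash t)  -- hash preserved across flag mutation
  print (tyBits t')              -- flag bit now set
```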
Wow. This suggests there are probably more gains to be had by removing other indirections. For example (not suggesting we do that, it's crazy, but cool): in Zig they looked at which type constructor patterns are most frequent and promoted them to top-level constructors, kind of like Huffman compression but for trees.
https://youtu.be/IroPQ150F6c?si=04SHZMYFiye-B7je&t=1567 - the "encoding" approach
https://youtu.be/IroPQ150F6c?si=zufBypPxj1mwR1jy&t=2200 (same talk, later) - applied to the Zig compiler AST
btw, now I wonder: do we still need the unification cache with skip-redundant-entailment-unify? With it, unification might be close to eqType in performance for common cases (note: I haven't measured).
Claude checked something similar to what you are mentioning: https://github.com/restaumatic/purescript/blob/restaumatic/experiments/LESSONS.md#the-unification-cache-is-a-hash-equal-memoizer-unify-pattern-survey . TL;DR: it seems we do.
Summary
An `Int` hash on every `Type` node alongside the existing `TypeFlags`, computed at construction by combining children's cached hashes with a per-constructor salt. Adds `Hashable (Type a)` (O(1) via the cached field) and `Hashable (Constraint a)`. Switches `unificationCache` in `Unify.hs:121–123` from `Set (Type, Type)` to `HashSet (Type, Type)`. This drops `compareType` from the unification cache hot path entirely. `Ord Type` is deliberately unchanged — see the rust-interning Phase 2 trap in `LESSONS.md`.

`compareType` was the new top hotspot at 7.7% post-merges (after synonym-opt + skip-redundant-entailment-unify shipped). The unification cache is the largest single source.

Final results (vs baseline `799e8208`)

Two non-obvious implementation details

Both look like minor codegen hints, but each is a 2× swing on full builds when missed. Documented in `experiments/LESSONS.md`:

- `{-# UNPACK #-}` on the `!TypeFlags` field of every `Type` constructor. `TypeFlags` went from a `newtype` (zero-cost wrapper) to a 2-field record; without UNPACK every Type allocation indirects through an extra boxed object — measured +107% on full. With UNPACK, the regression collapses to ~+1.8%.
- `{-# INLINE #-}` on the `Hashable Type` instance methods. Without it GHC dispatches through the class dictionary at every hash call, allocates thunks, and the supposed HashSet win measures worse than Set (+102% on full). With INLINE the dictionary collapses to a direct field read and the headline win drops out.

An `eqType` hash short-circuit (step 2) was tried and reverted — the hot callers (the `Entailment.hs:295` guard) compare equal types most of the time, so the hash check just adds work. See LESSONS.md.

Test plan

- `stack test --fast`: 1340 examples, 0 failures
- No `Ord Type` changes — iteration order preserved (rust-interning Phase 2 trap avoided)
- `combineNodeFlags`-style helpers and per-constructor salts in `Types.hs`

🤖 Generated with Claude Code