
perf: avoid O(N^2) exiting-branch checks in CodeFolding #8599

Open
Changqing-JING wants to merge 1 commit into WebAssembly:main from Changqing-JING:opt/compile-speed


@Changqing-JING Changqing-JING commented Apr 14, 2026

Follow-up to #8586 that optimizes CodeFolding.

optimizeTerminatingTails runs EffectAnalyzer once per tail item, and each run walks that item's full subtree. On deeply nested blocks the total work is O(N^2).

Replace the per-item walks with a single O(N) bottom-up PostWalker (populateExitingBranchCache) that pre-computes exiting-branch results for every node, making subsequent lookups O(1).

Example: AssemblyScript GC compiles __visit_members as a br_table dispatch over all types, producing ~N nested blocks with ~N tails. The old code walks each tail's subtree separately -- O(N^2) total node visits. With this change, one bottom-up walk covers all nodes, then each tail lookup is O(1).

(block $A          ;; depth 4000
  (block $B        ;; depth 3999
    (block $C      ;; depth 3998
      ...
      (br_table $A $B $C ... (local.get $rtid))
    )
    (unreachable)  ;; tail at depth 3999, old code walks 3999 nodes
  )
  (unreachable)    ;; tail at depth 4000, old code walks 4000 nodes
)
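
To make the cost difference concrete, here is a minimal sketch on a toy tree, not Binaryen's actual IR or the code in this PR. `Node`, `exitingNaive`, `populateToyCache`, `makeChain`, and `naiveVisitsForAllBlocks` are all hypothetical names invented for illustration. It assumes a branch "exits" a subtree when its target label is not defined by a block inside that subtree, and it compares re-walking the subtree per query against one bottom-up pass that records a boolean per node.

```cpp
#include <cstddef>
#include <memory>
#include <set>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy IR node (hypothetical; stands in for a wasm expression).
struct Node {
  int label = -1;            // label defined here if this is a block, else -1
  std::vector<int> targets;  // labels this node branches to (br / br_table)
  std::vector<std::unique_ptr<Node>> children;
};

// Naive query: walks the whole subtree every time it is called.
std::set<int> exitingNaive(const Node& n, std::size_t& visits) {
  ++visits;
  std::set<int> out(n.targets.begin(), n.targets.end());
  for (const auto& c : n.children) {
    std::set<int> sub = exitingNaive(*c, visits);
    out.insert(sub.begin(), sub.end());
  }
  out.erase(n.label);  // branches to our own label do not exit
  return out;
}

using ExitCache = std::unordered_map<const Node*, bool>;

// One bottom-up pass: each node is visited once and its answer is recorded.
std::set<int> populateToyCache(const Node& n, ExitCache& cache,
                               std::size_t& visits) {
  ++visits;
  std::set<int> out(n.targets.begin(), n.targets.end());
  for (const auto& c : n.children) {
    std::set<int> sub = populateToyCache(*c, cache, visits);
    if (sub.size() > out.size()) {
      std::swap(sub, out);  // always merge the smaller set into the larger
    }
    out.insert(sub.begin(), sub.end());
  }
  out.erase(n.label);
  cache[&n] = !out.empty();
  return out;
}

// Builds `depth` nested blocks around one br_table-like node that targets
// every label; each block also gets a leaf "tail" child.
std::unique_ptr<Node> makeChain(int depth) {
  auto inner = std::make_unique<Node>();
  for (int i = 0; i < depth; ++i) {
    inner->targets.push_back(i);
  }
  for (int label = depth - 1; label >= 0; --label) {
    auto block = std::make_unique<Node>();
    block->label = label;
    block->children.push_back(std::move(inner));
    block->children.push_back(std::make_unique<Node>());  // tail
    inner = std::move(block);
  }
  return inner;
}

// Total nodes touched when every block in the chain is queried naively.
std::size_t naiveVisitsForAllBlocks(const Node& root) {
  std::size_t visits = 0;
  for (const Node* b = &root; b->label != -1; b = b->children[0].get()) {
    exitingNaive(*b, visits);
  }
  return visits;
}
```

On a chain of depth N this toy tree has 2N + 1 nodes. Querying every block naively touches N^2 + 2N nodes (2600 for N = 50), while the single pass touches each node once (101 for N = 50), after which each lookup is a hash-map read.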

Benchmark data
The test module is from a comment on issue #7319.

On main (HEAD):

time ./build/bin/wasm-opt -Oz --enable-bulk-memory --enable-multivalue --enable-reference-types --enable-gc --enable-tail-call --enable-exception-handling  -o /dev/null ./test3.wasm

real    9m16.111s
user    35m33.985s
sys     0m51.000s

With this PR:

time ./build/bin/wasm-opt -Oz --enable-bulk-memory --enable-multivalue --enable-reference-types --enable-gc --enable-tail-call --enable-exception-handling  -o /dev/null ./test3.wasm

real    5m17.170s
user    30m9.198s
sys     0m28.030s

@Changqing-JING Changqing-JING requested a review from a team as a code owner April 14, 2026 03:38
@Changqing-JING Changqing-JING requested review from kripken and removed request for a team April 14, 2026 03:38
@Changqing-JING Changqing-JING marked this pull request as draft April 14, 2026 03:38
}
// Pre-populate the cache once at the top level so all subsequent
// exitingBranchCache_ lookups are O(1).
if (num == 0) {

@kripken kripken Apr 14, 2026


  1. We are called more than once with num == 0, so I think this is doing more work than needed? (There are three calls to this, two of them with num == 0 as the default value.)
  2. We may also not end up needing the cache at all, if other issues stop us earlier.
  3. We will also only need the cache for some expressions, not the entire function.

To fix those issues, how about making new line 702 call a function that checks for external break targets? That function would lazily populate a cache internally: given a specific expression, it would compute the result and cache it for that expression and all of its children (avoiding walking children already in the cache).
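
One way to read that suggestion, as a rough sketch on a toy tree rather than Binaryen's IR: `Node` and `LazyExitingBranchCache` are hypothetical names, and storing a full label set per node is an assumption made so that a parent can reuse a cached child without re-walking it. The set-per-node storage can itself grow large on deep nesting, so a real implementation would likely want a more compact representation.

```cpp
#include <cstddef>
#include <memory>
#include <set>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy IR node (hypothetical; stands in for a wasm expression).
struct Node {
  int label = -1;            // label defined here if this is a block, else -1
  std::vector<int> targets;  // labels this node branches to
  std::vector<std::unique_ptr<Node>> children;
};

// Lazily computes, per queried node, the labels that branches inside the
// node's subtree target but that are defined outside of it.
class LazyExitingBranchCache {
public:
  // True if some branch inside `n` targets a label defined outside `n`.
  bool hasExitingBranches(const Node& n) { return !exiting(n).empty(); }

  // Number of nodes actually computed so far (cache misses).
  std::size_t visits() const { return visits_; }

private:
  const std::set<int>& exiting(const Node& n) {
    auto it = cache_.find(&n);
    if (it != cache_.end()) {
      return it->second;  // already computed: do not walk the children again
    }
    ++visits_;
    std::set<int> out(n.targets.begin(), n.targets.end());
    for (const auto& c : n.children) {
      const std::set<int>& sub = exiting(*c);
      out.insert(sub.begin(), sub.end());
    }
    out.erase(n.label);  // branches to our own label stay inside
    // References to unordered_map elements stay valid across later inserts.
    return cache_.emplace(&n, std::move(out)).first->second;
  }

  std::unordered_map<const Node*, std::set<int>> cache_;
  std::size_t visits_ = 0;
};
```

Nothing is computed until the first query, only the queried subtree is walked, and a later query on an enclosing expression reuses the cached children. That addresses the three points above: no repeated eager population, no work if the cache is never needed, and no walk over parts of the function that are never queried.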
