Eliminate Rx Merge gate in queue-serialized operators#1097

Open
dwcullop wants to merge 8 commits into
reactivemarbles:mainfrom
dwcullop:fix/operator-merge-gate-deadlock

@dwcullop
Member

Problem

PR #1079 moved cross-cache operators from Synchronize(lock) to SynchronizeSafe, which routes every notification through a SharedDeliveryQueue that releases the lock before invoking downstream observers. The goal was that no operator-level lock would ever be held across a cross-cache call, so two operators on a bidirectional pipeline could not form an ABBA cycle.

Six operators completed the queue routing but then combined their already-serialized inputs with Observable.Merge before delivery:

  • Page
  • Virtualise
  • AutoRefresh
  • Sort (the conditional branch when comparerChangedObservable or resorter is present)
  • GroupOnImmutable
  • QueryWhenChanged (the itemChangedTrigger branch)

Rx's Observable.Merge installs a private _gate and holds it for the full duration of every downstream OnNext. When downstream delivery walks into another cache's writer lock, the two Merge gates on the two operators reconstruct the ABBA cycle that the queue-drain design was supposed to eliminate.

DeadlockTortureTest.Page_DoesNotDeadlock (added in #1079) caught this for Page as an intermittent CI failure. The other five operators have the same latent bug; the existing torture test does not exercise their merge branches with cross-cache writes.

Fix

Add IObservable<T>.UnsynchronizedMerge, a drop-in alternative to Observable.Merge that performs no synchronization of its own. It preserves Merge's terminal semantics (completes only after every source completes; first error terminates; subscription happens in argument order) but does not install a gate.

UnsynchronizedMerge is safe to use only when every input is already serialized. In this library that precondition is satisfied by routing each input through the same SharedDeliveryQueue via SynchronizeSafe(queue) before the merge. The queue's drain loop guarantees that at most one notification is in flight to the shared observer at a time, so the additional gate that Observable.Merge would install is redundant.
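The contract described above can be sketched with nothing but the BCL `IObservable<T>`/`IObserver<T>` interfaces. This is a minimal illustration under stated assumptions, not the code in this PR: `UnsynchronizedMergeSketch`, `ManualSource`, `Recorder`, and `ActionDisposable` are hypothetical names introduced here, and the sketch assumes the caller has already serialized every input.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

Console.WriteLine(MergeDemo.Run());

public static class MergeSketch
{
    // Hypothetical stand-in for the PR's UnsynchronizedMerge, not the library's code.
    public static IObservable<T> UnsynchronizedMergeSketch<T>(
        this IObservable<T> first, params IObservable<T>[] others)
    {
        var sources = new List<IObservable<T>> { first };
        sources.AddRange(others);
        return new Merged<T>(sources);
    }

    private sealed class State { public int Pending; public int Terminated; }

    private sealed class Merged<T> : IObservable<T>
    {
        private readonly List<IObservable<T>> _sources;
        public Merged(List<IObservable<T>> sources) => _sources = sources;

        public IDisposable Subscribe(IObserver<T> downstream)
        {
            var state = new State { Pending = _sources.Count };
            var subscriptions = new List<IDisposable>();
            foreach (var source in _sources)   // subscribe in argument order
                subscriptions.Add(source.Subscribe(new Inner<T>(downstream, state)));
            return new ActionDisposable(() => subscriptions.ForEach(s => s.Dispose()));
        }
    }

    // One observer per source; all of them share the same counters.
    private sealed class Inner<T> : IObserver<T>
    {
        private readonly IObserver<T> _downstream;
        private readonly State _state;
        public Inner(IObserver<T> downstream, State state) { _downstream = downstream; _state = state; }

        // No gate here: the caller must guarantee the inputs are already serialized.
        public void OnNext(T value)
        {
            if (Volatile.Read(ref _state.Terminated) == 0) _downstream.OnNext(value);
        }

        // First error wins; later terminal notifications are ignored.
        public void OnError(Exception error)
        {
            if (Interlocked.Exchange(ref _state.Terminated, 1) == 0) _downstream.OnError(error);
        }

        // Complete downstream only when the last source completes.
        public void OnCompleted()
        {
            if (Interlocked.Decrement(ref _state.Pending) == 0
                && Interlocked.Exchange(ref _state.Terminated, 1) == 0)
                _downstream.OnCompleted();
        }
    }
}

public sealed class ActionDisposable : IDisposable
{
    private readonly Action _dispose;
    public ActionDisposable(Action dispose) => _dispose = dispose;
    public void Dispose() => _dispose();
}

// Hypothetical hand-driven source used only for the demo.
public sealed class ManualSource<T> : IObservable<T>
{
    private readonly List<IObserver<T>> _observers = new();
    public IDisposable Subscribe(IObserver<T> observer)
    {
        _observers.Add(observer);
        return new ActionDisposable(() => _observers.Remove(observer));
    }
    public void Next(T value) { foreach (var o in _observers.ToArray()) o.OnNext(value); }
    public void Complete() { foreach (var o in _observers.ToArray()) o.OnCompleted(); }
}

public sealed class Recorder<T> : IObserver<T>
{
    private readonly List<string> _log;
    public Recorder(List<string> log) => _log = log;
    public void OnNext(T value) => _log.Add($"{value}");
    public void OnError(Exception error) => _log.Add("error");
    public void OnCompleted() => _log.Add("done");
}

public static class MergeDemo
{
    public static string Run()
    {
        var a = new ManualSource<int>();
        var b = new ManualSource<int>();
        var log = new List<string>();
        using var sub = a.UnsynchronizedMergeSketch(b).Subscribe(new Recorder<int>(log));
        a.Next(1);
        b.Next(2);
        a.Complete();   // one source done: the merged stream stays open
        b.Next(3);
        b.Complete();   // last source done: the merged stream completes
        return string.Join(",", log);   // "1,2,3,done"
    }
}
```

The demo drives both sources from one thread, which trivially satisfies the serialization precondition; with concurrent producers the sketch would be unsafe unless something upstream (in this PR, the shared queue) serializes them first.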

All six operators above are rewritten to use UnsynchronizedMerge. The pattern at every call site is unchanged except for the method name:

// Before
request.Merge(dataChange).Where(...).SubscribeSafe(observer)

// After
request.UnsynchronizedMerge(dataChange).Where(...).SubscribeSafe(observer)

Sort's three-source case becomes a single UnsynchronizedMerge call with two params arguments instead of nested .Merge().Merge(), which removes one of the two gates the chained form created.

Why FullJoin is not changed

FullJoin uses the same Merge syntax but its two inputs come from leftCache.Connect() and rightCache.Connect() on independently materialized AsObservableCache() stages that share no queue. There, the Merge gate is the only thing serializing the two cache deliveries before they mutate joinedCache. Removing it without alternative serialization would race the joined cache. FullJoin is left alone.

Test coverage

DeadlockTortureTest is expanded so the same fixture catches a future regression in any of the six operators:

  • New [Fact] GroupWithImmutableState_DoesNotDeadlock.
  • New [Fact] QueryWhenChanged_DoesNotDeadlock — uses a side-channel .Subscribe(_ => otherCache.AddOrUpdate(...)) to close the ABBA cycle, since QueryWhenChanged does not produce a changeset that PopulateInto can consume.
  • AllDangerous_Stacked_DoNotDeadlock now stacks GroupWithImmutableState and Virtualise into the existing kitchen-sink pipeline.
  • MultiplePairs_Simultaneous_NoDeadlock gains a GroupWithImmutableState lane.

Operators with their own existing standalone tests in the fixture (AutoRefresh, Page, Sort, Virtualise) are already covered.

Verification

  • DeadlockTortureTest fixture: 14/14 pass at xUnit.MaxParallelThreads=16, 10 consecutive runs, zero failures.
  • Targeted unit tests for Sort + Virtualise + Page + AutoRefresh + Group* + QueryWhenChanged: 422/422 pass.
  • Full test suite at xUnit.MaxParallelThreads=4: 2321 passed, 0 failed, 1 skipped.

Darrin Cullop added 8 commits May 26, 2026 17:21
reactivemarbles#1079 moved cross-cache operators from Synchronize(lock) to SynchronizeSafe,
which routes deliveries through a SharedDeliveryQueue that releases the lock
before invoking downstream observers. The intent was to make the lock no
longer held across cross-cache calls, so two operators on a bidirectional
pipeline could not form an ABBA cycle.

Six operators (Page, Virtualise, AutoRefresh, Sort, GroupOnImmutable, and
QueryWhenChanged) routed every input through the queue but then combined the
inputs with Observable.Merge before delivery. Rx's Merge installs its own
private gate and holds it for the full duration of every downstream OnNext.
When downstream delivery walks into another cache's writer lock, the two
Merge gates on the two operators reconstruct the ABBA cycle that the queue-
drain design was supposed to eliminate. DeadlockTortureTest.Page_DoesNotDeadlock
caught this for Page; the other five had the same latent bug.

This adds IObservable<T>.UnsynchronizedMerge, a drop-in alternative to
Observable.Merge that performs no synchronization of its own. It is safe to
use only when every input is already serialized (in this library, by routing
through the same SharedDeliveryQueue). All six operators now use it.

Sort's three-source case becomes a single UnsynchronizedMerge call instead of
nested .Merge().Merge(), removing one of the two gates that the chained form
created.

FullJoin uses the same Merge syntax but its two inputs come from independently
materialized AsObservableCache().Connect() streams that share no queue. The
Merge gate is the only thing serializing them; this PR leaves FullJoin alone.

DeadlockTortureTest grows three new cases (GroupWithImmutableState, QueryWhenChanged,
and Virtualise added to the stacked + multi-pair scenarios) so a future regression
in any of the six operators is caught by the existing torture fixture.

Verified: 14/14 DeadlockTortureTest pass at MaxParallelThreads=16 across 10
iterations; 422/422 Sort/Virtualise/Page/AutoRefresh/Group/QueryWhenChanged
unit tests pass; full Cache + List suite passes (2321 passed, 1 skipped).

The initial UnsynchronizedMerge implementation subscribed every source to a single shared
Observer.Create instance. The instance is an AnonymousObserver, which
derives from ObserverBase and tracks a one-shot _isStopped flag inside
its OnCompleted/OnError. Once any source's terminal notification flips
that flag, every subsequent OnCompleted from the remaining sources is
silently dropped before reaching the pending counter, so the merged
observable never emits OnCompleted downstream.

CrossCacheDeadlockStressTest.AllOperators_CrossCache_NoDeadlock_CorrectResults
caught this consistently in CI: the sourceB.Sort.Virtualise pipeline
received OnCompleted from virtBRequests (its first source), but the
matching OnCompleted from sourceB.Dispose arrived at a stopped observer
and was discarded, leaving LastOrDefaultAsync waiting forever.

Each source now subscribes through its own Observer.Create instance.
The OnNextSafe/OnErrorSafe/OnCompletedSafe actions close over the same
shared pending and terminated counters, so the all-must-complete and
first-error-wins semantics are unchanged; only the per-observer one-shot
state is now isolated per source. This matches the per-InnerObserver
pattern that Rx's own Observable.Merge uses internally.
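The failure mode can be reproduced in miniature without Rx. The sketch below assumes a hypothetical `OneShotObserver` that mimics only the one-shot stopped flag described above; it is not Rx's `ObserverBase`, and the counter stands in for the merge's pending count.

```csharp
using System;
using System.Threading;

Console.WriteLine(StoppedFlagDemo.CompletesDownstream(shareOneObserver: true));   // False
Console.WriteLine(StoppedFlagDemo.CompletesDownstream(shareOneObserver: false));  // True

// Hypothetical mimic of an observer that accepts only one terminal notification.
public sealed class OneShotObserver
{
    private readonly Action _onCompleted;
    private int _isStopped;

    public OneShotObserver(Action onCompleted) => _onCompleted = onCompleted;

    public void OnCompleted()
    {
        // Only the first call gets through; every later call is silently dropped.
        if (Interlocked.Exchange(ref _isStopped, 1) == 0) _onCompleted();
    }
}

public static class StoppedFlagDemo
{
    public static bool CompletesDownstream(bool shareOneObserver)
    {
        var pending = 2;          // two sources must complete
        var completed = false;    // set when the merged stream would complete

        void OnSourceCompleted()
        {
            if (--pending == 0) completed = true;
        }

        var first = new OneShotObserver(OnSourceCompleted);
        var second = shareOneObserver ? first : new OneShotObserver(OnSourceCompleted);

        first.OnCompleted();    // pending drops to 1
        second.OnCompleted();   // shared: dropped by the flag; separate: pending drops to 0
        return completed;
    }
}
```

With a shared observer the second completion never reaches the counter, so the merged stream never completes; giving each source its own observer isolates the flag while the counter stays shared.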

Also apply UnsynchronizedMerge to TransformWithForcedTransform, which
was missed in the original survey. Its shared.Merge(refresher) routed
both inputs through the same SharedDeliveryQueue but kept Rx's gate,
giving the same latent ABBA exposure that DeadlockTortureTest.TransformWithForce_DoesNotDeadlock
flagged in CI.

Verified: CrossCacheDeadlockStressTest plus the full DeadlockTortureTest
fixture pass 10/10 at xUnit.MaxParallelThreads=16; full test suite
passes 2323/2323 at xUnit.MaxParallelThreads=4.

Six of the operators changed in this branch followed the same shape:

    var queue = new SharedDeliveryQueue();
    var s1 = source1.SynchronizeSafe(queue).Select(projection1);
    var s2 = source2.SynchronizeSafe(queue).Select(projection2);
    return new CompositeDisposable(s1.UnsynchronizedMerge(s2)... , queue);

Every site allocates its own queue, threads it through each input, and
unwinds it in the disposable. The pattern is mechanical and easy to get
wrong: the queue must outlive the subscription, every input must be
serialized through the same queue, and the merge must skip Rx's gate.

DeliveryQueueMerge wraps that pattern as one operator. Each overload
owns its own SharedDeliveryQueue, routes every input through it via
SynchronizeSafe(queue), and combines the serialized streams with
UnsynchronizedMerge. The returned disposable tears down the merge
before the queue so terminal notifications still flow through the
still-active queue.

Three flavours:

  DeliveryQueueMerge(IObservable<T>, params IObservable<T>[])
      same-type merge, no projection (AutoRefresh)
  DeliveryQueueMerge(IObservable<T1>, Func<T1,TOut>, IObservable<T2>, Func<T2,TOut>)
      heterogeneous two-source merge with projections invoked inside the drain
      (Page, Virtualise, GroupOnImmutable, QueryWhenChanged)
  DeliveryQueueMerge(IObservable<T1>, ..., IObservable<T2>, ..., IObservable<T3>, ...)
      three-source heterogeneous merge (Sort, non-early-return branch)

TransformWithForcedTransform keeps its current shape: its queue is shared
with a Publish()/cacheLoader subscription that lives outside the merge,
so the queue cannot be encapsulated by a merge operator. UnsynchronizedMerge
remains the helper there.

Verified locally: 437/437 unit tests across the six affected operators pass;
DeadlockTortureTest plus CrossCacheDeadlockStressTest pass 10/10 at
xUnit.MaxParallelThreads=16; full test suite passes at MaxParallelThreads=4.

The heterogeneous DeliveryQueueMerge overloads pushed too much into
each call site to read like idiomatic Rx, and at five of the six
operators the projections had to run inside the shared delivery queue
to preserve Rx semantics, which the operator-level signature could not
express without exposing the queue type to the caller.

Keep the same-type extension overload only:

    public static IObservable<T> DeliveryQueueMerge<T>(
        this IObservable<T> first,
        params IObservable<T>[] others)

This reads as a drop-in for Observable.Merge at AutoRefresh's call
site, which is the only place all inputs are already the same type
and need no per-input projection inside the drain.

Page, Virtualise, Sort, GroupOnImmutable, and QueryWhenChanged keep
the explicit SharedDeliveryQueue + SynchronizeSafe(queue) + UnsynchronizedMerge
shape introduced earlier in this branch. Each call site shows the
queue plumbing because the projections must execute inside the drain;
making that visible matches the rest of the code in the file.

Tests with subject inputs (Page, Virtualise, BatchIf, TransformWithForce,
AllDangerous_Stacked, MultiplePairs) created the subject but nothing ever
called OnNext on it. The bidirectional source writes still flowed through
the operator's Merge gate, so the original deadlock was triggered, but
the operator's subject-driven branch (refresher, request changes, pause
toggle) was never invoked during the race. A regression that broke only
that branch would not be caught.

Add an optional subjectPusher callback to RunBidirectionalDeadlockTest
that runs on a third worker thread, gated by the same Barrier as the two
writer threads, and have each subject-bearing test push its own pattern
on the subject while sources are writing. For the Page/Virtualise/BatchIf
inline subjects in MultiplePairs, lift them to named locals so they can
be referenced from the pusher closure.
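A barrier-gated harness of this kind might look like the sketch below. `RunGated` is a hypothetical helper written for illustration; the real RunBidirectionalDeadlockTest signature is not shown in this PR text, so the parameter shapes here are assumptions.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

Console.WriteLine(BarrierHarnessSketch.Demo());

public static class BarrierHarnessSketch
{
    // Hypothetical harness: starts every worker on its own thread, releases them
    // together through one Barrier, and reports whether all finished in time.
    public static bool RunGated(TimeSpan timeout, params Action[] workers)
    {
        // Disposed on return; after a timeout the workers are assumed to be past the barrier.
        using var barrier = new Barrier(workers.Length);
        var tasks = new Task[workers.Length];

        for (var i = 0; i < workers.Length; i++)
        {
            var work = workers[i];
            tasks[i] = Task.Factory.StartNew(
                () =>
                {
                    barrier.SignalAndWait();   // all workers start the race together
                    work();
                },
                TaskCreationOptions.LongRunning);   // dedicated threads, not pool threads
        }

        // false means at least one worker is still running: a deadlock or a slow run.
        return Task.WaitAll(tasks, timeout);
    }

    public static string Demo()
    {
        var writes = 0;
        var pushes = 0;

        void Writer() { for (var i = 0; i < 200; i++) Interlocked.Increment(ref writes); }
        void Pusher() { for (var i = 0; i < 200; i++) Interlocked.Increment(ref pushes); }

        // Two writer threads plus the optional third "subject pusher" thread.
        var finished = RunGated(TimeSpan.FromSeconds(60), Writer, Writer, Pusher);
        return $"{finished} {writes} {pushes}";   // "True 400 200"
    }
}
```

In a real test the writers would call AddOrUpdate on the two caches and the pusher would call OnNext on the operator's subject; the counters here only stand in for that work.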

Also collapse the vertical layout introduced in the previous commits for
DeliveryQueueMerge's CompositeDisposable construction and the
UnsynchronizedMerge OnCompleted predicate.

Every input has the same element type T, so the type-erased
SharedDeliveryQueue with its per-source DeliverySubQueue<T> wrappers
was carrying machinery (bitset, sub-queue list, type-erased StageNext/
DeliverStaged dispatch) that the same-type merge never used.

Replace the implementation with one DeliveryQueue<T> and per-source
Observer.Create instances:

  - OnNext: forwarded directly to queue.OnNext. The queue's gate
    serializes concurrent calls from multiple producers; the drain
    delivers items in arrival order outside the lock, so a downstream
    observer that walks into another cache's writer lock cannot
    deadlock with this serialization point.

  - OnError: forwarded directly to queue.OnError. The queue marks
    itself terminated at the first error reaching the drain, so a
    second concurrent error from another source is dropped at enqueue
    and the downstream observer sees OnError exactly once.

  - OnCompleted: counter-gated; only the last surviving source's
    completion calls queue.OnCompleted, matching Observable.Merge's
    all-must-complete semantic. If a source has already errored, the
    queue is terminated and the eventual OnCompleted at the counter's
    floor is dropped at enqueue.
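The OnNext rule above (enqueue under the gate, deliver outside it) can be sketched as follows. `DrainQueueSketch<T>` is a hypothetical class for illustration only, not the library's DeliveryQueue<T>; it omits the OnError and OnCompleted handling and does not recover if the delivery callback throws.

```csharp
using System;
using System.Collections.Generic;

Console.WriteLine(DrainDemo.Run());

// Hypothetical queue: the lock protects only the queue state, never the callback.
public sealed class DrainQueueSketch<T>
{
    private readonly object _gate = new();
    private readonly Queue<T> _items = new();
    private bool _draining;

    // Downstream delivery callback; set before the first OnNext.
    public Action<T> Deliver { get; set; } = _ => { };

    public void OnNext(T item)
    {
        lock (_gate)
        {
            _items.Enqueue(item);
            if (_draining) return;   // the active drainer will deliver this item
            _draining = true;        // this caller becomes the drainer
        }

        while (true)
        {
            T next;
            lock (_gate)
            {
                if (_items.Count == 0) { _draining = false; return; }
                next = _items.Dequeue();
            }

            Deliver(next);   // invoked with the lock released
        }
    }
}

public static class DrainDemo
{
    public static string Run()
    {
        var log = new List<string>();
        var queue = new DrainQueueSketch<int>();

        queue.Deliver = x =>
        {
            log.Add($"start{x}");
            if (x == 1) queue.OnNext(2);   // re-entrant call is queued, not nested
            log.Add($"end{x}");
        };

        queue.OnNext(1);
        return string.Join(",", log);   // "start1,end1,start2,end2"
    }
}
```

Because the callback runs with the lock released, a downstream observer that blocks on another cache's writer lock holds nothing that a second producer needs in order to enqueue, which is the property the description relies on.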

The per-source Observer.Create instance is required for the same
reason it is in UnsynchronizedMerge: Rx's ObserverBase sets a one-shot
stopped flag on the first OnCompleted/OnError, and a single shared
observer would silently drop terminal notifications from every source
after the first.

AutoRefresh is the only consumer of DeliveryQueueMerge. All tests
across AutoRefresh, DeadlockTortureTest, and CrossCacheDeadlockStressTest
pass; deadlock fixture passes 5/5 at xUnit.MaxParallelThreads=16.

PR build failed AllDangerous_Stacked_DoNotDeadlock after 27s on a single
iteration (the per-iteration TimeoutSeconds=15 budget was exceeded, then
RunBidirectionalDeadlockTest returned false). It was not a deadlock; the
pipeline was just doing too much work.

Each force.OnNext in this test triggers TransformWithForcedTransform's
refresher, which scans cache.KeyValues and emits a refresh changeset
that flows through the full 10-operator stack (GroupWithImmutableState,
TransformMany, AutoRefresh, Filter, Transform, OnItemRemoved, DisposeMany,
Sort, Virtualise, Page). With the pusher looping ItemCount=200 times and
pushing three subjects per loop (force, pageReq, virtReq), the pusher
thread did ~600 push operations per test iteration on top of the two
writer threads' 200 source AddOrUpdates each. The other torture tests have a
single-operator pipeline and one pusher and fit well within the budget;
only the stacked case combines a heavy pipeline with three concurrent
pushers.

Reduce StackedPushCount to ItemCount/4 = 50, three subjects each. That
keeps the subject branches under contention (still 150 pushes per
iteration, still well above source-write rate) while bringing each
iteration's worst case comfortably under TimeoutSeconds. The other
subject-bearing tests are unchanged.

Previous commit reduced the AllDangerous_Stacked pusher load to fit
the 15s per-iteration budget on the CI runner. That was the wrong
trade: the test is a torture test, and shaving load to match the
slowest hardware costs coverage. The CI runners are deliberately
stripped down; the test budget should account for them.

Raise TimeoutSeconds from 15 to 60 across the fixture and restore the
full ItemCount pusher loop in AllDangerous_Stacked. The timeout still
catches an actual deadlock (which hangs forever, not 60s), and the
extra budget covers worst-case scheduling on a small VM.