Test Results
40 files +4, 40 suites +4, 24m 47s ⏱️ +18m 37s
For more details on these failures, see this check.
Results for commit e561175. ± Comparison against base commit f6c2dea.
This pull request removes 297 and adds 1495 tests. Note that renamed tests count towards both.
This pull request removes 7 skipped tests and adds 5 skipped tests. Note that renamed tests count towards both.
This pull request skips 1 and un-skips 5 tests.
Pull request overview
This PR bundles several long-running feature and stability tracks across MeshWeaver core + Memex: social publishing foundations, in-process #r "nuget:..." compilation support (node-type + interactive markdown), move-operation performance/timeout hardening, and multiple UI/stream reliability improvements. It also standardizes the code folder naming from _Source/_Test to Source/Test across code, tests, docs, and samples.
Changes:
- Introduces MeshWeaver.Social (options, DI wiring, publish queue, credential model) plus initial Memex wiring (LinkedIn connect entry points + user menu hooks).
- Adds MeshWeaver.NuGet resolver + directive parser and integrates it into script compilation (#r "nuget:Pkg, Version"), including cache backends and tests.
- Improves operational robustness: parallelized recursive moves, default 30s mesh-op timeout, “no endless spinner” navigation status UI, and remote stream resubscribe behavior.
Reviewed changes
Copilot reviewed 159 out of 265 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/MeshWeaver.StorageImport.Test/StorageImporterTests.cs | Updates test expectations/docs to Source/ naming. |
| test/MeshWeaver.Social.Test/PostStatsRefresherTest.cs | Adds stats refresher test coverage (needs deterministic timeout handling). |
| test/MeshWeaver.Social.Test/MeshWeaver.Social.Test.csproj | Adds new Social test project referencing Social + Fixture. |
| test/MeshWeaver.Social.Test/InMemoryPublishQueueTest.cs | Adds unit tests for publish queue due-drain + dedup. |
| test/MeshWeaver.Persistence.Test/FileSystemPersistenceTest.cs | Updates partition tests to Source/ naming. |
| test/MeshWeaver.MathDemo.Test/TestPaths.cs | Adds helper paths for MathDemo sample test assets. |
| test/MeshWeaver.MathDemo.Test/MeshWeaver.MathDemo.Test.csproj | Adds MathDemo test project and copies sample graph data to output. |
| test/MeshWeaver.Hosting.PostgreSql.Test/SatelliteQueryTests.cs | Updates code-path routing tests to Source/ naming. |
| test/MeshWeaver.Hosting.Monolith.Test/UserActivityAreaTest.cs | Updates regression test docs to Source/ naming. |
| test/MeshWeaver.Hosting.Blazor.Test/NavigationServiceTest.cs | Adjusts test to assert “no 404 flash” during retries. |
| test/MeshWeaver.Graph.Test/NuGetDirectiveParserTest.cs | Adds unit tests for parsing/stripping #r "nuget:...". |
| test/MeshWeaver.Graph.Test/NuGetAssemblyResolverTest.cs | Adds networked NuGet restore end-to-end tests (skippable via env var). |
| test/MeshWeaver.Graph.Test/MeshWeaver.Graph.Test.csproj | References new MeshWeaver.NuGet project. |
| test/MeshWeaver.FutuRe.Test/MeshWeaver.FutuRe.Test.csproj | Updates compile-included sample sources to Source/ paths. |
| test/MeshWeaver.Content.Test/CompilationErrorTest.cs | Updates broken-code test to Source/ path. |
| test/MeshWeaver.AI.Test/MeshPluginTest.cs | Updates MCP tool count expectations (adds RunTests/Move/Copy). |
| src/MeshWeaver.Social/SocialOptions.cs | Adds configurable knobs for publishing/stats/ingest scheduling. |
| src/MeshWeaver.Social/SocialExtensions.cs | Adds DI wiring for social publishing subsystem and hosted services. |
| src/MeshWeaver.Social/PlatformCredential.cs | Adds credential record model (access/refresh/expiry metadata). |
| src/MeshWeaver.Social/MeshWeaver.Social.csproj | Introduces Social library project. |
| src/MeshWeaver.Social/IPublishQueue.cs | Adds publish queue abstraction + in-memory implementation. |
| src/MeshWeaver.Social/IApprovalPublishBridge.cs | Defines bridge contract and PublishableSnapshot model. |
| src/MeshWeaver.NuGet/ResolvedPackageSet.cs | Adds resolver output model (assemblies, probing dirs, versions). |
| src/MeshWeaver.NuGet/NuGetServiceCollectionExtensions.cs | Adds DI extension to register resolver + cache. |
| src/MeshWeaver.NuGet/NuGetPackageReference.cs | Adds package reference model (id + version range). |
| src/MeshWeaver.NuGet/NuGetDirectiveParser.cs | Implements #r "nuget:..." extraction + source stripping. |
| src/MeshWeaver.NuGet/MeshWeaver.NuGet.csproj | Introduces NuGet resolver project and dependencies. |
| src/MeshWeaver.NuGet/INuGetPackageCache.cs | Adds optional persistent cache interface + null implementation. |
| src/MeshWeaver.NuGet/INuGetAssemblyResolver.cs | Adds resolver interface returning ResolvedPackageSet. |
| src/MeshWeaver.NuGet.AzureBlob/MeshWeaver.NuGet.AzureBlob.csproj | Adds Azure Blob cache backend project. |
| src/MeshWeaver.NuGet.AzureBlob/BlobNuGetPackageCacheExtensions.cs | Adds DI helper to register blob-backed cache. |
| src/MeshWeaver.Mesh.Contract/Services/MeshOperationOptions.cs | Adds mesh operation timeout options (default 30s). |
| src/MeshWeaver.Mesh.Contract/Services/IStorageAdapter.cs | Updates docs/examples to Source/ naming. |
| src/MeshWeaver.Mesh.Contract/Services/INavigationService.cs | Adds Status observable contract for UI progress reporting. |
| src/MeshWeaver.Mesh.Contract/Services/IIconGenerator.cs | Adds icon generator abstraction returning an observable SVG. |
| src/MeshWeaver.Mesh.Contract/PartitionDefinition.cs | Updates standard table mappings (Source/Test → code) and clarifies semantics. |
| src/MeshWeaver.Mesh.Contract/MeshExtensions.cs | Adds timeout override + move timeout enforcement + grain dispose on delete. |
| src/MeshWeaver.Mesh.Contract/CodeConfiguration.cs | Updates docs to Source/ naming. |
| src/MeshWeaver.Kernel.Hub/MeshWeaver.Kernel.Hub.csproj | Removes Interactive package mgmt dependency; references MeshWeaver.NuGet. |
| src/MeshWeaver.Hosting/Persistence/MigrationUtility.cs | Updates migration heuristics to include Source/Test + legacy _Source/_Test. |
| src/MeshWeaver.Hosting/Persistence/FileSystemStorageAdapter.cs | Treats Source/Test as code paths + keeps legacy compatibility. |
| src/MeshWeaver.Hosting/Persistence/FileSystemPersistenceService.cs | Parallelizes descendant move I/O (with concurrency implications). |
| src/MeshWeaver.Hosting/Persistence/CachingStorageAdapter.cs | Updates code sub-namespace detection (Source/Test + legacy). |
| src/MeshWeaver.Hosting.PostgreSql/PostgreSqlPartitionedStoreFactory.cs | Guards against source/test mistakenly becoming schemas. |
| src/MeshWeaver.Hosting.PostgreSql/PostgreSqlCrossSchemaQueryProvider.cs | Filters malformed parameters to avoid NRE during SQL interpolation. |
| src/MeshWeaver.Hosting.Blazor/MeshWeaver.Hosting.Blazor.csproj | Adds NU1510 suppression. |
| src/MeshWeaver.Graph/PartitionTypeSource.cs | Updates docs to Source/ naming. |
| src/MeshWeaver.Graph/MeshWeaver.Graph.csproj | References MeshWeaver.NuGet. |
| src/MeshWeaver.Graph/MeshNodeLayoutAreas.cs | Improves create href behavior + reactive/grouped children catalog. |
| src/MeshWeaver.Graph/MeshDataSource.cs | Updates docs to Source/ naming. |
| src/MeshWeaver.Graph/Configuration/ScriptCompilationService.cs | Integrates NuGet directive parsing + resolver into compilation. |
| src/MeshWeaver.Graph/Configuration/NodeTypeDefinition.cs | Updates docs/examples to Source/ naming. |
| src/MeshWeaver.Graph/Configuration/MeshDataSourceNodeType.cs | Changes sources namespace constant to Source. |
| src/MeshWeaver.Graph/Configuration/GraphConfigurationExtensions.cs | Registers NuGet resolver and uses Source code path. |
| src/MeshWeaver.Graph/Configuration/CodeNodeType.cs | Treats Code nodes as primary content; defines Source/Test constants. |
| src/MeshWeaver.Documentation/Data/DataMesh/UnifiedPath.md | Documents @/ semantics and HTML-href pitfalls. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Profile/Source/SocialMediaProfileLayoutAreas.cs | Adds SocialMedia profile layout areas example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Profile/Source/SocialMediaProfile.cs | Adds SocialMedia profile content model example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Post/Source/SocialMediaPost.cs | Adds SocialMedia post content model example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Post/Source/Platform.cs | Adds SocialMedia platform reference-data example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia.md | Updates docs to Source/ naming and authoring guidance. |
| src/MeshWeaver.Documentation/Data/DataMesh/SatelliteEntities.md | Clarifies Source/Test are primary content, not satellites. |
| src/MeshWeaver.Documentation/Data/DataMesh/NodeTypes.md | Adds Node Types documentation index page. |
| src/MeshWeaver.Documentation/Data/DataMesh/NodeTypeConfiguration.md | Updates docs to Source/ naming. |
| src/MeshWeaver.Documentation/Data/DataMesh/NodeOperations.md | Updates docs to Source/ naming. |
| src/MeshWeaver.Documentation/Data/DataMesh/DataConfiguration.md | Updates docs to Source/ naming. |
| src/MeshWeaver.Documentation/Data/DataMesh/CreatingNodeTypes.md | Updates docs to Source/Test naming throughout. |
| src/MeshWeaver.Documentation/Data/DataMesh.md | Updates TOC links and adds NuGet packages bullet. |
| src/MeshWeaver.Documentation/Data/Architecture/PartitionedPersistence.md | Updates persistence routing docs for Source/Test. |
| src/MeshWeaver.Documentation/Data/Architecture/MeshGraph.md | Updates examples to Source/ naming. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionSampleData.cs | Adds cession sample dataset for docs/demo. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionResultsArea.cs | Adds reactive charting layout area example. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionEngine.cs | Adds pure business logic sample for cession calculations. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionData.cs | Adds content models for cession example. |
| src/MeshWeaver.Data/Serialization/SyncStreamOptions.cs | Adds configurable heartbeat interval for sync streams. |
| src/MeshWeaver.Data/Serialization/JsonSynchronizationStream.cs | Implements resubscribe-on-owner-dispose logic. |
| src/MeshWeaver.Blazor/Pages/ApplicationPage.razor | Switches to NavigationStatus-driven progress/not-found/error UI. |
| src/MeshWeaver.Blazor/Components/NavigationProgressBar.razor.css | Adds styling for full-page vs compact overlay progress bar. |
| src/MeshWeaver.Blazor/Components/NavigationProgressBar.razor | Adds reusable “spinner + message” component. |
| src/MeshWeaver.Blazor/Components/MeshSearchView.razor.cs | Adds Category grouping fallback to NodeType. |
| src/MeshWeaver.Blazor/Components/LayoutAreaView.razor.cs | Adds stream lifecycle logging and additional diagnostics. |
| src/MeshWeaver.Blazor/Components/LayoutAreaView.razor | Surfaces compilation progress indicator before first stream emission. |
| src/MeshWeaver.Blazor/Components/CompileProgressIndicator.razor.css | Adds styling for compilation progress banner. |
| src/MeshWeaver.Blazor/Components/CompileProgressIndicator.razor | Adds polling UI component for active NodeType compilation. |
| src/MeshWeaver.Blazor.Portal/MeshWeaver.Blazor.Portal.csproj | Adds NU1510 suppression. |
| src/MeshWeaver.Blazor.AI/MeshWeaver.Blazor.AI.csproj | Adds NU1510 suppression. |
| src/MeshWeaver.Blazor.AI/McpMeshPlugin.cs | Adds Patch/Move/Copy MCP tools and improves tool descriptions. |
| src/MeshWeaver.AI/ThreadLayoutAreas.cs | Adds debug logging around streaming view emission. |
| src/MeshWeaver.AI/IconGenerator.cs | Adds default AI-backed IIconGenerator implementation. |
| src/MeshWeaver.AI/DelegationCompletedEvent.cs | Removes delegation tracker/event types. |
| src/MeshWeaver.AI/Data/Agent/Worker.md | Updates @/ link guidance (no raw HTML href with @/). |
| src/MeshWeaver.AI/Data/Agent/ToolsReference.md | Updates @/ link guidance and provides correct/incorrect table. |
| src/MeshWeaver.AI/Data/Agent/Orchestrator.md | Updates @/ link guidance for agent outputs. |
| src/MeshWeaver.AI/AIExtensions.cs | Removes old type registration; registers IIconGenerator. |
| memex/aspire/Memex.Portal.Distributed/Program.cs | Registers blob-backed NuGet package cache in distributed deployment. |
| memex/aspire/Memex.Portal.Distributed/Memex.Portal.Distributed.csproj | References MeshWeaver.NuGet.AzureBlob. |
| memex/aspire/Memex.Database.Migration/Program.cs | Adds source/test to reserved schema list. |
| memex/aspire/Memex.AppHost/Program.cs | Adds LinkedIn secret/env wiring + sets NUGET_PACKAGES cache dir. |
| memex/Memex.Portal.Shared/Social/SocialMediaUserMenuProvider.cs | Adds “Social Media” shortcut on a user’s own node (lazy hub creation). |
| memex/Memex.Portal.Shared/Social/ApiCredentialNodeType.cs | Adds NodeType for PlatformCredential stored under _ApiCredentials. |
| memex/Memex.Portal.Shared/Pages/Login.razor | Adds “Connect LinkedIn for publishing” CTA on login page. |
| memex/Memex.Portal.Shared/OrganizationNodeType.cs | Switches to default layout areas registration. |
| memex/Memex.Portal.Shared/MemexConfiguration.cs | Adds LinkedIn publisher wiring, @/ redirect middleware, and routes. |
| memex/Memex.Portal.Shared/Memex.Portal.Shared.csproj | References MeshWeaver.Social. |
| memex/Memex.Portal.Monolith/appsettings.Development.json | Enables debug logging for LayoutAreaView. |
| MeshWeaver.slnx | Adds new projects (NuGet, NuGet.AzureBlob, Social, new test projects). |
| Directory.Packages.props | Adds NuGet.* package versions for resolver implementation. |
| CLAUDE.md | Documents @/ local-only rule and href/URL restrictions. |
| (Various) samples/Graph/... | Adds/updates many sample NodeTypes and content under Source/ to reflect new conventions and demos. |
…+ test helpers

Recursive DeleteNodeRequest handled on a node's own hub was deadlocking: the final DeleteSelfFromStorage posted Ok and DisposeRequest from the dying hub, so the Ok raced callback disposal on the caller and was lost. Introduce CommitNodeDeletionMessage and forward the terminal commit (storage delete + reply + grain dispose) to the resolved mesh hub (walking ParentHub upward): Sender becomes the stable mesh hub, and FIFO on the caller's inbound queue guarantees Ok resolves the RegisterCallback before DisposeRequest arrives.

Also addresses two Copilot review comments on PR #95:
- FileSystemStorageAdapter.DeleteAsync empty-directory ascent is now concurrency-tolerant: wraps the enumerate + Directory.Delete in try/catch, swallowing the DirectoryNotFoundException race and breaking on IOException (non-empty / in-use). Required because FileSystemPersistenceService.MoveNodeAsync now parallelizes descendant deletes via Task.WhenAll.
- PostStatsRefresherTest.WaitUntilAsync throws TimeoutException with a descriptive message instead of returning silently on deadline, so the test cannot green-tick a stats-refresh that never happened.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
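The concurrency-tolerant directory ascent described above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: `EmptyDirectoryPruner`, `PruneEmptyAncestors`, and the `stopAtRoot` parameter are hypothetical names, and treating a vanished directory as "keep ascending" is an assumption about what "swallowing" means in the commit.

```csharp
using System;
using System.IO;
using System.Linq;

// Demo: create root/a/b, then prune upward from b; the root itself is kept.
var root = Path.Combine(Path.GetTempPath(), "prune-demo-" + Guid.NewGuid().ToString("N"));
var leaf = Path.Combine(root, "a", "b");
Directory.CreateDirectory(leaf);
EmptyDirectoryPruner.PruneEmptyAncestors(leaf, root);
Console.WriteLine(Directory.Exists(Path.Combine(root, "a"))); // False: both empty levels were removed
Directory.Delete(root);

// Hypothetical helper (assumption), illustrating the try/catch shape only.
public static class EmptyDirectoryPruner
{
    public static void PruneEmptyAncestors(string startDirectory, string stopAtRoot)
    {
        var stop = Path.GetFullPath(stopAtRoot);
        var current = Path.GetFullPath(startDirectory);

        // Only walk directories strictly below the stop root.
        while (current != null
               && current.Length > stop.Length
               && current.StartsWith(stop, StringComparison.Ordinal))
        {
            try
            {
                if (Directory.EnumerateFileSystemEntries(current).Any())
                    break; // not empty: stop ascending
                Directory.Delete(current);
            }
            catch (DirectoryNotFoundException)
            {
                // A concurrent delete already removed this level; keep ascending.
            }
            catch (IOException)
            {
                break; // became non-empty or is in use between the check and the delete
            }
            current = Path.GetDirectoryName(current);
        }
    }
}
```

`DirectoryNotFoundException` derives from `IOException`, so the more specific catch has to come first for the two cases to be handled differently.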
@copilot resolve the merge conflicts in this pull request
Resolved. The merge with Conflicts resolved:
Code review — recent stability batch
Manual review of the last ~20 commits since
Correctness: should fix before merge
1. ✅
foreach (var (k, v) in perParams)
{
var newKey = "@" + prefix + k.TrimStart('@');
renamedSql = renamedSql.Replace(k, newKey);
renamedParams[newKey] = v;
}
Fix: single regex pass keyed on
2. ✅
Fix:
3. ✅
Fix: parse every query in
4. ✅
Fix:
Race / lifecycle hazards
5. ✅
Fix: drop the time-based heuristic in favour of a structural one: skip recovery only when the thread is still an auto-execute candidate (
6. ✅
7. ✅
8. ✅
Fix: pre-allocate the
Style / consistency
9. ✅
10. ✅
11. ✅
Fix: drop the per-query Limit injection. Limit is enforced post-union via
✅ Looks good (no action needed)
Code review — part 2: rest of the PR
Continuing review on the bulk of the PR (everything before the recent stability batch). Focused on the new projects (
Correctness: should fix before merge
12. ✅
return _cache.GetOrAdd(key, _ => ResolveCoreAsync(requested, framework, ct));
If
Fix: evict faulted/cancelled tasks from the cache before returning. Also pass
13. ✅
Fix: switched to
14. ✅
Fix: post-hydration, the resolver opens the package folder via
15. ✅
Fix: defensive
16. ✅
Race / lifecycle hazards
17. ✅
18. ✅
19. ✅
Fix: replaced with a single bounded
Style / consistency
20. ✅
Fix: register the publisher as a true singleton via
21. ✅
Fix: gate hosted-service registration on
22. ✅
23. ✅
✅ Looks good (no action needed)
Areas not covered in this review
Persistence-service refactors (
Review fixes applied: all 23 items addressed
5 commits, organised by batch. Locally committed, not pushed yet.
Verification
Notes
Ready to push when you want.
Done: review item #14 is now closed in commit
…fix DI lifetimes, redact PII, drop dynamic

- ThreadExecution: collapse triple-stacked <summary> blocks on WatchForExecution and NotifyParentCompletion. Tooling kept the last one anyway; the dead scaffolding was just noise.
- SocialExtensions: register LinkedInPublisher / XPublisher as TRUE singletons (factory-resolved with named HttpClient). The previous AddHttpClient<T>+AddSingleton<IPlatformPublisher> mix made the concrete type transient while the interface alias was singleton, so direct vs via-interface resolution returned different instances. Also gate hosted-service registration on at least one platform being configured (the "all-or-nothing" comment was wrong; with zero platforms the four hosted services started anyway and faulted on first tick).
- LinkedInPublisher: replace `(dynamic)media.shareMediaCategory` peek with two concrete payload shapes, so a typo turns into a compile error instead of a RuntimeBinderException.
- LinkedIn / X publishers: cap error-body logs at 200 chars to bound PII exposure (the body can echo the user's post text on validation rejection). Full body still goes to PublishResult.Error for the caller.

Addresses PR #95 review items #9, #20, #21, #22, #23.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… in-memory engines
PostgreSqlStorageAdapter.QueryNodesAsync(IReadOnlyList<ParsedQuery>):
- Replace order-dependent `string.Replace` parameter rename with a
single `Regex.Replace` keyed on @<name> word boundary that gates
on perParams.ContainsKey. Sequential Replace was mangling adjacent
tokens (renaming `@p` after `@p1` produced `@q0_q0_p1`) and could
clobber `@…` substrings inside string literals / JSONB paths.
- Switch from `UNION` to `UNION ALL` wrapped in
`SELECT DISTINCT ON (namespace, id) ... ORDER BY namespace, id, last_modified DESC`.
Plain UNION dedupes whole rows — two queries observing the same
node at slightly-different last_modified would BOTH appear in the
output. Path-keyed dedup (= MeshNode identity) with newest-wins
tie-break collapses them correctly.
PostgreSqlMeshQuery.ObserveQuery<T>:
- Parse EVERY query in request.EffectiveQueries and build per-query
(basePath, scope) filters; the change-notifier subscription
OR-joins them so multi-query observations get delta refreshes
triggered by ANY query's path/scope, not just query #0's. The
previous shape silently lost live updates from queries #1+.
PostgreSqlMeshQuery.QueryNodesUnionAsync + MeshQueryEngine:
- Drop the per-query `parsedList[0].Limit = request.Limit` injection.
Query #0 hit its limit before yielding the union's most relevant
rows, while queries #1+ contributed unbounded — making the result
iteration-order dependent. Limit is now enforced post-union via
MinLimit(request.Limit, firstParsed.Limit) so a request-level cap
can't be circumvented and an in-query `limit:N` still wins when
smaller.
- MeshQueryEngine: CollectMatchedAsync returns the LIST of every
query's basePath; the source:activity post-filter scans every
base path's descendants and unions activity-main-paths so
queries #1+ aren't filtered against query #0's subtree only.
Addresses PR #95 review items #1, #2, #3, #4, #11.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
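As an illustration of the single-pass rename described in this commit, the sketch below replaces each `@name` token at most once and only when it is a known parameter. `ParameterRenamer`, the prefix format, and the exact pattern are assumptions for the example, not the adapter's real code; a plain regex also cannot tell a parameter from an `@word` inside a string literal when that word happens to be a parameter name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var sql = "SELECT * FROM nodes WHERE a = @p AND b = @p1 AND c = @other";
var perParams = new Dictionary<string, object> { ["@p"] = 1, ["@p1"] = 2 };
var (renamedSql, renamedParams) = ParameterRenamer.Rename(sql, perParams, "q0_");
Console.WriteLine(renamedSql);
// SELECT * FROM nodes WHERE a = @q0_p AND b = @q0_p1 AND c = @other

// Hypothetical helper (assumption). Keys in perParams are assumed to include the leading '@'.
public static class ParameterRenamer
{
    // '@' not preceded by a word character or another '@', followed by a whole identifier.
    // The greedy \w+ consumes the full name, so @p can never match inside @p1.
    private static readonly Regex ParameterToken =
        new Regex(@"(?<![\w@])@(\w+)", RegexOptions.Compiled);

    public static (string Sql, Dictionary<string, object> Parameters) Rename(
        string sql, IReadOnlyDictionary<string, object> perParams, string prefix)
    {
        var renamed = new Dictionary<string, object>();
        foreach (var pair in perParams)
            renamed["@" + prefix + pair.Key.TrimStart('@')] = pair.Value;

        // One pass over the SQL: each token is visited once, so renamed text is never re-scanned.
        var newSql = ParameterToken.Replace(sql, match =>
            perParams.ContainsKey(match.Value)
                ? "@" + prefix + match.Groups[1].Value
                : match.Value); // unknown token: leave untouched

        return (newSql, renamed);
    }
}
```

Because the replacement output is never fed back into the matcher, the result does not depend on dictionary enumeration order, which is the property the sequential `string.Replace` loop lacked.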
…ThreadExecution stability fixes

ThreadExecution.cs (already in commit 478fdaa; recapping here for the review-item index):
- RecoverStaleExecutingThread: drop the 2-minute "fresh execution" window in favour of a structural check (skip when PendingUserMessage + ActiveMessageId are still set, i.e. the thread is an auto-execute candidate WatchForExecution will pick up). Closes the "long-running agent crashed at minute 5 → IsExecuting=true forever" gap; the time-based heuristic contradicted commit 6dc436b's "no time limits" stance.
- Subject<StreamingSnapshot>: declare with `using var` so the Subject itself disposes alongside its subscription. Minor leak per execution previously.
- HandleSubmitMessage: pre-allocate the per-round CancellationTokenSource and store it on the thread hub BEFORE posting SubmitMessageResponse. This closes the race where an early Stop click between IsExecuting=true and ExecuteMessageAsync's `parentHub.Set(executionCts)` found a null CTS slot and silently no-op'd. ExecuteMessageAsync now reuses the pre-allocated CTS (with a fallback for the auto-execute path that bypasses HandleSubmitMessage).

IsExecutingLifecycleTest.cs:
- Migrate the response-text wait from text-pattern matching (skipping placeholders "Allocating agent..." etc.) to `ThreadMessage.CompletedAt is not null`, which ExecuteMessageAsync sets only on the terminal PushToResponseMessage call. Same pattern adopted in ChatHistoryTest in commit ab3af8b.
- Add a regression assertion that final ThreadMessage.Status == Completed. The terminal-status guard in PushToResponseMessage prevents the late Sample(100ms)-flushed Streaming push from regressing the cell from Completed back to Streaming; this assertion catches any future regression of that guard.

Addresses PR #95 review items #5, #6, #7, #8, #10.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
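The terminal-status guard mentioned in this commit can be pictured with a small sketch. `MessageCell`, `CellStatus`, and `TerminalStatusGuard` are hypothetical stand-ins for illustration; the real `ThreadMessage` and `PushToResponseMessage` are not shown in this thread and may differ.

```csharp
using System;

var cell = new MessageCell("partial", CellStatus.Streaming);
cell = TerminalStatusGuard.Apply(cell, new MessageCell("final answer", CellStatus.Completed));
// A late, throttled streaming snapshot arrives after completion and is ignored.
cell = TerminalStatusGuard.Apply(cell, new MessageCell("partial ans", CellStatus.Streaming));
Console.WriteLine($"{cell.Status}: {cell.Text}"); // Completed: final answer

// Hypothetical types (assumptions), modelling only the status transition.
public enum CellStatus { Streaming, Completed }

public sealed record MessageCell(string Text, CellStatus Status);

public static class TerminalStatusGuard
{
    // Once a cell is Completed, a non-terminal push must not overwrite it.
    public static MessageCell Apply(MessageCell current, MessageCell incoming) =>
        current.Status == CellStatus.Completed && incoming.Status != CellStatus.Completed
            ? current
            : incoming;
}
```

A test that asserts the final status is `Completed`, as the commit adds, would fail if such a guard were ever removed.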
…, parallelism, backoff)
NuGetAssemblyResolver:
- Evict faulted/cancelled tasks from the per-key cache before
returning. A transient feed failure (network, throttle, cancelled
in-flight resolve) used to poison the cache for the resolver's
lifetime — every subsequent call replayed the same exception.
- Pass CancellationToken.None to the shared core task so a single
caller's cancellation can't take down the resolution for
others; per-caller `ct` projects via `task.WaitAsync(ct)`.
- Switch DependencyBehavior from `Lowest` to `HighestMinor` so
`#r` directives pick up patch-level security fixes via
transitive dependencies without silently jumping major/minor.
- Document that hydrated cache content is trusted to match
(id, version) — flag for future content-hash verification if
cache poisoning becomes a concern.
LinkedInPublisher / XPublisher (LinkedIn already committed in batch A
for the dynamic+PII parts; this commit adds the 401 retry):
- SendWith401RetryAsync: on the FIRST 401 response from a publish,
force-refresh the token (zero ExpiresAt before EnsureFreshAsync)
and retry once. Closes the race where the access token's TTL
expired between EnsureFreshAsync and the actual API call.
PostStatsRefresher:
- Process due-refresh targets via Parallel.ForEachAsync bounded
by SocialOptions.StatsRefreshDegreeOfParallelism (default 8),
so a slow API + large refresh window can't let one tick
overshoot the next interval.
- Per-target failure backoff via a ConcurrentDictionary of
last-failure timestamps — targets that failed within
StatsRefreshFailureBackoff (default 15 min) skip the next tick.
Stops a degraded platform from generating thousands of repeat
warnings every cycle while the underlying issue is fixed.
Success clears the backoff entry.
SocialOptions: add StatsRefreshDegreeOfParallelism (8) and
StatsRefreshFailureBackoff (15 min) knobs.
Addresses PR #95 review items #12, #13, #14, #16, #17, #18.
(#15 XPublisher defensive parse + the LinkedIn dynamic / PII items
were already in commit 478fdaa.)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
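The cache-eviction and cancellation points for NuGetAssemblyResolver might look roughly like the sketch below. `SharedTaskCache` and its delegate shape are hypothetical; only the pattern (evict faulted or cancelled tasks, run the shared work with `CancellationToken.None`, and project each caller's token through `WaitAsync`) comes from the commit message. `Task.WaitAsync` requires .NET 6 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var cache = new SharedTaskCache<string, int>();
var attempts = 0;
Task<int> Resolve(string key, CancellationToken token) =>
    ++attempts == 1
        ? Task.FromException<int>(new InvalidOperationException("feed unavailable"))
        : Task.FromResult(42);

try { await cache.GetOrAddAsync("pkg", Resolve, CancellationToken.None); }
catch (InvalidOperationException) { Console.WriteLine("first call failed"); }
Console.WriteLine(await cache.GetOrAddAsync("pkg", Resolve, CancellationToken.None)); // 42: the faulted entry was evicted

// Hypothetical cache (assumption), not the resolver's actual implementation.
public sealed class SharedTaskCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, Task<TValue>> _cache = new();

    public Task<TValue> GetOrAddAsync(
        TKey key, Func<TKey, CancellationToken, Task<TValue>> resolve, CancellationToken ct)
    {
        // Drop a poisoned entry so this call starts a fresh resolve.
        if (_cache.TryGetValue(key, out var existing) && (existing.IsFaulted || existing.IsCanceled))
            _cache.TryRemove(new KeyValuePair<TKey, Task<TValue>>(key, existing));

        // The shared work ignores any single caller's token...
        var shared = _cache.GetOrAdd(key, k => resolve(k, CancellationToken.None));

        // ...and each caller can only cancel its own wait.
        return shared.WaitAsync(ct);
    }
}
```

Removing by key and value means a concurrent caller that has already replaced the faulted task with a fresh one does not lose its entry. `ConcurrentDictionary.GetOrAdd` may still invoke the factory more than once under contention, so a production version would likely wrap the task in `Lazy<T>`.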
… file lock

The MESHWEAVER_DISPOSE_TRACE=1 trace took a global lock per call (`File.AppendAllText` under `lock (DisposeTraceLogLock)`), serialising hub teardown under load when many hubs disposed concurrently. Replaced with a single bounded `Channel<string>` (capacity 4096, FullMode = DropWrite) drained by one writer task started in the type initialiser. Producers `TryWrite` non-blocking: if the disk is slow / locked, lines drop on full instead of putting back-pressure on dispose. Single-reader semantics avoid contention on the file handle.

Addresses PR #95 review item #19.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
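A minimal sketch of the bounded-channel trace writer described above, using `System.Threading.Channels`. `DropOnFullTraceWriter` is a hypothetical name, and this version is instance-based for testability, whereas the commit starts its writer from a type initialiser.

```csharp
using System;
using System.IO;
using System.Threading.Channels;
using System.Threading.Tasks;

var path = Path.Combine(Path.GetTempPath(), "dispose-trace-" + Guid.NewGuid().ToString("N") + ".log");
var trace = new DropOnFullTraceWriter(path, capacity: 4096);
trace.Write("hub A disposed");
trace.Write("hub B disposed");
await trace.CompleteAsync();
Console.WriteLine(File.ReadAllLines(path).Length); // 2
File.Delete(path);

// Hypothetical writer (assumption), showing the channel configuration only.
public sealed class DropOnFullTraceWriter
{
    private readonly Channel<string> _lines;
    private readonly Task _writer;

    public DropOnFullTraceWriter(string path, int capacity)
    {
        _lines = Channel.CreateBounded<string>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.DropWrite, // when full, the new line is discarded
            SingleReader = true                          // exactly one drain loop touches the file
        });

        _writer = Task.Run(async () =>
        {
            await using var file = new StreamWriter(path, append: true);
            await foreach (var line in _lines.Reader.ReadAllAsync())
                await file.WriteLineAsync(line);
        });
    }

    // Never blocks the disposing thread.
    public void Write(string line) => _lines.Writer.TryWrite(line);

    // Stops accepting lines and waits for the drain loop to flush.
    public Task CompleteAsync()
    {
        _lines.Writer.TryComplete();
        return _writer;
    }
}
```

The trade-off is deliberate: under sustained pressure trace lines are lost rather than slowing down hub disposal, which is acceptable for a diagnostic log.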
Replaces the TODO from commit 512adb4. After a successful INuGetPackageCache.TryHydrateAsync, the resolver now opens the hydrated folder via PackageFolderReader and compares the package's own .nuspec-declared (id, version) against the expected (id, version). On mismatch the directory is purged and the resolver falls back to the feed.

This catches the failure modes #14 was about: wrong package stored under right key (cross-tenant blob, accidental copy, drift after a manual edit). The .nuspec is the canonical NuGet source of truth, so a tampered cache entry can't fake the identity without rewriting the nuspec, which we'd then catch at hydration time. No INuGetPackageCache contract change; validation lives entirely in the resolver.

Closes the last open item from PR #95 review (item #14).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@copilot resolve the merge conflicts in this pull request
…nce-in-depth + explicit contract docs
Orleans grains are re-entrant; the IMeshNodeStreamCache singleton is hit
by many grains concurrently. Lock the contract down with three tests:
GetQuery_ManyConcurrentCallersSameId_AllSeeSameSnapshot
64 threads racing GetQuery(sameId, query). Asserts every subscriber
sees the same MeshNode snapshot — if CAS-loser observables had
leaked Connect (the AutoConnect(0) bug fixed in 04fae84), we'd
see divergent snapshots from racing initial queries.
GetQuery_ReturnsLiveUpdatesAfterRuntimeCreate
Eventual-consistency check: subscriber attaches before any writes,
then nodes are created at runtime. Both the held-open subscription
and a late-arriving subscriber must see the live state, not the
stale Initial Replay buffer.
GetQuery_ConcurrentDifferentIds_AllResolveIndependently
32 threads racing with distinct ids. Stresses the ImmutableDictionary
CAS retry loop with N keys hitting _queries simultaneously — every
caller's chain must converge.
Add .Synchronize() at the public surface of GetQuery for defence-in-depth:
ReplaySubject already serialises OnNext/Subscribe internally, but wrapping
the returned observable makes the single-threaded-callback contract
explicit at the cache's API.
Inline the thread-safety contract (creation, CAS, subscription, emission,
eventual consistency) as comments on _queries — future readers don't have
to know Rx internals to trust the cache is safe under fan-out.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rialises emissions

ReplaySubject<T> (backing Replay(1)) is internally synchronised: OnNext + Subscribe coordinate via lock. Wrapping with .Synchronize() added a second gate that contended under concurrent subscriber load. Security.Test suite: 3:30 → 1:44 after this revert.

The contract docs stay in place. Readers don't have to know ReplaySubject's internal sync; the comment now points at it directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…uery providers
Buffer(DefaultDebounceInterval=100ms) in PostgreSqlMeshQuery.ObserveQuery
and StorageAdapterMeshQueryProvider.ObserveQuery was the source of
order-dependent flakes:
T+0 Test commits CreateNode(AccessAssignment) → persistence.Write
→ adapter._changes Subject fires DataChangeNotification.
T+0 Notification lands in changeBuffer Subject.
T+10 Test calls hub.CheckPermission → cache.GetQuery first subscriber
→ AutoConnect(1) Connect → ObserveQuery's existing Replay(1)
buffer holds the PRE-WRITE snapshot.
T+10 Subscriber returns Permission.None — wrong.
T+100 Buffer flushes, RunQuery diffs, ProcessBatch emits Added →
Scan updates → Replay(1) caches new state — too late.
The 100ms debounce window IS the race. Subscribers attaching during it
see stale Replay(1).
Switch both providers to process every change immediately:
changeBuffer
.Select(n => RunQuery().Select(newResults => (batch=[n], newResults)))
.Concat()
.Subscribe(t => ProcessBatch(...))
Concat preserves the unit-of-work guarantee — next RunQuery doesn't start
until previous ProcessBatch completes — but the per-change RunQuery
means the Replay(1) buffer reflects every commit within milliseconds of
its persistence write, not 100 ms later.
Trade-off: throughput cost is one RunQuery per change instead of one
per batch. For prod load that's bounded by the connection pool; for
test correctness it eliminates the entire flake class.
Security.Test: 225/225 green locally at 2:04 (was 222-225 / 3 Menu flakes).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oseInChildren=true Commit 95f840f flipped ExposeInChildren default from true to false (to fix wire-serialisation drop on false values). The AddFileSystemContentCollection builder doesn't set ExposeInChildren on the config it produces, so the new default of false silently took effect — GetAllCollectionConfigs filters by ExposeInChildren, returns empty, and tests that list configs fail ("Expected configs to have an item matching c.Name == 'test-content' … but found 0"). Set ExposeInChildren = true on the config produced by this builder — these are user-facing filesystem collections and the whole point of registering them is to surface them to children. ContentService_ListsCollectionConfigs now passes in isolation; AI suite flake count drops as a result. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GrantSelfAdmin and GrantPlatformAdmin did a plain CreateNode, so a retried
onboarding (after any partial failure) dead-ended with "Node already exists:
{user}/_Access/{user}_Access". CreateUser already folds "already exists" into an
update; apply the same self-repairing Catch to both access grants so a leftover
_Access from a half-finished prior attempt is brought to the intended content
and onboarding completes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
TodoDataChangeWorkflowTest's SoftDelete/Restore/Query/AllTasksView tests each do multiple sequential .Within(60s) reads inside a 60s [Fact(Timeout)] cap. Each test gets its own mesh + per-test local data copy (mutation isolation), so the ACME/Project NodeType cold-compile cost isn't shared; under CI's 2-core runner the cumulative time exceeds the 60s method cap and the test is killed mid-flight. They pass 19/19 locally. The operations complete — the cap was just too tight for CI. Bump the 5 affected method timeouts to 120s (same rationale as the Planning/FutuRe cold-compile budgets). Not masking a hang: the work finishes, it just needs CI headroom. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ery tests"

This reverts commit 04bdb63.
Both data-source synchronization subscriptions (unpartitioned + partitioned base in GenericUnpartitionedDataSource) call Synchronize(change) directly inside .Subscribe; if Synchronize throws for ONE change (a type-source update faulting under load) the exception propagates to onError and the subscription DIES. The data source then stops syncing — every query/catalog fed by it goes stale and query-based work parks forever. That is the data-layer 'observer dies' deadlock behind the CI-only query flakes (Acme TodoDataChangeWorkflowTest et al.), the same root as the thread-watcher deadlocks. Wrap Synchronize(change) in try/catch so a single bad change is logged and skipped while the stream stays alive and the next change still flows. No regression: TodoDataChangeWorkflowTest 19/19 local. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
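The per-change try/catch can be sketched without Rx as a wrapper around the handler that would be passed to `.Subscribe`. `ResilientHandler.Guard` is a hypothetical helper, and the `synchronize` delegate below merely stands in for the real `Synchronize(change)`.

```csharp
using System;
using System.Collections.Generic;

var applied = new List<int>();
Action<int> synchronize = value =>
{
    if (value == 2) throw new InvalidOperationException("type-source update faulted");
    applied.Add(value);
};

// The guarded delegate is what a subscription would receive as its onNext handler.
var onNext = ResilientHandler.Guard(
    synchronize,
    (item, ex) => Console.WriteLine($"skipped {item}: {ex.Message}"));

foreach (var incoming in new[] { 1, 2, 3 }) onNext(incoming);
Console.WriteLine(string.Join(",", applied)); // 1,3

// Hypothetical helper (assumption): one bad item is reported and skipped, the stream keeps flowing.
public static class ResilientHandler
{
    public static Action<T> Guard<T>(Action<T> handler, Action<T, Exception> onError) => item =>
    {
        try { handler(item); }
        catch (Exception ex) { onError(item, ex); }
    };
}
```

The cost of this shape is that a skipped change is never applied, so the logged error is the only signal that downstream state may have diverged.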
SynchronizationStream.UpdateStream dropped a Patch that arrived before the base Full during the subscribe handshake, trusting the owner's Full to carry the change. But that Full may have been computed BEFORE the change (producer updated in the subscribe→init window, or Full/Patch reordered on the wire) — so the change is LOST and the consumer sits on stale state forever. That is the client-side 'stream never emits' deadlock behind the CI-only flakes where a test's GetRemoteStream never sees the result (CreateThread, RapidSubmits, TodoDataChangeWorkflow query waits) — the same observer/missed-emission family as the server-side watcher deadlocks. Fix: request a fresh Full (RequestFreshSnapshot) instead of silently dropping, so the consumer gets the CURRENT state including the change. Flood-safe: gated by _resyncInFlight — exactly one resubscribe per gap, cleared when a Full re-establishes Current (the guard that originally tamed the TodoDataChangeWorkflow resync storm). No regression: Data.Test 195/195, Markdown.Collaboration 324/324. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
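A rough sketch of the gating idea, assuming a consumer that holds `current` state and a callback that asks the owner for a new Full. This is not the SynchronizationStream code; `StreamConsumer`, `on_patch`, and `on_full` are hypothetical names, and only the `_resync_in_flight` flag mirrors the `_resyncInFlight` guard named above.

```python
class StreamConsumer:
    """Requests one fresh snapshot per gap instead of dropping early patches."""

    def __init__(self, request_fresh_snapshot):
        self.current = None
        self._resync_in_flight = False
        self._request_fresh_snapshot = request_fresh_snapshot

    def on_patch(self, patch):
        if self.current is None:
            # Patch arrived before the base Full: ask for a fresh Full,
            # but only once until a Full re-establishes the state.
            if not self._resync_in_flight:
                self._resync_in_flight = True
                self._request_fresh_snapshot()
            return
        self.current = {**self.current, **patch}

    def on_full(self, snapshot):
        self.current = dict(snapshot)
        self._resync_in_flight = False  # gap closed; the gate re-opens

requests = []
consumer = StreamConsumer(lambda: requests.append("resubscribe"))
consumer.on_patch({"a": 1})   # early patch triggers one request
consumer.on_patch({"b": 2})   # second early patch is gated
consumer.on_full({"a": 1, "b": 2})
consumer.on_patch({"c": 3})   # normal patch applies to current
```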
Tests run with --verbosity minimal, so the job log only showed per-project exit markers — the actual failed test names lived only in the .trx (behind the separate 'Test Results' check-run). Add a 'Summarize test failures' step that parses every .trx and emits each failed test as an ::error:: annotation + into the job step summary, so failures are visible directly on the job without digging. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DispatchAfterClaim's no-dispatch rollback blindly forced Status=Idle. The claim Status oscillates and the _Exec round watcher fires DispatchAfterClaim more than once per logical round; a duplicate fire reaches the rollback with dispatch==null AFTER the real commit already flipped StartingExecution→Executing and drained PendingUserMessages — and the blind Idle write UN-DID the running round, so the next watcher tick re-claimed the same pending into a fresh round: the re-dispatch loop (hundreds of response-cell creates, never settling) behind Resubmit_AfterExecution under the full Orleans sequence. Roll back ONLY a stuck StartingExecution claim, never an Executing round. Cuts the loop ~2x (243→106 creates) and fixes Delegation_NodeChanges in the 2-test sequence; a residual re-dispatch remains (atomic-claim-drain refactor next). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e admin grant, bidirectional email config
Connection-pool storm fixes (atioz):
- PostgreSqlPartitionStorageProvider.CreateAdapterForTable reuses the shared base data source instead of building + leaking a per-(schema,table) NpgsqlDataSource.
- ReadConcurrencyGate (new): per-adapter read-concurrency cap (MaxReadConcurrency=16) so a synced-query fan-out can't drain the pool; writes stay ungated for headroom.
- PostgreSqlStorageAdapter gates its 5 read paths through the shared gate.
Onboarding:
- UserOnboardingService.GrantPlatformAdmin writes the first-user grant at ROOT scope (namespace=_Access, MainNode=""), the only shape AdminMenuGate's root check reads, so the first user is recognised as platform admin and the Invitations tab shows.
AI chat:
- AgentChatClient.ShouldWatchOwnProviderPartition skips virtual (guest) identities so guests don't fan a per-session provider-partition query storm.
Email (bidirectional):
- helm config.yaml: Email__NoReplyAddress -> Email__MailboxAddress (the old key no longer binds EmailOptions.MailboxAddress) + inbound keys (InboundEnabled, WebhookBaseUrl, SubscriptionClientState).
Tests: ReadConcurrencyGate unit tests, root-scope-admin-gate PG test, MaxPoolSize 4->16 across the partitioned-PG test classes.
Also snapshots concurrent in-flight branch work: ThreadExecution/ThreadSubmission, the standard-page-layout CSS, and the Orleans node-change-propagation / resubmit-deadlock tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
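The read-concurrency cap can be pictured as a semaphore in front of every read. The asyncio sketch below is an assumption-laden illustration, not the ReadConcurrencyGate class from this commit: `run_read`, `demo`, and the fake read are hypothetical, and only the cap of 16 comes from the message above.

```python
import asyncio

class ReadConcurrencyGate:
    """Caps the number of reads in flight; writes would bypass the gate."""

    def __init__(self, max_read_concurrency: int = 16):
        self._semaphore = asyncio.Semaphore(max_read_concurrency)
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, for illustration

    async def run_read(self, read):
        async with self._semaphore:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await read()
            finally:
                self.in_flight -= 1

async def demo(fan_out: int = 100, cap: int = 16):
    gate = ReadConcurrencyGate(cap)

    async def fake_read():
        # Hypothetical stand-in for a pooled database read.
        await asyncio.sleep(0.001)
        return 1

    results = await asyncio.gather(*(gate.run_read(fake_read) for _ in range(fan_out)))
    return sum(results), gate.peak

total, peak = asyncio.run(demo())
```

All 100 reads complete, but at no point do more than 16 hold a "connection" at once, which is the property that keeps a fan-out from draining the pool.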
… GetStream/Update — typed Content, fail loud, no silent JsonElement The bare IMeshNodeStreamCache.GetStream(path) / Update(path, fn) overloads fed raw JsonElement Content to consumers, so `node.Content as MyType ?? new MyType()` silently returned null and overwrote real state with a default on the next write (the CheckInbox / AppendUserInput / streaming-status silent-reset bug class). Those overloads are removed; the typed overloads (… , JsonSerializerOptions) are now the only entry points. - Typed boundary deserializes Content first and THROWS (MeshNodeStreamException / the raw JsonException) on a wrong or unregistered $type — no swallow, no silent fallback. EnsureTypedContent + the cache's ConvertContentJsonElementToTyped agree. - Every caller now passes the in-scope hub's JsonSerializerOptions, or uses the typed handle workspace/hub.GetMeshNodeStream(path).Update(fn) which carries options: Graph layout areas (Approval/Comment/Notification/Settings/IconPicker, CompileActivity), AI ThreadExecution/HubThreadExtensions/Delegation, Blazor (Collaborative/Markdown/ Thumbnail/CompileProgress/ExportDocument/ThreadBubble), Portal (NotificationCenter/ ThreadChat), Hosting.Blazor NavigationService, Orleans MessageHubGrain, and tests. Email (bidirectional): global Email defaults + shared webhook SubscriptionClientState in helm values.yaml; atioz sends/receives as the shared mailbox atioz@systemorph.com with inbound enabled (values.atioz.yaml). One-time Entra Mail.ReadWrite + Exchange send-as grant still required for the inbound half (documented inline). Warnings: fixed CS1574/CS1734 broken crefs/paramrefs (incl. the removed-overload crefs), CS8604 null-arg, CA2017 log-template arity, xUnit1051 (TestContext.Current.CancellationToken). Build: 0 errors, 0 warnings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-free top-level autocomplete + email docs #16 — public.top_level_index MATERIALIZED VIEW over every partition's namespace='' root node (one row per partition), built by public.rebuild_top_level_index() from public.searchable_schemas. Powers top-level autocomplete from one small indexed relation instead of a cross-schema fan-out (the connection-pool storm). NOT a node copy; NOT rebuilt on the query hot path (DDL-per-query deadlocked — removed); re-materialized only at schema-init (PostgreSqlSchemaInitializer) and at deploy (SearchableSchemasUpdater). Test: TopLevelIndexTests (materializes partition roots only, excludes within-partition children, idempotent rebuild). Docs: SendingEmail / InvitationOnlyOnboarding / OnboardingNewEnvironment updated to Email__MailboxAddress (renamed from NoReplyAddress) + bidirectional (Mail.Send outbound, Mail.ReadWrite inbound). Also bundles concurrent in-flight branch work present in the tree: the TypeRegistry / MessageHubConfiguration default $type discriminator change (FullName -> simple Name, to fix cross-hub DeliveryFailure) and ThreadExecution / ThreadSubmission edits. Build: 0 errors, 0 warnings. PG tests (TopLevelIndex + PartitionLifecycle): 6/6 green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… (atioz DB-corruption root cause) PostgreSqlPathRoutingAdapter.ResolveState now rejects a first path segment that isn't a valid Postgres-identifier partition name (starts letter/digit, then [A-Za-z0-9._-], <=63 chars) via IsValidPartitionSegment, instead of lazily CREATE SCHEMA-ing it. Prod 2026-06-05: the atioz DB filled with garbage schemas from request URLs routed as mesh paths (login?error=auth_failed, search?q=agent&hq=scope%3adescendants) — corrupting the DB and crashlooping the migration. First tested increment of #15 (partition-abstraction rework). Test: PartitionSegmentValidationTest (18 cases, no DB) — rejects the real atioz garbage, accepts valid partition names, enforces the 63-char identifier cap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
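The segment rule stated above (starts with a letter or digit, then `[A-Za-z0-9._-]`, at most 63 characters) fits in one regular expression. This Python sketch is an assumed reading of that rule, not the IsValidPartitionSegment implementation; the function name and the accepted sample `atioz` are illustrative.

```python
import re

# Assumed reading of the rule in the commit message: first character is a
# letter or digit, the rest come from [A-Za-z0-9._-], total length <= 63.
_PARTITION_SEGMENT = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,62}")

def is_valid_partition_segment(segment: str) -> bool:
    """Return True only for segments that look like partition names."""
    return _PARTITION_SEGMENT.fullmatch(segment) is not None

# The two request-URL segments quoted in the commit message are rejected.
garbage = ["login?error=auth_failed", "search?q=agent&hq=scope%3adescendants"]
rejected = [s for s in garbage if not is_valid_partition_segment(s)]
```

Using `fullmatch` rather than `match` matters here: a prefix such as `login` would otherwise let the whole garbage segment through.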
… fans out PostgreSqlPartitionedMeshQuery.Autocomplete now serves the TOP-LEVEL case (empty base or a "/name" prefix) from public.top_level_index via the new ICrossSchemaQueryProvider.AutocompleteTopLevelAsync — one small indexed matview read with a PG-side relevance score (exact > name-prefix > id-prefix > substring; ORDER BY score DESC, not alphabetical), access-filtered by partition_access. The within-partition case (concrete basePath) stays the per-schema scoped query. Autocomplete NEVER fans out across schemas — the connection-pool-storm fix for the autocomplete path. Increment 2 of #20. Test: AutocompleteTopLevel_ReturnsScoredMatches_FromMatview_NoFanOut. 21/21 PG index/guard/autocomplete tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
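The relevance ordering named above (exact > name-prefix > id-prefix > substring) can be illustrated in memory. The sketch below is not the PG-side SQL: the numeric weights, the case-insensitive comparison, and the row shape are assumptions made for the example.

```python
def relevance_score(query: str, node_id: str, name: str) -> int:
    """Higher is more relevant; 0 means no match. Weights are illustrative."""
    q, i, n = query.lower(), node_id.lower(), name.lower()
    if n == q:
        return 4  # exact name
    if n.startswith(q):
        return 3  # name prefix
    if i.startswith(q):
        return 2  # id prefix
    if q in n or q in i:
        return 1  # substring
    return 0

def autocomplete(query, rows, limit=10):
    """Sort matches by score (descending), then apply the limit."""
    scored = [(relevance_score(query, r["id"], r["name"]), r) for r in rows]
    hits = [(s, r) for s, r in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in hits[:limit]]

rows = [
    {"id": "x-acme", "name": "Big Acme Corp"},
    {"id": "acme-eu", "name": "Europe"},
    {"id": "a1", "name": "Acme"},
    {"id": "a2", "name": "Acme Holdings"},
    {"id": "zzz", "name": "Other"},
]
ordered_ids = [r["id"] for r in autocomplete("acme", rows)]
```

Scoring before the limit is the important part: with a small limit, an alphabetical or arbitrary order could cut the exact match out of the result.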
…elevance ORDER BY on cross-schema search #19 (grain warming): SyncedQueryMeshNodes warms the per-node grain of each NEWLY-added result via the shared IMeshNodeStreamCache, so the GUI's subsequent databinding hits a warm cache instead of a cold DB read. Scoped to real-user queries (System framework walks + anonymous guests excluded → no load amplification); gated to new paths (a quiescing live query never re-warms). Reuses the cache — no new message type. #20 (general relevance): GenerateCrossSchemaSelectQuery ORDER BYs a PG-side hybrid score (exact name > name-prefix > id-prefix > substring > description) when a text search has no explicit OrderBy, so the LIMIT keeps the most-RELEVANT rows instead of arbitrary heap order (a relevance bug: a relevant row could fall outside the LIMIT and never reach the merge). Sort by score, not alphabetically; an explicit OrderBy supersedes. Tests: 46/46 PG (cross-schema search, satellite fan-out, multi-query union, top-level index + autocomplete, segment-guard) green — no regression. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…disposal
Acme.Test bulk failed (UpdateNodeRequest@.../DefinePersona never replied, leaked-callback at dispose) only when the ShareMeshAcrossTests class (AcmeSearchTest) ran alongside others: its ServiceProvider is pinned in the static _sharedProviders for the whole testhost, so its mesh + hosted hubs + subscriptions stayed live and interfered with later classes' per-test meshes. Passes in isolation, hangs in bulk -- "we keep instances of the mesh somewhere".
- MonolithMeshTestBase: add ShareMeshClusterEnabled kill-switch (false for now) gating the 3 shared-mesh use-sites. Every test now gets a fresh, per-test-disposed mesh; the ~60 ShareMeshAcrossTests overrides + the _sharedProviders dict remain but are inert. Reversible by flipping one property (then restore proper per-class lifetime via IClassFixture).
- MeshNodeStreamCache: implement IDisposable and release ALL state when the silo/mesh goes down -- dispose the per-path _updateQueues (Concat echo-wait subscriptions + MemoryCache expiration timer that pinned the singleton) and clear _access, in addition to the existing hydration-sub teardown. Fires from both cacheHub disposal and DI container teardown (idempotent guard).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…outing + 42P01-tolerant reads Removes the partition existence-probe abstraction. PostgreSqlPathRoutingAdapter resolves schema = first path segment synchronously (no information_schema probe, no async cache); writes ensure the schema via public.ensure_partition_schema; reads tolerate an absent schema (per-schema PostgreSqlStorageAdapter catches Postgres 42P01 -> empty). Deletes PgPartitionCache + PgPartitionNotifyListener (pg_notify cache-invalidation). The _-prefix global satellites resolve via the provider's registered-partition map (seeded at boot), never lazily created. Keeps the URL-segment guard + NodeType-name guard. Build 0/0; PG suite 448/449 (the 1 failure, NotifyDedupTrigger, is a pre-existing pg_notify ordering flake — passes in isolation, bypasses the router, unrelated). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e leaks
Acme.Test bulk deterministically hung 5 TodoDataChangeWorkflowTest cases
(UpdateNodeRequest@.../DefinePersona never replied in bulk). MSG_TRACE showed the
owning hub processed the write in ~10ms, but UpdateNodeResponse arrived ~12s later
at runLevel=Quiescing: the response was gated on the debounced persistence flush
(MeshNodeTypeSource 200ms timer), which is thread-pool-starved during the test's
synchronous wait in bulk, so the write only completed when FlushOnDispose forced
the flush at teardown. ClrMD GC-root analysis ("who holds the references") was used
to find the disposal leaks.
- TodoDataChangeWorkflowTest: migrate the deprecated NodeFactory.UpdateNode
(UpdateNodeRequest) writes to the canonical stream.Update, which completes on the
in-memory workspace echo, not the debounced flush, so it never stalls. Merge only
the changed State (minimal patch, no read-back clobber) and confirm the apply via
a one-shot read poll (stream.Update is optimistic; the old request confirmed via
its response). 71/71 green, twice.
- MessageHub.Dispose: the 25s deadlock watchdog awaited an UNCANCELLED Task.Delay
whose TimerQueue-rooted continuation captured `this`, pinning the whole hub graph
for 25s after EVERY disposal (disposal completes in <1s, so the watchdog only ever
leaked). Cancel the delay on disposal completion; the safety net still fires on a
genuine deadlock.
- MeshNodeTypeSource: dispose the debounce timer synchronously and gate
ResetDebounceTimer on _disposed + RunLevel so a flush-echo UpdateImpl during
Quiescing can't re-arm a TimerQueue-rooted one-shot timer that pins the disposed
hub graph (source -> Workspace -> MessageHub).
- Add MeshHubDisposalLeakTest: ClrMD GC-root probe that fails only when a disposed
mesh hub is pinned by a static/timer/handle root; tolerates transient stack roots.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
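The watchdog fix in MessageHub.Dispose (cancel the delay once disposal completes) follows a general pattern. The asyncio sketch below illustrates that pattern only; it is not the C# code, and `dispose_with_watchdog`, `fast_dispose`, and the callback are hypothetical names.

```python
import asyncio

async def dispose_with_watchdog(dispose, watchdog_seconds, on_deadlock):
    """Run dispose(); fire on_deadlock only if it outlives the watchdog."""
    async def watchdog():
        await asyncio.sleep(watchdog_seconds)
        on_deadlock()

    watchdog_task = asyncio.create_task(watchdog())
    try:
        await dispose()
    finally:
        # Cancel the pending delay so nothing keeps the disposed object alive.
        watchdog_task.cancel()
        try:
            await watchdog_task
        except asyncio.CancelledError:
            pass

fired = []

async def fast_dispose():
    # Hypothetical disposal that completes almost immediately.
    await asyncio.sleep(0)

asyncio.run(dispose_with_watchdog(fast_dispose, 25, lambda: fired.append("deadlock")))
```

The call returns as soon as disposal finishes instead of holding a 25-second timer; the safety net still fires when disposal genuinely takes longer than the watchdog interval.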
…leaks Captures the playbook from the Todo bulk-hang investigation: MESHWEAVER_MSG_TRACE histogram (count DISTINCT messages, not trace lines, to tell a normal finite shutdown cascade from a version-chase loop); reading a reply that lands at runLevel=Quiescing as a teardown-gated dependency; and the ClrMD GC-root probe (MeshHubDisposalLeakTest) for "who holds the references" — distinguishing a real static/TimerQueue/handle pin from a transient stack root, with a fix table. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… gate)
Prep for retiring UpdateNodeRequest. Verified RLS IS enforced on the stream.Update / PatchDataRequest path (a no-Update-rights write is denied, node unchanged), but the optimistic emit (UpdateRemote deliberately does NOT await the owner, for Orleans / thread-pool deadlock avoidance) SWALLOWED the denial so the caller saw a fake success.
- MeshNodeStreamCache.Update: add a client-side CACHE-ONLY write gate mirroring the read gate: when the caller's effective permissions for the path are already cached (warm from a prior read, the realistic read-then-edit flow), require Permission.Update and throw UnauthorizedAccessException before the optimistic emit. Cache-only by design: a per-write GetPermissionRequest probe doubled write latency and timed out on cold owning hubs (Acme bulk 2.5m -> 5m); the owner still enforces RLS authoritatively, the denial just isn't surfaced on the Rx stream for a cold write. Extract the permission probe into ProbeEffectivePermissions, shared by both gates.
- RlsIntegrationTests.StreamUpdate_WithoutUpdateRights_IsDeniedAndErrors: viewer reads then attempts stream.Update -> denied (node unchanged) AND surfaces the error.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prep for retiring UpdateNodeRequest. HandleUpdateNodeRequest stamped LastModifiedBy from the request's UpdatedBy; the stream.Update path stamped LastModified but NOT LastModifiedBy, so migrating writes off UpdateNodeRequest would drop the "who last modified" audit field. - UpdateRemote: alongside the existing LastModified auto-stamp, stamp LastModifiedBy from the caller's AUTHENTICATED identity (capturedContextAtEntry) when the lambda left it untouched — the same AccessContext stamped on the outgoing patch, so a client cannot forge a different author. CreatedBy/CreatedDate stay untouched (immutable, preserved through the patch). - MeshNodeAuditingTest: migrate UpdateNodeRequest_Preserves... to stream.Update as a distinct user; asserts CreatedBy/CreatedDate immutable + LastModifiedBy stamped from the caller. 3/3 green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…view + unscoped query defer #16 matview: rebuild_top_level_index() now selects exactly the partition ROOT per schema (namespace='' AND id=<schema_name>) so path (a generated column = id when namespace='') is globally unique — the prior (namespace,id)/(path) UNIQUE index collided on non-root top-level nodes repeating ids across partitions. atioz migration completes; matview = one row per real partition. #20 (partial): StorageAdapterMeshQueryProvider defers UNSCOPED queries to the native partitioned provider's SQL fan-out (partitioned PG only, via StorageAdapterQueryProviderOptions). Removes the pedestrian's slow cross-partition ListChildPaths walk for the onboarding middleware's unscoped 'nodeType:User content.email:X' lookup (the 20s timeout). Scoped/satellite queries untouched (pedestrian is the real server for _Access reads) — 452/452 PG tests pass. Absent-partition scoped-walk slowness (_Access/User/_UserActivity) is a separate routing-layer fast-fail, tracked next. RoutingConvergenceTests: PG regression guards that ResolvePath/GetQuery converge to empty (no hang) for absent partitions + non-matches. Docs: CLAUDE.md + Deployment.md rewritten to the AKS image-update deploy (dotnet publish -t:PublishContainer → az aks command invoke set image/rollout); deploy.sh = first-time env setup only; bare aspire deploy = legacy. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… stream.Update
Completes retiring the verb-shaped UpdateNodeRequest/UpdateNodeResponse mutation API in favour of the canonical stream.Update. RLS + auditing were already moved onto the stream.Update/patch path (01182bd, 9e1e37b), so this removal is safe.
- Delete UpdateNodeRequest + UpdateNodeResponse + NodeUpdateRejectionReason, the HandleUpdateNodeRequest handler + its registration + type-registry entries.
- HandleCreateOrUpdateNodeRequest: apply the update branch via hub.GetMeshNodeStream(path).Update(_ => merged) instead of forwarding UpdateNodeRequest.
- MeshService.UpdateNode / HubNodePersistence.UpdateNode: same IObservable<MeshNode> surface, now stream.Update internally (optimistic emit; owner re-validates RLS + stamps auditing).
- RlsNodeValidator / PartitionWriteGuardValidator: drop the UpdateNodeRequest arm; Update permission for writes is enforced on the patch path (RlsDataValidator). Create/Delete unchanged.
- Migrate tests (RlsIntegration, OrleansGetDataRequestPropagation, ContentPropertySync, MeshNodeTypeSource) and update 7 Architecture/DataMesh docs to the stream.Update path.
Full solution builds 0 errors / 0 warnings. Affected suites green: RLS 22, content/nodetype 12, auditing 3, create-or-update 3.
NOTE: CLAUDE.md's Data Access Patterns table ("Update | UpdateNodeRequest") still needs the same update but is mid-edit by another agent on this branch; left for that change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eTable) + routing design + deploy-route docs Foundation for the partition-provisioning redesign (Doc/Architecture/PartitionStorageRouting.md): partitions matter only for (1) queries — fan to every adapter, absent partition → empty; (2) object→adapter routing — longest-prefix-match across adapters (no registry); create-of-partition-object → adapter makes the schema + creator Admin; partition absent → refuse the write (root-cause fix for lazy-schema corruption). The NodeType definition is the single source of truth, loaded on create. This commit adds the declarative config (additive, no behavior change yet): - NodeTypeDefinition.OwnsPartition (bool) + StorageTable (string?) — storage shape declared on the type definition. - Declared OwnsPartition=true on Space + User (own a partition); StorageTable=user_activities on UserActivity (own a table). Knowledge centralized on the NodeType definitions; the central StandardTableMappings/NodeTypeToSuffix dicts + _Thread/_Access path-suffix matching get retired in the follow-up behavior-change pass (create-path provisioning, remove lazy schema creation, prefix-match routing, query fan-to-all). Docs: split Deployment into DeploymentAKS.md (AKS cluster route) + DeploymentContainerApps.md (Aspire test/prod → ACA route) — both first-class, neither legacy — with Deployment.md as index; CLAUDE.md reframed accordingly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…UpdateNodeRequest UpdateNodeRequest was deleted (3218ce6); the table row pointed agents at the removed API. Now points at workspace.GetMeshNodeStream(path).Update(...), matching the ABSOLUTE "GetMeshNodeStream().Update() is the ONLY mutation API" rule above. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nsPartition NodeTypes
The PG path router lazily CREATE SCHEMA'd ANY first path segment on write, so
NodeType names / reserved words / request-URL segments each spawned a ghost
schema (atioz: 45 ghosts → DB corruption). Schema creation is now gated to ONE path.
- OwnsPartitionProvisioningValidator: the single schema-creation trigger. Reads
NodeTypeDefinition.OwnsPartition (User, Space), requires top-level, provisions
the schema BEFORE the root write. Replaces SpaceTopLevelValidator (deleted) and
User-onboarding's lazy reliance. Registered centrally in AddRowLevelSecurity.
- IPartitionStorageProvider.EnsurePartitionProvisioned: reactive IObservable<Unit>
(was Task), promise-cache + IIoPool.Run on a per-adapter pg:{adapter} pool
(cap 1 = one Npgsql connection). NO Observable.FromAsync at any call site.
- PostgreSqlPathRoutingAdapter: deleted lazy EnsureSchemaForPartitionSync from
RouteWrite + CreateAdapterForTable. Unprovisioned write → 42P01 ("no partition,
no write"), never a ghost schema.
- DefaultPartitionProvider: cut Portal/Kernel/_Activity/_UserActivity/_Thread +
seed grants (kernel work is Activities; system gets Permission.All from the
evaluator fast-path). KEPT _Access — global/root-scope grants are load-bearing.
- Migration + test fixture eagerly create auth + system_access (V27 only renames
user→auth, a no-op on a fresh DB; router no longer lazy-creates).
- NavigationService: never track activity under the system identity (it was
writing a system-security ghost partition).
- Docs (CLAUDE.md, ControlledIoPooling.md, AsynchronousCalls.md): Observable.FromAsync
is NEVER tolerated — Postgres/storage carve-out rescinded. PartitionStorageRouting.md
marked implemented.
- Tests: new GhostSchemaInvariantTests; 5 lazy-create-reliant PG tests updated to
provision-first. PG suite 454/454 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_nodes
The pedestrian StorageAdapterMeshQueryProvider's ListChildPaths scope-walk was the 60-70s onboarding/storm stall. PostgreSqlPartitionedMeshQuery now OWNS scoped primary-content serving by delegating to a per-schema PostgreSqlMeshQuery over the CACHED adapter (live deltas, no walk):
- PostgreSqlPathRoutingAdapter.GetSchemaAdapter / provider.GetSchemaAdapter expose the cached per-schema adapter (shared in-process Changes feed).
- PostgreSqlPartitionedMeshQuery: scoped mesh_nodes Query/QueryAsync/Select/within-partition Autocomplete delegate to a per-schema PostgreSqlMeshQuery (one per cached adapter → live deltas). Gated by !NeedsFanOut.
- Pedestrian DeferToNativeProvider: defers unscoped/wildcard (→ native cross-schema fan-out) and scoped mesh_nodes (→ delegate); KEEPS scoped-satellite. The per-schema delegate's satellite Query Initial under-returns pre-existing rows (the live-delta path works; Initial-with-preexisting is a follow-up). Satellite reads to an absent partition are now fast anyway (42P01-tolerant, post-ghost-fix), so not a storm path.
Renamed DeferUnscopedAndSatelliteToNativeProvider -> DeferToNativeProvider. PG suite 454/454. Doc: PartitionStorageRouting.md updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, RLS denial, read-after-write
The UpdateNodeRequest deletion (3218ce6) dropped three behaviours the synchronous handler provided. Restored them without reintroducing the verb-shaped request.
Cat 1 (Update validation):
- RlsNodeValidator: re-add the Update arm (SupportedOperations + Validate switch).
- New IOwnerEnforcedNodeValidator marker on RlsNodeValidator + PartitionWriteGuardValidator so the client-side update pipeline skips them (RLS stays owner-authoritative, not re-checked on the caller's hub; that was the cache-only-gate flake).
- NodeUpdatePipeline (MeshService.UpdateNode): existence check (→ InvalidOperationException "Node not found"), app-integrity INodeValidator(Update) run (→ UnauthorizedAccessException), version bump (Math.Max(existing,incoming)+1; VersionWritingStorageAdapter dedupes same version, so without this history never records V2/V3), then stream.Update under the caller's identity (Observable.Using + SwitchAccessContext so the read continuation can't drop the AccessContext and let a viewer's write slip through un-denied).
Cat 2 (RLS denial surfacing):
- UpdateRemote: terminal emission is now driven by the owner's PatchDataResponse / DeliveryFailure (30s optimistic fallback) instead of an eager optimistic emit. RLS denial (DeliveryFailure Unauthorized) → UnauthorizedAccessException; deserialization/validation NodeError → MeshNodeStreamException. Deadlock-safe: emission fires from the reactive Observe callback, never a blocking bridge. Deleted the flaky cache-only write gate in MeshNodeStreamCache.
Cat 3 (read-after-write):
- New IPostCommitFlush hook: HandlePatchDataRequest chains its ack off a durable WriteAndPublishUpdated (persist + IMeshChangeFeed.Updated for workspace cache eviction) instead of the in-memory commit, so a subsequent Query / GetRemoteStream sees the write.
Verified green: NodeOperations validators 6/6, Security 15/15 (McpUpdate, CacheIdentity, RlsNodeValidator), Hosting.Monolith freshness/cache/copy/move/resubscribe + cross-hub-persist repro, Content VersionHistory 5/5, Persistence ProjectViews, AI MeshPlugin FullCrud. Full solution builds 0 errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, not RlsDataValidator The deletion-era doc edits claimed the cross-hub stream.Update patch path "re-validates RLS via RlsDataValidator" and emitted optimistically with a cache-only write gate. Neither is true after the fix-forward: the owner enforces Permission.Update via the [RequiresPermission(Update)] pipeline, and UpdateRemote drives the caller's terminal emission off the owner's response (surfacing denial as UnauthorizedAccessException). Also documents that app-integrity INodeValidators run client-side while IOwnerEnforcedNodeValidator (RLS/partition) ones are skipped there. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_access; V05 reads the matview not the dead `user` schema
A truly-fresh atioz (admin/db_version wiped) ran the legacy `user`-schema repair chain (V05+) and aborted at V05 with `42P01: relation "user".mesh_nodes does not exist`. Root cause: 0ceba04 added ensure_partition_schema('auth')/('system_access') to SchemaInitialization, creating auth.mesh_nodes/system_access.mesh_nodes BEFORE DetectFreshDbAsync, so the fresh DB looked non-fresh and MigrationRunner ran the legacy chain instead of fast-forwarding past it.
- SchemaInitialization: remove the eager auth/system_access creation (the portal's PostgreSqlPartitionSubscriptionHostedService provisions them at boot, before any user write). Harden DetectFreshDbAsync to exclude framework schemas (admin/auth/system_*/portal/kernel) so they can never make a fresh DB look non-fresh.
- V05: source Users from the central index (public.top_level_index matview; Users ARE partition roots), write the self-Admin grant into the user's OWN partition's `access` table at {id}/_Access (smallint state=2=Active). No `user` schema reference.
- Test: MigrationUserBackfillFromIndexTests pins the matview-sourced backfill + no `user`.
- Doc: PartitionStorageRouting.md: auth/system_access provisioned at portal boot (not the migration); fresh-DB fast-forward note (the other legacy `user` migrations are skipped on fresh DBs and only ran on legacy incremental DBs where `user` existed).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ot the workspace-less _Exec hub
ExecuteMessageAsync read/wrote two MeshNodes via hub.GetMeshNodeStream(...)
where `hub` is the _Exec hosted hub. That extension calls hub.GetWorkspace(),
but _Exec is created with no AddData, so it threw:
InvalidOperationException: Configuration of message hub is inconsistent:
AddData was not called.
at Workspace..ctor → WorkspaceExtensions.GetWorkspace
→ MeshNodeStreamExtensions.GetMeshNodeStream(IMessageHub, String)
at ThreadExecution.ExecuteMessageAsync
The throw escaped on the WhenInitialized onNext path (not the Rx error
channel), so the init-stall onError never fired and the thread sat Executing
forever — every round wedged, presenting as a timeout/"deadlock". Since
PushToResponseMessage runs on every round, this broke every thread submission:
all of Threading.Test, AI.Test, Security.Test, and the Orleans delegation
tests timed out.
Regression from a4df4fc, which mechanically swapped cache.GetStream/Update
(a process-wide singleton, no workspace needed) for hub.GetMeshNodeStream.
Fix: route both sites through parentHub (the thread hub, which owns the
workspace and resolves the identical process-wide IMeshNodeStreamCache), so
the cross-hub patch still flows through the same shared handle the GUI reads.
Verified: Threading.Test 114/115 (1 skipped; the lone SubThreadHangRepro
sequencing flake passes in isolation), AI.Test + Security.Test thread tests
green after rebuild.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… synchronous submit failures
Onboarding submit hung on a fresh atioz: HandleSubmit's Observable.Using factory called
AccessService.ImpersonateAsHub(portal/{user}), and AccessContext rejects a hub-shaped
principal ("hub-shaped principal must never happen") — thrown SYNCHRONOUSLY during Subscribe,
bypassing the Rx onError, escaping HandleSubmit unhandled, leaving the form stuck at isSaving
with no message.
- Onboarding.razor: ImpersonateAsHub(PortalApplication.Hub) -> ImpersonateAsSystem(). Onboarding
creates the user's own partition root + self/platform grants — infrastructure writes the
not-yet-onboarded user can't authorize (canonical ImpersonateAsSystem case, like
SpacePostCreationHandler / OwnsPartitionProvisioningValidator).
- Onboarding.razor: wrap the Subscribe in try/catch -> OnOnboardingFailed, so a synchronous
subscribe failure surfaces in the error bar instead of silently hanging the form (the GUI
must never swallow an onboarding failure).
- StorageAdapterMeshQueryProvider: fix the stale DefersToNativeProvider XML doc (cref +
description) left from the DeferUnscopedAndSatellite -> DeferToNativeProvider rename.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
77 commits of long-running work on `bug_fix`, grouped by theme:
- `MeshWeaver.Social` + LinkedIn publisher + scheduled publishing pipeline (engine/queue/stats), LinkedIn OAuth connect + past-post ingest in Memex portal, per-user linked-account menu items.
- `#r "nuget:Pkg, Version"` at the top of `_Source/*.cs` resolves via public NuGet.Protocol without an SDK on the container. Same resolver serves interactive markdown code cells.
- `FileSystemPersistenceService.MoveNodeAsync` runs per-descendant `WriteAsync`/`DeleteAsync` through `Task.WhenAll`; new `MeshOperationOptions` (default `Timeout = 30s`) + `WithMeshOperationTimeout(TimeSpan)` override; `HandleMoveNodeRequest` chains `.Timeout()` on the persistence Observable so a stuck adapter can't hang the caller. Prod repro: DAV2026 subtree move that took 240 s and killed the MCP session; now bounded.
- `CompilationCacheService`, `_Source/` edit re-invalidates owning NodeType, cross-silo broadcast via `MeshChangeFeed`, grain-dispose on node delete, live "Compiling … (Ns)" progress in `LayoutAreaView`.
- `Category` (falls back to `NodeType`), reactive Children catalog, self-as-default create location for non-NodeType nodes, sample orgs → `Markdown` for search visibility.
- `MeshChangeFeed` events, resubscribe on owner dispose, `DeleteLayoutArea` emits a placeholder immediately and times out slow streams.
- `IAsyncEnumerable` aggregator fixes (satellite-safe `GatherInputsAsync`), xunit methodTimeout 30 s → 60 s, Anthropic Opus bump, icon generator, etc.
New test suites (selected)
- `test/MeshWeaver.Persistence.Test/MoveNodeRecursiveTest.cs`: 10 tests covering recursion, parallelism, source missing / target exists / storage throws / cancellation (all must not hang), Rx `Timeout()` contract, default-30s config.
- `test/MeshWeaver.Social.Test/*`: `InMemoryPublishQueueTest`, `LinkedInPublisherEngagementTest`, `PostStatsRefresherTest`, `ScheduledPostPublisherTest`, `FakePublisher`.
- `test/MeshWeaver.Persistence.Test/WorkspaceCacheEvictionTest.cs`, `ResubscribeOnOwnerDisposeTest.cs`, `DeleteLayoutAreaIntegrationTest.cs`.
- `test/MeshWeaver.Markdown.Test/PathUtilsTest.cs`, `test/MeshWeaver.MathDemo.Test/MatrixViewsTest.cs`.
Contributors
- dist/cleanup, fix: sample orgs invisible in search due to wrong NodeType #94 sample-org search-visibility fix
Upstream already merged into this branch
- refactor: reactive persistence, IMeshStorage writes return IObservable (merged)
Test plan
- `dotnet build` succeeds
- `dotnet test test/MeshWeaver.Persistence.Test --filter MoveNodeRecursiveTest`: 10/10 green (~8 s)
- `dotnet test test/MeshWeaver.Hosting.Monolith.Test --filter MoveNodeAsync`: 5/5 green (regression guard)
- `dotnet test test/MeshWeaver.Social.Test`: publish queue / scheduling / stats green
- `_Source/*.cs` using `#r "nuget:MathNet.Numerics, 5.0.0"`: compiles & renders (cold + warm cache)
- `/social/connect/linkedin` → profile linked; menu shows connected account
- `ScheduledPostPublisher` → LinkedIn publisher posts; `PostStatsRefresher` pulls stats
🤖 Generated with Claude Code