feat: Phase 3d+3e — ParquetMergePipeline supervisor + publisher feedback#6354
Merged
feat: Phase 3d+3e — ParquetMergePipeline supervisor + publisher feedback#6354
Conversation
4 tasks
ceba410 to
e96a920
Compare
ceba410 to
5937440
Compare
d82d72d to
92cb5ed
Compare
5937440 to
0b1c9cc
Compare
92cb5ed to
b6f8bcc
Compare
0b1c9cc to
66b97e0
Compare
b6f8bcc to
d296774
Compare
66b97e0 to
0f051bc
Compare
d296774 to
ededb89
Compare
0f051bc to
16b46d7
Compare
ededb89 to
761b379
Compare
16b46d7 to
8e19b6b
Compare
761b379 to
9b3769e
Compare
8e19b6b to
f32bd64
Compare
9b3769e to
927dc9f
Compare
f32bd64 to
de17c0e
Compare
927dc9f to
c51a84b
Compare
de17c0e to
1f6512e
Compare
c51a84b to
804e384
Compare
1f6512e to
f2c4a8a
Compare
804e384 to
55d8ff1
Compare
f2c4a8a to
467f6fb
Compare
55d8ff1 to
d360c80
Compare
467f6fb to
b6ca9bf
Compare
d360c80 to
ab8fb10
Compare
b6ca9bf to
be60cf6
Compare
ab8fb10 to
93fde45
Compare
be60cf6 to
bb6cd8b
Compare
93fde45 to
9ad89a5
Compare
bb6cd8b to
f34bd04
Compare
9ad89a5 to
a691c64
Compare
f34bd04 to
5c68071
Compare
752de8f to
3a7ad73
Compare
5c68071 to
daacf89
Compare
2268720 to
c2bdc90
Compare
daacf89 to
2d82e7e
Compare
f052239 to
768e966
Compare
ac2ae00 to
706e1dd
Compare
768e966 to
9d28f36
Compare
Base automatically changed from
gtt/parquet-merge-pipeline-3c
to
gtt/parquet-merge-pipeline-3b
May 1, 2026 17:56
mattmkim
approved these changes
May 1, 2026
Contributor
mattmkim
left a comment
There was a problem hiding this comment.
LGTMing to unblock, but some comments
…se 3d+3e) Phase 3 pipeline integration, combined supervisor and feedback PR: - ParquetMergePipeline supervisor: spawns all merge actors (publisher → sequencer → uploader → executor → downloader → planner), health-checks with periodic supervision loop, respawn on failure with backoff, graceful shutdown via FinishPendingMergesAndShutdownPipeline that disconnects feedback and runs finalize policy. 3 tests. - Publisher feedback: add parquet_merge_planner_mailbox_opt to Publisher (feature-gated behind cfg(feature = "metrics")). After successful ParquetSplitsUpdate publish of new ingested splits, sends ParquetNewSplits to the planner. Merge outputs (non-empty replaced_split_ids) are not fed back to avoid infinite loops. - DisconnectMergePlanner extended to clear both Tantivy and Parquet planner mailboxes, supporting shutdown drain for both pipeline types. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
9d28f36 to
64bb5e8
Compare
The guard that filtered out merge outputs (non-empty replaced_split_ids) from the feedback loop was incorrect. Tantivy feeds ALL new splits to the merge planner — both ingest and merge outputs. Infinite loops are prevented by the merge policy's maturity checks (max_merge_ops, size, maturation_period), not by the publisher. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Stacked on #6358 (Phase 3c). Adds the supervisor and feedback loop that make the merge pipeline runnable.
FinishPendingMergesAndShutdownPipelinethat disconnects feedback and runs finalize policy for cold windows.parquet_merge_planner_mailbox_opttoPublisher(feature-gatedcfg(feature = "metrics")). After successfulParquetSplitsUpdatepublish of new ingested splits, sendsParquetNewSplitsto the planner. Merge outputs are not fed back (guards against infinite loops).Note for reviewers: spawn concurrency semaphore
The Parquet merge pipeline has its own
SPAWN_PIPELINE_SEMAPHORE(limit 10), separate from the Tantivy merge pipeline's semaphore. This means up to 10 Parquet pipelines and 10 Tantivy pipelines could be spawning simultaneously (20 total), each hitting the metastore. This may be desirable for isolation (one pipeline type can't starve the other's spawns), but reviewers should decide if a shared semaphore would be better for metastore protection. Note that actual merge execution concurrency is already shared viaMergeSchedulerService's single semaphore.Test plan
metricsfeaturecargo clippyclean, license headers OK🤖 Generated with Claude Code