Skip to content

feat: Phase 3d+3e — ParquetMergePipeline supervisor + publisher feedback#6354

Merged
g-talbot merged 4 commits intomainfrom
gtt/parquet-merge-pipeline-3de
May 1, 2026
Merged

feat: Phase 3d+3e — ParquetMergePipeline supervisor + publisher feedback#6354
g-talbot merged 4 commits intomainfrom
gtt/parquet-merge-pipeline-3de

Conversation

@g-talbot
Copy link
Copy Markdown
Contributor

@g-talbot g-talbot commented Apr 29, 2026

Summary

Stacked on #6358 (Phase 3c). Adds the supervisor and feedback loop that make the merge pipeline runnable.

  • ParquetMergePipeline supervisor: spawns all merge actors (publisher → sequencer → uploader → executor → downloader → planner), periodic health-check supervision loop, respawn on failure with backoff, graceful shutdown via FinishPendingMergesAndShutdownPipeline that disconnects feedback and runs finalize policy for cold windows.
  • Publisher feedback: adds parquet_merge_planner_mailbox_opt to Publisher (feature-gated cfg(feature = "metrics")). After successful ParquetSplitsUpdate publish of new ingested splits, sends ParquetNewSplits to the planner. Merge outputs are not fed back (guards against infinite loops).
  • DisconnectMergePlanner extended to clear both Tantivy and Parquet planner mailboxes.

Note for reviewers: spawn concurrency semaphore

The Parquet merge pipeline has its own SPAWN_PIPELINE_SEMAPHORE (limit 10), separate from the Tantivy merge pipeline's semaphore. This means up to 10 Parquet pipelines and 10 Tantivy pipelines could be spawning simultaneously (20 total), each hitting the metastore. This may be desirable for isolation (one pipeline type can't starve the other's spawns), but reviewers should decide if a shared semaphore would be better for metastore protection. Note that actual merge execution concurrency is already shared via MergeSchedulerService's single semaphore.

Test plan

  • 3 pipeline tests (spawn+supervise, shutdown drain, initial splits)
  • 3 existing publisher tests pass (no regression)
  • Compiles with and without metrics feature
  • cargo clippy clean, license headers OK

🤖 Generated with Claude Code

@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline branch from ceba410 to e96a920 Compare April 29, 2026 14:06
@g-talbot g-talbot changed the base branch from gtt/parquet-merge-pipeline to gtt/parquet-merge-pipeline-3c April 29, 2026 14:07
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from ceba410 to 5937440 Compare April 29, 2026 15:31
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from d82d72d to 92cb5ed Compare April 29, 2026 15:31
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from 5937440 to 0b1c9cc Compare April 29, 2026 18:10
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from 92cb5ed to b6f8bcc Compare April 29, 2026 18:11
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from 0b1c9cc to 66b97e0 Compare April 29, 2026 18:16
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from b6f8bcc to d296774 Compare April 29, 2026 18:16
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from 66b97e0 to 0f051bc Compare April 29, 2026 18:24
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from d296774 to ededb89 Compare April 29, 2026 18:24
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from 0f051bc to 16b46d7 Compare April 29, 2026 18:40
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from ededb89 to 761b379 Compare April 29, 2026 18:42
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from 16b46d7 to 8e19b6b Compare April 29, 2026 18:51
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from 761b379 to 9b3769e Compare April 29, 2026 18:51
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from 8e19b6b to f32bd64 Compare April 29, 2026 19:05
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from 9b3769e to 927dc9f Compare April 29, 2026 19:05
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from f32bd64 to de17c0e Compare April 29, 2026 20:54
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from 927dc9f to c51a84b Compare April 29, 2026 20:54
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from de17c0e to 1f6512e Compare April 30, 2026 02:30
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from c51a84b to 804e384 Compare April 30, 2026 02:30
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from 1f6512e to f2c4a8a Compare April 30, 2026 13:42
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from 804e384 to 55d8ff1 Compare April 30, 2026 13:42
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from f2c4a8a to 467f6fb Compare April 30, 2026 13:49
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from 55d8ff1 to d360c80 Compare April 30, 2026 13:49
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from 467f6fb to b6ca9bf Compare April 30, 2026 14:47
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from d360c80 to ab8fb10 Compare April 30, 2026 14:47
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from b6ca9bf to be60cf6 Compare April 30, 2026 16:41
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from ab8fb10 to 93fde45 Compare April 30, 2026 16:41
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from be60cf6 to bb6cd8b Compare April 30, 2026 16:52
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from 93fde45 to 9ad89a5 Compare April 30, 2026 16:52
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from bb6cd8b to f34bd04 Compare April 30, 2026 20:25
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from 9ad89a5 to a691c64 Compare April 30, 2026 20:25
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from f34bd04 to 5c68071 Compare April 30, 2026 20:41
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch 4 times, most recently from 752de8f to 3a7ad73 Compare May 1, 2026 13:24
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from 5c68071 to daacf89 Compare May 1, 2026 14:56
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from 2268720 to c2bdc90 Compare May 1, 2026 14:56
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from daacf89 to 2d82e7e Compare May 1, 2026 16:09
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch 4 times, most recently from f052239 to 768e966 Compare May 1, 2026 17:47
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3c branch from ac2ae00 to 706e1dd Compare May 1, 2026 17:52
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from 768e966 to 9d28f36 Compare May 1, 2026 17:52
Base automatically changed from gtt/parquet-merge-pipeline-3c to gtt/parquet-merge-pipeline-3b May 1, 2026 17:56
Copy link
Copy Markdown
Contributor

@mattmkim mattmkim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTMing to unblock, but some comments

Comment thread quickwit/quickwit-indexing/src/actors/metrics_pipeline/publisher_impl.rs Outdated
Base automatically changed from gtt/parquet-merge-pipeline-3b to main May 1, 2026 18:12
g-talbot and others added 3 commits May 1, 2026 14:51
…se 3d+3e)

Phase 3 pipeline integration, combined supervisor and feedback PR:

- ParquetMergePipeline supervisor: spawns all merge actors (publisher →
  sequencer → uploader → executor → downloader → planner), health-checks
  with periodic supervision loop, respawn on failure with backoff,
  graceful shutdown via FinishPendingMergesAndShutdownPipeline that
  disconnects feedback and runs finalize policy. 3 tests.

- Publisher feedback: add parquet_merge_planner_mailbox_opt to Publisher
  (feature-gated behind cfg(feature = "metrics")). After successful
  ParquetSplitsUpdate publish of new ingested splits, sends ParquetNewSplits
  to the planner. Merge outputs (non-empty replaced_split_ids) are not
  fed back to avoid infinite loops.

- DisconnectMergePlanner extended to clear both Tantivy and Parquet planner
  mailboxes, supporting shutdown drain for both pipeline types.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@g-talbot g-talbot force-pushed the gtt/parquet-merge-pipeline-3de branch from 9d28f36 to 64bb5e8 Compare May 1, 2026 18:51
The guard that filtered out merge outputs (non-empty replaced_split_ids)
from the feedback loop was incorrect. Tantivy feeds ALL new splits to
the merge planner — both ingest and merge outputs. Infinite loops are
prevented by the merge policy's maturity checks (max_merge_ops, size,
maturation_period), not by the publisher.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@g-talbot g-talbot merged commit ebc1b66 into main May 1, 2026
9 checks passed
@g-talbot g-talbot deleted the gtt/parquet-merge-pipeline-3de branch May 1, 2026 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants