
Feat/nemo rl rlix f5 f6#2

Open
TianyeGGBond wants to merge 3 commits into rlops:main from TianyeGGBond:feat/nemo-rl-rlix-f5-f6

Conversation

@TianyeGGBond TianyeGGBond commented Apr 25, 2026

feat(rlix): wire F5/F6 scheduler hooks and vLLM weight update receiver

Summary

  • rlix_hooks.py (new): defines RLixHooksProtocol + NoOpRLixHooks as the seam between NeMo RL and RLix. NeMo RL never imports from the rlix package directly — the real implementation is injected at runtime by NemoRLRLixHooks (rlix repo).
  • grpo.py: wires F5/F6 hooks into async_grpo_train via an optional rlix_hooks parameter. Adds DO_TIME_SHARING flag (controlled by RLIX_CONTROL_PLANE=rlix) to skip standalone refit/prepare paths that conflict with scheduler-driven sleep/wake.
  • vllm_backend.py: adds RLix weight update receiver methods to VllmInternalWorkerExtension: setup_collective_group, update_parameter_in_bucket, broadcast_parameter, destroy_collective_group, finalize_weight_update, verify_model.
  • vllm_generation.py: adds get_model_update_receiver (exposes worker surface for selective sync) and finalize_weight_update (dispatches post-load hooks to selected DP ranks after bucket sync).
  • vllm_worker.py / vllm_worker_async.py: adds rlix_model_update_rpc dispatcher that forwards RLix weight-update method calls to vLLM internal workers via collective_rpc.
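The seam described in the first bullet could look roughly like this (a minimal sketch: the class and method names come from this PR's summary and commit message, but the exact signatures are assumptions, and the commit uses typing_extensions rather than typing):

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class RLixHooksProtocol(Protocol):
    """Seam between NeMo RL and RLix; the real implementation
    (NemoRLRLixHooks, rlix repo) is injected at runtime."""

    def before_training(self, step: int) -> None:
        """F5: block until the scheduler grants GPUs for this step."""
        ...

    def after_training(self, step: int) -> None:
        """F5: release actor_train GPUs back to the scheduler."""
        ...

    def on_trajectory_collector_created(self, collector: Any) -> None:
        """Register the collector handle for later set_weight_version calls."""
        ...


class NoOpRLixHooks:
    """Default for standalone mode: every hook is a no-op."""

    def before_training(self, step: int) -> None:
        pass

    def after_training(self, step: int) -> None:
        pass

    def on_trajectory_collector_created(self, collector: Any) -> None:
        pass
```

Keeping the Protocol in a seam file means NeMo RL type-checks against the interface without ever importing the rlix package.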

How it fits together

async_grpo_train
├─ hooks.before_training(step) ← F5: blocks on scheduler GPU grant
├─ policy.train()
└─ hooks.after_training(step) ← F5: releases actor_train GPUs
   └─ scheduler triggers resize_infer(add=overlap_ranks)
      └─ _expand_workers (rlix repo)
         ├─ wake_up_partial
         ├─ NemoRLModelUpdateService.sync_selected_workers
         │  └─ setup_collective_group → broadcast_parameter → finalize_weight_update
         ├─ set_weight_version (collector)
         └─ activate_dp_ranks (routing on)

Standalone mode (RLIX_CONTROL_PLANE unset): NoOpRLixHooks is used, every hook call is a no-op, and the refit/prepare paths are unchanged.

Pending (follow-up features)

  • TODO F4: policy.build_cpu_bucket_cache(step) before after_training
  • TODO F11: policy.offload_training_gpu() + destroy_nccl_groups() before after_training

Test plan

  • Standalone GRPO training unaffected: RLIX_CONTROL_PLANE unset → NoOpRLixHooks, refit path unchanged
  • DO_TIME_SHARING=True: initial prepare_for_generation / refit skipped; before_training / after_training called each step
  • rlix_model_update_rpc dispatches correctly to sync/async worker variants
  • finalize_weight_update on VllmGeneration dispatches only to requested DP ranks
  • setup_collective_group early-returns True for ranks not in comm_plan

TianyeGGBond and others added 3 commits April 24, 2026 20:35
Add rlix_hooks.py: RLixHooksProtocol (typing_extensions Protocol) +
NoOpRLixHooks default for standalone mode. Seam file keeps NeMo RL free
of direct rlix package imports.

Modify async_grpo_train:
- rlix_hooks parameter injected by NemoRLRLixHooks from pipeline actor
- DO_TIME_SHARING flag from RLIX_CONTROL_PLANE env var
- before_training(step): blocks on scheduler GPU grant before lp_inference
- after_training(step): notifies scheduler release; replaces refit in RLix mode
  (weight sync + version update done atomically in _expand_workers, F6)
- on_trajectory_collector_created: registers collector handle so _expand_workers
  can call set_weight_version before activating dp rank routing
- Initial refit and prepare_for_generation skipped when DO_TIME_SHARING=True

TODO placeholders in after_training branch:
  F4: policy.build_cpu_bucket_cache(step)
  F11: policy.offload_training_gpu() + policy.destroy_nccl_groups()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
