Normalize served prev_action_chunk + reserved-key sample_kwargs passthrough #22
Open
jiabinq wants to merge 1 commit into OpenDriveLab:main
…hrough

Two related fixes to served Policy.infer:

1. RTC prev_action_chunk is now normalized to model space before reaching sample_actions. Pi0RTC.sample_actions() consumes prev_action_chunk in model space (post-Normalize), but Policy.infer was forwarding obs["prev_action_chunk"] from the wire raw. Agilex inference clients send a raw deploy-space slice of their execution buffer, so the guidance term was operating on un-normalized inputs. This is a silent train-deploy contract break, masked because Agilex action norm-stats are close to unit variance, so the magnitude error is small. The fix adds a _normalize_and_pad_prev_chunk helper that delegates to the same transforms.Normalize the serving pipeline uses (so use_quantile_norm is honored), pads to action_horizon, and is wired from the loaded checkpoint via three new optional Policy params (norm_stats, use_quantile_norm, action_horizon). policy_config wires them automatically, so call sites are unchanged. Also guards against silent d=0 cheap-path activation when a client sends prev_action_chunk without inference_delay.

2. Reserved-key obs["_sample_kwargs"] allowlist for transport-layer sample_kwargs overrides (currently: noise). Previously the websocket protocol dropped the noise= kwarg, making deterministic served eval impossible. The reserved-key namespace (leading underscore) avoids collision with future models that legitimately use observation field names like "noise". The explicit noise= kwarg (in-process callers) takes precedence.

Both fixes are backward-compatible: existing callers see no behavior change. Existing RTC clients that previously sent raw deploy-space chunks will now have them normalized correctly before they reach the model. This is the bug fix.
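The normalize-and-pad step in fix 1 can be illustrated with a minimal sketch. This is not the PR's code: the real helper delegates to transforms.Normalize, whereas the function below is a hypothetical stand-in that assumes plain z-score normalization (no quantile norm), a 2-D chunk of shape (steps, action_dim), truncation of over-long chunks, and zero padding up to action_horizon. The padding value, the truncation, and the eps default are assumptions for illustration only.

```python
import numpy as np


def normalize_and_pad_prev_chunk(prev_chunk, mean, std, action_horizon, eps=1e-6):
    """Hypothetical stand-in for the PR's _normalize_and_pad_prev_chunk helper.

    Assumes z-score statistics; the PR instead reuses transforms.Normalize
    so that use_quantile_norm is honored.
    """
    chunk = np.asarray(prev_chunk, dtype=np.float32)
    if chunk.ndim != 2:
        raise ValueError(f"expected (steps, action_dim), got shape {chunk.shape}")

    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)

    # Deploy space -> model space: the same mapping the actions saw in training.
    normalized = (chunk - mean) / (std + eps)

    # Assumption: drop any steps beyond the horizon, then zero-pad the remainder.
    normalized = normalized[:action_horizon]
    pad_rows = action_horizon - normalized.shape[0]
    return np.pad(normalized, ((0, pad_rows), (0, 0)))
```

With norm-stats close to zero mean and unit variance the output is nearly identical to the raw input, which is consistent with the description above of why the missing normalization went unnoticed.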
Two related fixes to served Policy.infer. Both are backward-compatible.

Summary

1. RTC prev_action_chunk is normalized to model space before reaching sample_actions.

Pi0RTC.sample_actions(...) consumes prev_action_chunk in model space (post-Normalize), but Policy.infer was forwarding obs["prev_action_chunk"] from the wire raw. The Agilex inference clients under train_deploy_alignment/inference/agilex/ send a raw deploy-space slice of their execution buffer, so the guidance term operates on un-normalized inputs: a silent train-deploy contract break. The bug is masked because Agilex action norm-stats are close to unit variance, so the magnitude error in the guidance term is small, but the contract is still broken.

The fix adds _normalize_and_pad_prev_chunk, which delegates to the same transforms.Normalize instance the serving pipeline uses (so use_quantile_norm is honored) and pads to the model's action_horizon. It is wired from the loaded checkpoint via three new optional Policy constructor params (norm_stats, use_quantile_norm, action_horizon); policy_config.create_trained_policy sets them automatically, so call sites don't change.

The fix also guards against silent d=0 cheap-path activation when a client sends prev_action_chunk without inference_delay (which would otherwise run the eager loop with no prefix conditioning).

2. Reserved-key obs["_sample_kwargs"] allowlist for transport-layer sample_kwargs overrides (currently: noise).

Policy.infer(obs, *, noise=...) accepts noise for in-process callers, but the websocket protocol drops it: WebsocketPolicyServer._handler calls self._policy.infer(obs) only. This makes deterministic evaluation of a served checkpoint impossible.

The fix extracts an optional obs["_sample_kwargs"]["noise"] into the noise kwarg path. The reserved-key namespace (leading underscore) avoids collision with any future model that legitimately uses an observation field named noise. The explicit noise= kwarg takes precedence over the obs-supplied noise, so in-process callers behave unchanged.

Backward compatibility

- Clients that do not send _sample_kwargs see no behavior change.
- Existing Policy(...) callers see no behavior change: the new constructor params are optional with safe defaults.
- Existing RTC clients that send a raw prev_action_chunk now have it normalized correctly before it reaches the model. This is the bug fix. No client API change is required.

Test plan

- Send the same request twice with obs["_sample_kwargs"]["noise"] = <fixed> and verify the returned actions are bit-identical.
- Compare a Policy.infer call with prev_action_chunk against a direct Normalize({"actions": stats})({"actions": raw})["actions"] and confirm the values forwarded into sample_kwargs match.

Audit context

Found during a downstream parity audit of the OpenDriveLab/kai0 9d93078 deploy stack. Audit deliverables (downstream fork): notes/awbc_inference_dagger_parity_gaps.md, reference/awbc_inference_dagger_upstream_review.md.
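The reserved-key extraction and the inference_delay guard from the summary can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: resolve_sample_kwargs and ALLOWED_SAMPLE_KWARGS are hypothetical names, and rejecting unknown override keys and raising ValueError in the guard are assumed behaviors (the PR may ignore unknown keys or only warn).

```python
# Hypothetical allowlist; the PR description names only "noise" as supported.
ALLOWED_SAMPLE_KWARGS = frozenset({"noise"})


def resolve_sample_kwargs(obs, *, noise=None):
    """Split transport-layer overrides out of obs and apply precedence rules.

    Returns (clean_obs, sample_kwargs). Hypothetical helper for illustration.
    """
    obs = dict(obs)  # shallow copy so the caller's dict is not mutated

    # The reserved key never reaches the model as an observation field.
    overrides = obs.pop("_sample_kwargs", None) or {}

    # Assumption: unknown override keys are rejected rather than ignored.
    unknown = set(overrides) - ALLOWED_SAMPLE_KWARGS
    if unknown:
        raise ValueError(f"unsupported _sample_kwargs keys: {sorted(unknown)}")

    sample_kwargs = dict(overrides)

    # An explicit noise= from an in-process caller wins over the wire value.
    if noise is not None:
        sample_kwargs["noise"] = noise

    # Assumption: fail loudly instead of silently taking the d=0 path.
    if "prev_action_chunk" in obs and "inference_delay" not in obs:
        raise ValueError("prev_action_chunk was sent without inference_delay")

    return obs, sample_kwargs
```

A websocket client would then request deterministic sampling by adding {"_sample_kwargs": {"noise": fixed_noise}} to the observation it sends, while in-process callers keep passing noise= directly.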