[RFC] add Environment dataset (taskset) RFC by burtenshaw · Pull Request #727 · huggingface/OpenEnv

burtenshaw · 2026-05-21T13:28:24Z

This PR adds RFC 006 for Hugging Face RL environment datasets, documenting dataset-root environment.yaml declarations, AutoEnv handling for hf://datasets references, and dataset-bound environment behavior. It also updates the RFC index so the proposal is discoverable.

Validation: git diff --check and git diff --cached --check.

Darktex

Note: This is an automated review by Claude Code, not a human review.

Alignment Review — PR #727

Tier 1: Spec-Level Issues

reset() signature mismatch with cursor: The RFC proposes that env.reset() binds row 0, but OpenEnv's Gymnasium-style reset signature is reset(seed?, episode_id?). The RFC must show how episode_id maps to dataset row selection: a caller passing episode_id=42 would expect row 42, not row 0.
Reward computation location unclear: The RFC introduces dataset rows with openenv_reset/openenv_step/task column conventions but doesn't specify whether reward computation (which must live inside the environment per RFC 002/004) reads from the dataset row. If the task column carries ground truth for reward logic, the RFC needs to state that the reward rubric lives server-side and the dataset row is input-only.
"Exactly one of space_id, image, or package" not validated: The RFC declares this constraint but provides no schema validation approach. It should reference the existing openenv.yaml schema validation pathway or specify how AutoEnv enforces mutual exclusivity at parse time.
URI scheme ambiguity: The hf://datasets/{repo_id}/{environment_id}@{revision} format doesn't clarify how {environment_id} resolves — is it a filename or a key within the YAML environments list? Needs to be unambiguous for multiple environments in one repo.
Phase 5 ("docs") has no testable deliverables: Phases 1-4 are concrete. Phase 5 should be fleshed out or merged into Phase 4.
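To make the URI and mutual-exclusivity points concrete, here is a minimal sketch of what parse-time checks could look like. It is not the RFC's or AutoEnv's implementation. It assumes repo_id has the form owner/name, that {environment_id} is a key in the YAML environments list rather than a filename, and that each entry carries a hypothetical `id` field; `parse_env_uri`, `validate_runtime`, and `select_environment` are illustrative names only.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for hf://datasets/{repo_id}/{environment_id}@{revision}.
# Assumption: repo_id is "owner/name"; environment_id and revision are optional.
_URI = re.compile(
    r"^hf://datasets/(?P<repo_id>[^/@]+/[^/@]+)"
    r"(?:/(?P<environment_id>[^/@]+))?"
    r"(?:@(?P<revision>.+))?$"
)


@dataclass(frozen=True)
class EnvRef:
    repo_id: str
    environment_id: Optional[str]
    revision: Optional[str]


def parse_env_uri(uri: str) -> EnvRef:
    """Split a dataset environment URI into its three parts."""
    match = _URI.match(uri)
    if match is None:
        raise ValueError(f"not a dataset environment URI: {uri!r}")
    return EnvRef(**match.groupdict())


RUNTIME_KEYS = ("space_id", "image", "package")


def validate_runtime(decl: dict) -> str:
    """Enforce 'exactly one of space_id, image, or package' and return its key."""
    present = [key for key in RUNTIME_KEYS if decl.get(key) is not None]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one of {RUNTIME_KEYS}, found {present or 'none'}"
        )
    return present[0]


def select_environment(environments: list, environment_id: Optional[str]) -> dict:
    """Resolve environment_id as a key in the environments list (assumed 'id' field)."""
    if environment_id is None:
        if len(environments) != 1:
            raise ValueError("environment_id is required when a repo declares several environments")
        return environments[0]
    matches = [env for env in environments if env.get("id") == environment_id]
    if len(matches) != 1:
        raise ValueError(f"environment_id {environment_id!r} matched {len(matches)} entries")
    return matches[0]
```

Treating {environment_id} as a list key keeps one environment.yaml per repo authoritative; if the RFC instead means a filename, only `select_environment` would need to change.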

Tier 2: Alignment Flags

ALIGNMENT FLAG: Dataset cursor iteration may expose reset control to agents

Invariant: "Agents cannot reset" (RFC 001)
Concern: If an agent can influence which row step() reads next, or if the cursor wraps/is queryable, an agent could learn the dataset has a "restart" boundary. The RFC doesn't specify whether cursor position is hidden from agents.

ALIGNMENT FLAG: Verifiers framework declaration enables external reward computation

Invariant: "Rewards inside environment" (RFC 002)
Concern: Verifiers computes reward/verification outside the environment boundary. If a dataset-bound environment declares a Verifiers runtime, reward computation would violate the invariant. The RFC needs to either exclude reward from the Verifiers path or define how external scores are re-ingested.
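One way the "re-ingested" option could look, as a rough sketch only: the environment owns the step boundary and calls an injected scorer from inside it, so the reward is still emitted by the environment. The `ScoredEnv` class and the scorer signature are hypothetical; a plain callable stands in for any external scoring framework, and no Verifiers API is used or implied.

```python
from typing import Callable


class ScoredEnv:
    """Toy wrapper: the environment, not the caller, turns a raw score into a reward."""

    def __init__(self, scorer: Callable[[str, str], float]):
        # Assumption: the scorer is loaded server-side, inside the environment boundary.
        self._scorer = scorer

    def step(self, prompt: str, completion: str) -> dict:
        raw = self._scorer(prompt, completion)
        # Clamp to [0, 1] so the environment, not the scorer, defines the reward range.
        reward = max(0.0, min(1.0, float(raw)))
        return {"reward": reward, "done": True}
```

For example, `ScoredEnv(lambda prompt, completion: float(completion == "4")).step("2+2", "4")` yields a reward of 1.0 without the client ever computing a score itself.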

ALIGNMENT FLAG: AutoEnv resolution may break client-server separation

Invariant: Client-server separation
Concern: If AutoEnv downloads and instantiates the package entry on the client side before a container boundary exists, it may import server-side code from the client context.

ALIGNMENT FLAG: Cursor model and "one env = one trajectory"

Invariant: One env = one trajectory (RFC 004)
Concern: The RFC should explicitly state that each reset() starts a new trajectory bound to a new row, and the cursor doesn't create mid-episode task switching.
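The cursor-related flags above could be addressed by semantics along these lines. This is a toy in-memory sketch, not the OpenEnv Environment base class: `DatasetBoundEnv`, the `task` and `answer` columns, and the rule that episode_id addresses a row index are all assumptions made for illustration.

```python
from typing import Optional


class DatasetBoundEnv:
    """Each reset() binds exactly one row; the cursor never appears in an observation."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._cursor = 0      # server-side only, not queryable by the agent
        self._row = None      # row bound to the current trajectory
        self._done = True

    def reset(self, seed: Optional[int] = None, episode_id: Optional[str] = None) -> dict:
        # seed is accepted for signature compatibility and unused in this sketch.
        if episode_id is not None:
            index = int(episode_id)   # assumption: episode_id addresses a row
        else:
            index = self._cursor      # otherwise advance the hidden cursor
            self._cursor += 1
        if not 0 <= index < len(self._rows):
            raise IndexError(f"no dataset row {index}")  # no silent wrap-around
        self._row = self._rows[index]
        self._done = False
        return {"prompt": self._row["task"]}

    def step(self, action: str) -> dict:
        if self._done:
            raise RuntimeError("call reset() before step()")
        # Reward is computed here, from the bound row; the answer stays server-side.
        reward = 1.0 if action == self._row["answer"] else 0.0
        self._done = True             # single-step episode; the row never changes mid-episode
        return {"observation": {"prompt": self._row["task"]}, "reward": reward, "done": True}
```

Because step() only ever reads the row bound at reset(), an agent's actions cannot move the cursor, and raising on exhaustion instead of wrapping avoids exposing a restart boundary.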

Verdict

5 spec issues + 4 alignment flags. The overall direction is sound but needs these clarifications before implementation begins.

Automated review by Claude Code

docs: add hf rl environment datasets rfc

2ef60dd

meta-cla Bot added the CLA Signed label May 21, 2026

burtenshaw changed the title ~~[codex] add env dataset RFC~~ [RFC] add Environment dataset (taskset) RFC May 29, 2026

burtenshaw marked this pull request as ready for review May 29, 2026 10:17

Darktex suggested changes May 29, 2026

burtenshaw wants to merge 1 commit into huggingface:main from burtenshaw:codex/hf-rl-env-datasets
