Skip to content

fix: preserve Observation.metadata in serialize_observation()#617

Open
goodmorningsaksham wants to merge 10 commits into
huggingface:mainfrom
goodmorningsaksham:fix/preserve-observation-metadata
Open

fix: preserve Observation.metadata in serialize_observation()#617
goodmorningsaksham wants to merge 10 commits into
huggingface:mainfrom
goodmorningsaksham:fix/preserve-observation-metadata

Conversation

@goodmorningsaksham
Copy link
Copy Markdown

Summary

serialize_observation() excluded metadata from model_dump() alongside reward and done, but unlike those two, metadata was never re-added to the response dict, it was silently dropped.

This promotes metadata to a top-level sibling key consistent with reward/done, adds it to StepResult, and updates GenericEnvClient to pass it through.

Fixes #616


How I found this

I built an OpenEnv environment (https://huggingface.co/spaces/saksham1771/infinite-dom) for the OpenEnv Hackathon India 2026.

Our environment returns task progress via Observation.metadata, things like how many graph nodes the agent has completed, which checkpoints are done. During live evaluation over WebSocket, this metadata was silently absent on the client side.

After debugging, I traced it to serialize_observation() excluding metadata without re-adding it.


Fix

The fix mirrors what the function already does for reward and done:

  • exclude metadata from the inner observation dict
  • then promote it to a top-level sibling key
  • skip including metadata if it's empty (avoid noise)

Changes

File Change
src/openenv/core/env_server/serialization.py Promote metadata to top-level key in response (skip when empty)
src/openenv/core/client_types.py Add metadata field to StepResult dataclass
src/openenv/core/generic_client.py Read metadata from payload in _parse_result()
tests/core/test_serialization.py 7 new tests: preservation, empty omission, round-trip, subclass fields

Test plan

  • pytest tests/core/test_serialization.py → 7/7 pass
  • pytest tests/core/test_types_and_enums.py → 36/36 pass (no regressions)
  • Empty metadata omitted from response (no noise)
  • Non-empty metadata survives serialize → client parse round-trip
  • StepResult.metadata defaults to None (backwards-compatible)

metadata was excluded from model_dump() alongside reward and done, but unlike those two it was never re-added to the response dict, silently dropped during serialization. This promotes metadata to a top-level sibling key consistent with reward/done, adds it to StepResult, and updates GenericEnvClient to pass it through.

Includes 7 new tests covering preservation, empty-metadata omission, round-trip through client, and subclass observation fields.
@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 27, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This PR correctly fixes a silent metadata loss bug in serialize_observation() for the WebSocket path and adds matching StepResult.metadata + GenericEnvClient support. However, the change introduces a P0 regression on the HTTP path: http_server.py lines 627 and 665 unpack serialize_observation() directly into ResetResponse(**...) and StepResponse(**...), both of which declare model_config = ConfigDict(extra=\"forbid\") with no metadata field — any step with non-empty metadata will raise a Pydantic ValidationError and crash the HTTP endpoint.

  • StepResponse and ResetResponse in src/openenv/core/env_server/types.py must each gain an Optional[Dict[str, Any]] metadata field before this merge.

Confidence Score: 2/5

Not safe to merge — introduces a P0 HTTP-server crash for all environments returning non-empty metadata.

A single P0 finding (StepResponse/ResetResponse rejecting the new metadata key with extra="forbid") caps confidence at 2/5. The WebSocket and client-side changes are correct, but the HTTP path regression is a definitive runtime crash, not a theoretical risk.

src/openenv/core/env_server/serialization.py (P0 regression) and src/openenv/core/env_server/types.py (StepResponse / ResetResponse must be updated — not in this PR's changeset).

Important Files Changed

Filename Overview
src/openenv/core/env_server/serialization.py Promotes metadata to top-level in serialize_observation(), but the returned dict now breaks StepResponse(...) and ResetResponse(...) in http_server.py because those models use extra="forbid" and have no metadata field — P0 crash on every HTTP step with non-empty metadata.
src/openenv/core/client_types.py Adds Optional[Dict[str, Any]] metadata field to StepResult with a None default; backwards-compatible and correct.
src/openenv/core/generic_client.py Reads metadata from the server payload via payload.get("metadata") in _parse_result; simple and correct.
tests/core/test_serialization.py Adds 7 well-scoped tests covering metadata preservation, empty omission, round-trip, and subclass fields; does not cover the HTTP path (StepResponse construction); missing trailing newline.

Sequence Diagram

sequenceDiagram
    participant Env as Environment
    participant Ser as serialize_observation()
    participant WS as WebSocket path
    participant HTTP as HTTP path
    participant Client as GenericEnvClient

    Env->>Ser: Observation(metadata={k:v})
    Ser-->>WS: {observation, reward, done, metadata} OK
    Ser-->>HTTP: {observation, reward, done, metadata} FAIL ValidationError extra=forbid
    WS-->>Client: StepResult(metadata={k:v}) OK
    HTTP-->>Client: 500 ValidationError crash
Loading

Comments Outside Diff (1)

  1. src/openenv/core/env_server/serialization.py, line 144-152 (link)

    P2 Docstring return-format block not updated

    The docstring still shows the pre-fix three-key format, omitting metadata. Consumers reading the docstring to understand the wire format won't know metadata can appear.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/openenv/core/env_server/serialization.py
    Line: 144-152
    
    Comment:
    **Docstring return-format block not updated**
    
    The docstring still shows the pre-fix three-key format, omitting `metadata`. Consumers reading the docstring to understand the wire format won't know `metadata` can appear.
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/openenv/core/env_server/serialization.py
Line: 176-178

Comment:
**HTTP server crashes when metadata is non-empty**

`http_server.py` lines 627 and 665 do `StepResponse(**serialize_observation(observation))` and `ResetResponse(**serialize_observation(observation))`. Both `StepResponse` and `ResetResponse` are declared with `model_config = ConfigDict(extra="forbid")` and have no `metadata` field. When `serialize_observation()` now conditionally includes `metadata` in the returned dict, unpacking it into either Pydantic model raises a `ValidationError` — crashing the HTTP endpoint for every environment that returns non-empty metadata. The WebSocket path (`data=serialize_observation(observation)`) is unaffected because it uses `Dict[str, Any]`.

`StepResponse` and `ResetResponse` in `src/openenv/core/env_server/types.py` must each receive a `metadata: Optional[Dict[str, Any]] = None` field before this change is safe.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/openenv/core/env_server/serialization.py
Line: 144-152

Comment:
**Docstring return-format block not updated**

The docstring still shows the pre-fix three-key format, omitting `metadata`. Consumers reading the docstring to understand the wire format won't know `metadata` can appear.

```suggestion
    The format matches what EnvClient expects:
    {
        "observation": {...},  # Observation fields
        "reward": float | None,
        "done": bool,
        "metadata": {...},  # Optional; omitted when empty
    }
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/openenv/core/env_server/serialization.py
Line: 178

Comment:
**Missing trailing newline at end of file**

The file is missing a trailing newline (`\ No newline at end of file` in the diff). Most linters and POSIX tools expect files to end with a newline.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: tests/core/test_serialization.py
Line: 123

Comment:
**Missing trailing newline at end of file**

Same as `serialization.py` — the test file is also missing a trailing newline.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix: preserve Observation.metadata in se..." | Re-trigger Greptile

Comment thread src/openenv/core/env_server/serialization.py Outdated
Comment thread src/openenv/core/env_server/serialization.py Outdated
Comment thread tests/core/test_serialization.py Outdated
Copy link
Copy Markdown
Contributor

@Darktex Darktex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Alignment Review Report

Automated Checks

  • Lint: FAIL (one trailing whitespace issue in the diff; unrelated pre-existing format drift in envs/ not caused by this PR)
  • Debug code: CLEAN (no new debug statements introduced)

Tier 1: Fixes Required

  • src/openenv/core/env_server/types.py (line in diff — blank line before class BaseMessage) — trailing whitespace character on the blank separator line introduced between StepResponse and BaseMessage. ruff format will flag this. The raw diff shows + (whitespace-only added line). Please remove the trailing space.

Tier 2: Alignment Discussion

None identified. The change is a straightforward bug fix: metadata was excluded from the inner observation dict in serialize_observation() but, unlike reward and done, was never re-promoted to a top-level sibling key. The fix mirrors the existing reward/done pattern exactly.

A few minor observations worth noting (not blocking):

  1. if metadata: omits empty dicts — Using if metadata: instead of if metadata is not None: means an empty {} dict is silently omitted from the wire payload. The PR author states this is intentional (avoid noise) and a test explicitly asserts it (test_empty_metadata_omitted). This is consistent with Observation.metadata defaulting to {} when not set. Acceptable as stated, but reviewers should be aware the server never sends metadata: {} — clients can assume absence equals empty.

  2. ResetResponse gets metadata but reset path is not fully exercised — The PR correctly adds metadata: Optional[...] to both ResetResponse and StepResponse, and EnvClient._parse_result() is used for both reset and step responses. However, tests/core/test_serialization.py only tests step-path round-trips. A reset-path round-trip test (serialize_observation_parse_result for a reset observation) would complete coverage. Not a blocker.

  3. StepResult uses Optional[Dict[str, Any]] while Observation.metadata is never None (always a dict) — The asymmetry is intentional and correct: StepResult.metadata is None when the server omits the key entirely (e.g., old server, or empty-dict case above), whereas Observation.metadata on the server is always a dict. The _parse_result default of payload.get("metadata") returning None is the right sentinel for "not present".

The tests are high quality: they cover preservation, empty-dict omission, promotion to top-level sibling, subclass fields, and the full serialize→parse round-trip.


Summary

  • 1 mechanical issue to fix (trailing whitespace in types.py diff)
  • 0 alignment points for human review
  • Fix the whitespace nit and this is good to merge

Automated review by Claude Code | Learn more

@goodmorningsaksham
Copy link
Copy Markdown
Author

@Darktex I've addressed all the review comments and fixed the trailing
whitespace in types.py.

Ready for another look when you get a chance. Thank you for your time!

Copy link
Copy Markdown
Contributor

@Darktex Darktex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Alignment Review Report

Automated Checks

  • Lint: PASS — ruff check passes on all four modified source files
  • Debug code: CLEAN — no debug artifacts in the diff

Tier 1: Fixes Required

  • src/openenv/core/env_server/serialization.py:136-151 — The docstring for serialize_observation() describes the return format as {"observation": {...}, "reward": float | None, "done": bool} but does not mention the now-possible "metadata" top-level key. Update the docstring to reflect the new schema (e.g. "metadata": dict when non-empty).

  • src/openenv/core/mcp_client.py:299 — Pre-existing but now visibly inconsistent: MchEnvClient._parse_result() constructs StepResult(observation=..., reward=..., done=...) without propagating metadata to StepResult, even though the StepResult dataclass now has a metadata field. The MCP client encodes env-level metadata inside the Observation object itself (lines 275, 289, 296), which is a different semantic layer, but callers who call .metadata on the StepResult returned by MchEnvClient will always get None. This is a pre-existing inconsistency that this PR surfaces — worth fixing here or in a follow-up, but should be called out.


Tier 2: Alignment Discussion

None identified. The fix is purely in the server serialization and client parsing layers. It does not:

  • Expose reset/simulation controls to agents
  • Import server code from client modules
  • Change the Gymnasium API signatures (reset, step, state)
  • Alter reward computation (metadata is informational, not reward logic)
  • Break client-server separation

The approach (promote metadata as a top-level sibling alongside reward and done) is consistent with how the existing reward/done promotion already works in serialize_observation().


Summary

The core bug fix is correct and well-tested (7 tests covering preservation, empty-omission, round-trip, and subclass fields). Two non-blocking issues to address: (1) the serialize_observation() docstring does not describe the new metadata key in its return schema, and (2) MchEnvClient._parse_result() in mcp_client.py does not populate StepResult.metadata from the top-level payload — leaving the new field dead for MCP callers. Neither is a blocker but both should be addressed before or shortly after merge.

  • 2 mechanical issues to fix (docstring gap, MCP client not populating StepResult.metadata)
  • 0 alignment points for human review

Automated review by Claude Code | Learn more

Copy link
Copy Markdown
Contributor

@Darktex Darktex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Alignment Review Report

Automated Checks

  • Lint: PASS (lint.sh exited cleanly on PR files)
  • Debug code: CLEAN (no debug artifacts in changed files)

Tier 1: Fixes Required

Bug: mcp_client.py — observation.metadata stays empty after this fix

src/openenv/core/mcp_client.py:275,289,296 — All three Observation constructions in _parse_result still read metadata=obs_data.get("metadata", {}), where obs_data is payload.get("observation", {}). After the serializer change, metadata is no longer inside the nested observation dict; it's at the top-level payload. So any MCPToolClient user reading step_result.observation.metadata will always receive {}. The added metadata=payload.get("metadata") line (diff line +110) only propagates metadata to StepResult.metadata, not into the observation object itself. The fix requires changing those three lookups to payload.get("metadata", {}).

Inconsistency: empty-metadata round-trip is lossy

src/openenv/core/env_server/serialization.py:63 — The guard if metadata: omits metadata from the payload when it is {}. On the client, payload.get("metadata") then returns None, so StepResult.metadata is None even though Observation.metadata was {}. Callers that check step_result.metadata is not None will see different behaviour depending on whether metadata is empty or absent. Use if metadata is not None: to preserve the empty-dict case, or document the None/{} equivalence explicitly.

Missing coverage for reset path

ResetResponse in types.py gains a metadata field, but serialize_observation is only called on step responses. There are no tests or updated code paths showing that reset metadata is serialized or propagated to the client. If this field is intentional, the reset serialization path needs the same treatment.

Tier 2: Alignment Discussion

None identified.

Summary

  • 3 mechanical issues to fix (mcp_client observation.metadata bug is the blocker)
  • 0 alignment points for human review

Automated review by Claude Code | Learn more

- Use \if metadata is not None:\ in serialize_observation() to avoid
  lossy round-trip when metadata is an empty dict
- Fix mcp_client._parse_result() to read metadata from top-level
  payload instead of nested observation dict
- Add metadata field to StepResponse and ResetResponse Pydantic models
- Add test_reset_metadata_preserved to cover reset path round-trip
Copy link
Copy Markdown
Contributor

@Darktex Darktex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Alignment Review Report

Automated Checks

  • Lint: Could not run (branch not checked out locally)
  • Debug code: None found in diff

Tier 1: Fixes Required

Critical — if metadata is not None guard is wrong

Observation.metadata is declared as Dict[str, Any] with default_factory=dict in src/openenv/core/env_server/types.py:89-91. It defaults to {}, never to None. The guard added in serialization.py:

if metadata is not None:
    result["metadata"] = metadata

will always be True for any Observation instance — an empty dict is not None. This means:

  • Empty metadata {} will always be included in the response, directly contradicting the PR description ("skip when empty") and breaking test_empty_metadata_omitted.
  • The correct guard is if metadata: (falsy check), which skips both None and {}.

Inconsistency: mcp_client.py metadata defaults diverge from generic_client.py

The PR sets metadata=payload.get("metadata", {}) in mcp_client.py when constructing Observation subclasses, but sets metadata=payload.get("metadata") (default None) for StepResult.metadata. This creates a mismatch: if the server omits metadata, StepResult.metadata is None but observation.metadata is {}. These should be consistent — use payload.get("metadata") throughout or normalize to None/{} explicitly.

Missing test: ResetResponse serialization path

The test test_reset_metadata_preserved constructs ResetResponse(**serialized) directly, but ResetResponse has extra="forbid". The PR's types.py diff adds metadata to ResetResponse and StepResponse — verify those additions are present, since without them http_server.py:601 (ResetResponse(**serialize_observation(observation))) will raise a ValidationError at runtime for any non-empty metadata.

No test for the reset path through http_server.py

serialize_observation is called at http_server.py:601 and 639. The test suite only directly tests _parse_result. A round-trip test through ResetResponse(**serialize_observation(...)) and StepResponse(**serialize_observation(...)) is needed to catch the extra="forbid" failure mode.

Tier 2: Alignment Discussion

ALIGNMENT FLAG: Metadata promotion may need an RFC

  • Principle at stake: PRINCIPLES.md — "Type safety with generics and Pydantic across the wire"; INVARIANTS.md Invariant 3 (Pydantic serialization)
  • The concern: Promoting metadata to a top-level sibling of observation/reward/done changes the wire contract. The existing pattern in PATTERNS.md shows _parse_result reading payload["observation"] for all env-specific data. Metadata is now split across the wire protocol. This is a backward-compatible add (new optional key), but it diverges from the documented pattern for custom clients that override _parse_result.
  • Suggested reviewer: @Darktex

Summary

  • 3 mechanical issues to fix (wrong None guard, mcp/generic inconsistency, missing integration test path)
  • 1 alignment point for human review (wire contract change)

Automated review by Claude Code | Learn more

Copy link
Copy Markdown
Contributor

@Darktex Darktex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Alignment Review Report

Automated Checks

  • Lint (ruff check): PASS on all changed source files
  • Lint (ruff format): FAIL — tests/core/test_serialization.py is missing a trailing newline at EOF (line 286 of the diff ends without \n); ruff format --check would reformat this file
  • Debug code: CLEAN — no new debug code introduced

Tier 1: Fixes Required

  • src/openenv/core/mcp_client.py:110,119,128 (diff lines) — Runtime ValidationError bug: The three lines that construct ListToolsObservation, CallToolObservation, and Observation now pass metadata=payload.get("metadata"), which returns None when the key is absent. All three Pydantic models declare metadata: Dict[str, Any] (not Optional) with a default_factory=dict. Passing None raises pydantic.ValidationError: Input should be a valid dictionary at runtime — this will break every _parse_result call for environments that don't populate metadata (the common case after this fix). The correct fix is metadata=payload.get("metadata") or {}.

  • tests/core/test_serialization.py:286 — Missing trailing newline at end of file. ruff format --check fails on files without a trailing newline, which will break the pre-commit and CI lint hooks.

Tier 2: Alignment Discussion

None identified. The fix correctly mirrors the existing reward/done promotion pattern in serialize_observation(), is additive and backwards-compatible (StepResult.metadata defaults to None), and the change appropriately touches only the serialization/client layer without exposing any simulation controls to agents or violating client-server separation.

Summary

  • 2 mechanical issues to fix (1 runtime bug, 1 formatting)
  • 0 alignment points for human review

The core idea is correct — the bug is real and the approach is right. The mcp_client.py change just needs or {} as a fallback to keep the Pydantic models valid when metadata is absent from the payload.


Automated review by Claude Code | Learn more

Copy link
Copy Markdown
Contributor

@Darktex Darktex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Summary

serialize_observation() was silently dropping Observation.metadata — a real bug confirmed by the PR author's production debugging. The fix is structurally correct: promote metadata to a top-level sibling key, mirror the pattern used for reward and done, add the field to StepResult, ResetResponse, StepResponse, and both clients. The approach is right; two mechanical issues need fixing before merge.

Tier 1 — Bugs / Quality

  • src/openenv/core/env_server/serialization.py (post-patch, around line 178) — if metadata: is a truthiness check, not a None check. Any metadata value that is falsy — e.g. {"active": False}, {"count": 0}, {"flag": ""} — passes the is not None guard but is silently dropped. Change to if metadata is not None: to match the intent.

  • src/openenv/core/mcp_client.py — Three call sites now use payload.get("metadata") or {} to populate observation.metadata. The or {} coerces None (absent key) to {}, which means observation.metadata is always a dict (never None), whereas StepResult.metadata in the same function uses payload.get("metadata") which returns None when absent. This is an internal inconsistency and also diverges from GenericEnvClient, which correctly returns None. Decide on a canonical "no metadata" sentinel (recommend None everywhere for consistency with Optional[Dict]) and apply it uniformly.

  • tests/core/test_serialization.py — No test covers the falsy-metadata edge cases (e.g. metadata={"done": False}, metadata={"score": 0}). A test asserting these are preserved (not omitted) must be added alongside the if metadata is not None: fix.

Tier 2 — Alignment

Aligned. The change propagates environment-produced metadata through the serialization layer to the client — this is firmly within the "rewards inside environment" principle (metadata is domain knowledge originating in the environment, not injected externally). No simulation controls are exposed, client-server separation is respected, no new API surface deviates from the Gymnasium pattern, and no RFC-level trade-off is touched.

Verdict

Request changes: fix the if metadata: truthiness bug and the or {} inconsistency in mcp_client.py, and add a test for falsy-valued metadata before merging.


Automated review by Claude Code | Learn more

@goodmorningsaksham
Copy link
Copy Markdown
Author

Hi @Darktex , I applied your suggested changes, can you please review them?

Copy link
Copy Markdown
Contributor

@Darktex Darktex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Alignment Review — PR #617

Automated Checks

  • Lint (ruff check): PASS — all 5 changed source files pass clean
  • Format (ruff format): PASS
  • Import sort (usort): PASS
  • Debug code: CLEAN

Tier 1: Bugs / Issues

T1-1. mcp_client.py — passing None to non-optional Dict[str, Any] field will crash
src/openenv/core/mcp_client.py — the PR changes obs_data.get("metadata", {})payload.get("metadata") for all three observation constructors (ListToolsObservation, CallToolObservation, generic Observation). But the metadata field on these Pydantic models is typed Dict[str, Any] (non-optional, default {}). When payload has no top-level `"metadata"` key (i.e., every existing environment server), payload.get("metadata") returns None, which will raise a Pydantic ValidationError. This is a runtime regression. Fix: use payload.get("metadata") or {} or payload.get("metadata", {}).

T1-2. {} (empty dict) vs None inconsistency
src/openenv/core/env_server/serialization.py — the guard if metadata is not None: will include {} in the output because Observation.metadata defaults to dict (never None). The PR description says "skip when empty" but the code doesn't skip empty dicts. Meanwhile, StepResult.metadata defaults to None. This creates a mismatch: server always sends "metadata": {}, client defaults to None. Fix: either change guard to if metadata: (truthy, skips {}), or change StepResult.metadata default to {}.

T1-3. Stale docstring in serialize_observation()
src/openenv/core/env_server/serialization.py — the docstring still shows the old return format without metadata. Update to reflect the new top-level metadata key.

T1-4. Verify ResetResponse/StepResponse updates landed
src/openenv/core/env_server/types.py — the diff adds metadata: Optional[Dict[str, Any]] to both models. Both have model_config = ConfigDict(extra="forbid"). If these updates are missing from the actual file, any response containing "metadata" will raise a Pydantic ValidationError on the HTTP path. The serializer and type updates must land atomically.

T1-5. Verify generic_client.py update landed
src/openenv/core/generic_client.py — the diff adds metadata=payload.get("metadata") to StepResult(...). The StepResult dataclass gains a metadata field — but GenericEnvClient._parse_result() must be updated atomically or the field will always be None.


Tier 2: Alignment Flags

🚩 ALIGNMENT FLAG: Wire contract change without version documentation

  • Invariant: "Pydantic serialization: All wire types must be Pydantic models. Serialization must be JSON-compatible."
  • Concern: Adding "metadata" as a new top-level key in every step/reset response changes the wire format for all environments. Any client parsing with strict schema validation (extra="forbid") will break if the type updates aren't atomic. Pre-1.0 this is acceptable per breaking-change policy, but it needs documentation in a changelog entry.
  • Reviewer: @Darktex

🚩 ALIGNMENT FLAG: mcp_client.py breaks backward compatibility for existing servers

  • Invariant: "Minimize lifecycle deltas: Training → Evals → Production should use identical interfaces"
  • Concern: Before this PR, environments embedded metadata inside obs_data (the observation dict). The PR moves reading to payload (top-level). Un-upgraded server Docker images will continue embedding metadata inside obs_data. The new client won't read it from there — metadata silently becomes None/{} with no error. This is silent data loss in distributed deployments. Consider a backward-compatibility shim: try top-level first, fall back to obs_data.
  • Reviewer: @Darktex

Verdict: Request Changes

T1-1 is critical: passing None to a non-optional Dict[str, Any] Pydantic field in mcp_client.py will raise ValidationError at runtime on every response from an un-upgraded server — i.e., every existing deployment. This must be fixed before merge. T1-2 (empty dict inconsistency) should also be resolved for a clean contract.


Automated review by Claude Code | Learn more

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

serialize_observation() silently drops Observation.metadata

2 participants