Implement multi-component reward schema and REPL rubric metadata#625
Implement multi-component reward schema and REPL rubric metadata#625adithya-s-k wants to merge 1 commit into
Conversation
Greptile SummaryThis PR adds a
Confidence Score: 3/5Not safe to merge without fixing the Python 3.9 incompatibility and the observation-mutation side-effect in REPLRubric.forward. Two P1 findings: a syntax incompatibility that breaks Python <3.10 at class-definition time, and an observation-mutation side-effect in the rubric that violates the documented responsibility boundary. Additionally, the backing RFC 006 is not yet committed to the repo. Multiple P1s pull the score below the 4/5 ceiling to 3/5. src/openenv/core/rubrics/components.py (Python version compat) and envs/repl_env/rubrics.py (_attach_reward_metadata side-effect). Important Files Changed
Sequence DiagramsequenceDiagram
participant Env as REPLEnvironment
participant Rubric as REPLRubric
participant EC as evaluate_components()
participant AWS as aggregate_weighted_sum()
participant ARM as _attach_reward_metadata()
participant Obs as observation.metadata
Env->>Rubric: forward(action, observation)
Rubric->>EC: evaluate_components(action, observation)
EC-->>Rubric: [RewardComponent(...)]
Rubric->>AWS: aggregate_weighted_sum(components)
AWS-->>Rubric: total: float
Rubric->>ARM: _attach_reward_metadata(observation, components, total)
ARM->>Obs: observation.metadata[reward_components] = ...
ARM->>Obs: observation.metadata[reward_total] = total
ARM->>Obs: observation.metadata[reward_aggregation] = weighted_sum_v1
Rubric-->>Env: return total (float)
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/openenv/core/rubrics/components.py
Line: 38-40
Comment:
**Python <3.10 runtime incompatibility**
`float | None` (PEP 604 union syntax) requires Python 3.10+ for *runtime* evaluation. Without `from __future__ import annotations`, this annotation is evaluated eagerly at class definition time, raising `TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'` on Python 3.8/3.9. The rest of the codebase (e.g. `models.py`) consistently uses `Optional[T]`, signaling Python < 3.10 is supported. Either add `from __future__ import annotations` at the top of this file, or replace the union with `Optional[float]`.
```suggestion
weighted_value: Optional[float] = Field(
default=None,
description="Optional explicit weighted value (defaults to value * weight)",
)
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/openenv/core/rubrics/components.py
Line: 13-14
Comment:
**Inconsistent `typing` imports alongside new-style union syntax**
The file imports `Dict` and `List` from `typing` (Python 3.8 compat style), but simultaneously uses `float | None` PEP-604 syntax (Python 3.10+). These two typing styles target incompatible Python version floors. Align to one style — either migrate all annotations to the modern built-in generics with `from __future__ import annotations`, or keep everything `typing`-module style (`Dict`, `List`, `Optional[float]`).
```suggestion
from typing import Any, Dict, List, Optional
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: envs/repl_env/rubrics.py
Line: 217-230
Comment:
**ALIGNMENT FLAG: Rubric directly mutates observation**
`_attach_reward_metadata` writes `reward_components`, `reward_total`, and `reward_aggregation` directly onto `observation.metadata` as a side-effect of reward computation. PATTERNS.md shows observations being constructed and returned by the environment (`step`), not mutated downstream. A rubric whose `forward` now carries an observation-mutation side-effect creates a responsibility leak: any caller of `REPLRubric.forward` that did not expect its observation to be modified (e.g., a Transform in the pipeline, a test double, or a future rubric composition) will silently receive a mutated object.
- **Principle at stake**: Separation of reward computation from observation construction (PATTERNS.md "Reward Computation" section; INVARIANTS.md "Rewards in environment")
- **The concern**: Rubric `forward` is documented to return a `float`; attaching metadata side-effects to the observation couples reward diagnostics to observation mutation in a non-obvious way and breaks composability
- **Suggested reviewer**: `@darktex`
**Context Used:** .claude/docs/INVARIANTS.md ([source](https://app.greptile.com/review/custom-context?memory=dbd1ab5e-bd4d-4701-9de0-9817404155a9))
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: envs/repl_env/rubrics.py
Line: 224-226
Comment:
**Silent metadata clobber on empty-dict metadata**
`getattr(observation, "metadata", None) or {}` evaluates to a *new* empty dict when `observation.metadata` is `{}` (empty dicts are falsy). Any keys already present would be discarded. Prefer an explicit `None` check to make the intent clear.
```suggestion
metadata = getattr(observation, "metadata", None)
if metadata is None:
metadata = {}
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "Add multi-component reward schema and RE..." | Re-trigger Greptile |
| weighted_value: float | None = Field( | ||
| default=None, | ||
| description="Optional explicit weighted value (defaults to value * weight)", |
There was a problem hiding this comment.
Python <3.10 runtime incompatibility
float | None (PEP 604 union syntax) requires Python 3.10+ for runtime evaluation. Without from __future__ import annotations, this annotation is evaluated eagerly at class definition time, raising TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' on Python 3.8/3.9. The rest of the codebase (e.g. models.py) consistently uses Optional[T], signaling Python < 3.10 is supported. Either add from __future__ import annotations at the top of this file, or replace the union with Optional[float].
| weighted_value: float | None = Field( | |
| default=None, | |
| description="Optional explicit weighted value (defaults to value * weight)", | |
| weighted_value: Optional[float] = Field( | |
| default=None, | |
| description="Optional explicit weighted value (defaults to value * weight)", | |
| ) |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/openenv/core/rubrics/components.py
Line: 38-40
Comment:
**Python <3.10 runtime incompatibility**
`float | None` (PEP 604 union syntax) requires Python 3.10+ for *runtime* evaluation. Without `from __future__ import annotations`, this annotation is evaluated eagerly at class definition time, raising `TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'` on Python 3.8/3.9. The rest of the codebase (e.g. `models.py`) consistently uses `Optional[T]`, signaling Python < 3.10 is supported. Either add `from __future__ import annotations` at the top of this file, or replace the union with `Optional[float]`.
```suggestion
weighted_value: Optional[float] = Field(
default=None,
description="Optional explicit weighted value (defaults to value * weight)",
)
```
How can I resolve this? If you propose a fix, please make it concise.| from enum import Enum | ||
| from typing import Any, Dict, List |
There was a problem hiding this comment.
Inconsistent
typing imports alongside new-style union syntax
The file imports Dict and List from typing (Python 3.8 compat style), but simultaneously uses float | None PEP-604 syntax (Python 3.10+). These two typing styles target incompatible Python version floors. Align to one style — either migrate all annotations to the modern built-in generics with from __future__ import annotations, or keep everything typing-module style (Dict, List, Optional[float]).
| from enum import Enum | |
| from typing import Any, Dict, List | |
| from typing import Any, Dict, List, Optional |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/openenv/core/rubrics/components.py
Line: 13-14
Comment:
**Inconsistent `typing` imports alongside new-style union syntax**
The file imports `Dict` and `List` from `typing` (Python 3.8 compat style), but simultaneously uses `float | None` PEP-604 syntax (Python 3.10+). These two typing styles target incompatible Python version floors. Align to one style — either migrate all annotations to the modern built-in generics with `from __future__ import annotations`, or keep everything `typing`-module style (`Dict`, `List`, `Optional[float]`).
```suggestion
from typing import Any, Dict, List, Optional
```
How can I resolve this? If you propose a fix, please make it concise.| def _attach_reward_metadata( | ||
| self, | ||
| observation: Any, | ||
| components: list[RewardComponent], | ||
| total: float, | ||
| ) -> None: | ||
| """Attach component diagnostics to observation metadata if available.""" | ||
| if not hasattr(observation, "metadata"): | ||
| return | ||
| metadata = getattr(observation, "metadata", None) or {} | ||
| metadata["reward_components"] = serialize_reward_components(components) | ||
| metadata["reward_total"] = total | ||
| metadata["reward_aggregation"] = "weighted_sum_v1" | ||
| observation.metadata = metadata |
There was a problem hiding this comment.
ALIGNMENT FLAG: Rubric directly mutates observation
_attach_reward_metadata writes reward_components, reward_total, and reward_aggregation directly onto observation.metadata as a side-effect of reward computation. PATTERNS.md shows observations being constructed and returned by the environment (step), not mutated downstream. A rubric whose forward now carries an observation-mutation side-effect creates a responsibility leak: any caller of REPLRubric.forward that did not expect its observation to be modified (e.g., a Transform in the pipeline, a test double, or a future rubric composition) will silently receive a mutated object.
- Principle at stake: Separation of reward computation from observation construction (PATTERNS.md "Reward Computation" section; INVARIANTS.md "Rewards in environment")
- The concern: Rubric
forwardis documented to return afloat; attaching metadata side-effects to the observation couples reward diagnostics to observation mutation in a non-obvious way and breaks composability - Suggested reviewer:
@darktex
Context Used: .claude/docs/INVARIANTS.md (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: envs/repl_env/rubrics.py
Line: 217-230
Comment:
**ALIGNMENT FLAG: Rubric directly mutates observation**
`_attach_reward_metadata` writes `reward_components`, `reward_total`, and `reward_aggregation` directly onto `observation.metadata` as a side-effect of reward computation. PATTERNS.md shows observations being constructed and returned by the environment (`step`), not mutated downstream. A rubric whose `forward` now carries an observation-mutation side-effect creates a responsibility leak: any caller of `REPLRubric.forward` that did not expect its observation to be modified (e.g., a Transform in the pipeline, a test double, or a future rubric composition) will silently receive a mutated object.
- **Principle at stake**: Separation of reward computation from observation construction (PATTERNS.md "Reward Computation" section; INVARIANTS.md "Rewards in environment")
- **The concern**: Rubric `forward` is documented to return a `float`; attaching metadata side-effects to the observation couples reward diagnostics to observation mutation in a non-obvious way and breaks composability
- **Suggested reviewer**: `@darktex`
**Context Used:** .claude/docs/INVARIANTS.md ([source](https://app.greptile.com/review/custom-context?memory=dbd1ab5e-bd4d-4701-9de0-9817404155a9))
How can I resolve this? If you propose a fix, please make it concise.| if not hasattr(observation, "metadata"): | ||
| return | ||
| metadata = getattr(observation, "metadata", None) or {} |
There was a problem hiding this comment.
Silent metadata clobber on empty-dict metadata
getattr(observation, "metadata", None) or {} evaluates to a new empty dict when observation.metadata is {} (empty dicts are falsy). Any keys already present would be discarded. Prefer an explicit None check to make the intent clear.
| if not hasattr(observation, "metadata"): | |
| return | |
| metadata = getattr(observation, "metadata", None) or {} | |
| metadata = getattr(observation, "metadata", None) | |
| if metadata is None: | |
| metadata = {} |
Prompt To Fix With AI
This is a comment left during a code review.
Path: envs/repl_env/rubrics.py
Line: 224-226
Comment:
**Silent metadata clobber on empty-dict metadata**
`getattr(observation, "metadata", None) or {}` evaluates to a *new* empty dict when `observation.metadata` is `{}` (empty dicts are falsy). Any keys already present would be discarded. Prefer an explicit `None` check to make the intent clear.
```suggestion
metadata = getattr(observation, "metadata", None)
if metadata is None:
metadata = {}
```
How can I resolve this? If you propose a fix, please make it concise.
Summary
This PR implements the first end-to-end slice of RFC 006 by adding a standard reward-component schema in core rubrics and integrating it into
repl_envreward computation.What changed
RewardComponentTypeRewardComponentaggregate_weighted_sum(...)serialize_reward_components(...)openenv.core.rubricsREPLRubricto:reward_componentsreward_totalreward_aggregationCompatibility
observation.rewardremains the scalar optimization targetValidation
uv run ruff checkon touched files passedtests/core/test_rubrics/test_reward_components.pypassedtests/envs/test_repl_env.pywas skipped at collection in this environmentRelated