Skip to content

fix: handle null base64_image in MCP screenshot responses#476

Open
dzorlu wants to merge 85 commits into
huggingface:mainfrom
fleet-ai:fix/null-screenshot-url
Open

fix: handle null base64_image in MCP screenshot responses#476
dzorlu wants to merge 85 commits into
huggingface:mainfrom
fleet-ai:fix/null-screenshot-url

Conversation

@dzorlu
Copy link
Copy Markdown

@dzorlu dzorlu commented Mar 29, 2026

Summary

  • Add null check for parsed["base64_image"] in _extract_tool_result
  • When Fleet MCP returns {"base64_image": null} (failed screenshot capture), return a text error message instead of creating an image_url block with url: None
  • Prevents downstream AttributeError in SkyRL's extract_images_from_conversation

Root cause

_extract_tool_result unconditionally wraps parsed["base64_image"] in an OpenAI image_url block. When the value is None, this creates {"url": None} which crashes .startswith() in the consumer.

Test plan

  • Verify VL training handles failed screenshots gracefully (text error instead of crash)
  • Confirm working screenshots still produce valid image_url blocks

🤖 Generated with Claude Code

Deniz and others added 30 commits December 12, 2025 12:14
- FleetTaskEnv wraps FleetEnvClient with task-oriented interface
- Accepts task configs from export_training_tasks.py
- Creates versioned environments on reset
- Injects task prompt into observations
- Executes verifier for reward computation on episode completion
- Supports both sync and async step methods
- Factory functions: make_fleet_task_env, from_json_file
- Tests: 20 unit tests for init, specs, verifiers, factories

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The MCP images don't exist for all environment versions, causing
FleetVersionNotFoundError when trying to create environments.
Changing the default to None allows the Fleet SDK to use standard
images which are available for all versions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
FleetEnvClient.from_fleet() was not accepting data_key/data_version
parameters, causing them to be passed through **kwargs to HTTPEnvClient
which doesn't accept them.

- Add data_key and data_version as explicit parameters
- Pass them to fleet.make()
- Update task_env.py to pass them separately

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fleet SDK expects data_key in "key:version" format, not as separate
parameters. Updated from_fleet() to combine them before calling
fleet.make().

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
HTTPEnvClient.reset() doesn't support seed parameter yet.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Increases default timeout from 15s to 60s for Fleet API calls.
This prevents timeouts during environment initialization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously reset() did partial work and reset_async() added tool fetching.
Now reset_async() does all the work (including fetching tools) and reset()
is just a sync wrapper that calls it via run_until_complete().

This ensures both methods return identical results including tools.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
MCP's call_tool() returns a CallToolResult Pydantic object, not plain text.
This was causing ugly repr strings to be passed to agents like:
  "meta=None content=[TextContent(type='text', text='...')] ..."

Now properly extracts:
- Text content from result.content[].text
- Tries JSON parsing for structured results
- Falls back to structuredContent if available
- Handles isError cases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tests for:
- FleetMCPClient._extract_tool_result():
  - Single text content extraction
  - JSON parsing from text
  - Multiple text contents
  - Error result handling
  - Structured content fallback
  - Empty result handling

- FleetTaskEnv reset:
  - reset_async() returns tools
  - reset() calls reset_async() (sync wrapper)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Move fleet.make() and list_tools() into FleetTaskEnv.__init__()
- Tools are now fetched at env creation, not during reset
- reset_async() calls _orch.reset() with error handling, returns cached tools
- Use asyncio.run() for Python 3.13 compatibility
- Update tests for new initialization pattern

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Log task_key and verifier code preview when verifier fails
- Catch syntax errors separately with clear message
- Show which functions were found if 'verify' is missing

Helps debug issues like "Verifier code must define a 'verify' function"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace custom _execute_verifier_local() with Fleet SDK's Task.verify_detailed()
which properly sets up the verifier namespace with:
- Environment type annotation
- Helper functions (normalized_contains, etc.)
- Proper function discovery (not just "verify" function)

This fixes "name 'Environment' is not defined" errors during verifier execution.

Changes:
- _compute_reward: Create Fleet SDK Task and call verify_detailed()
- Support both 'verifier_code' and 'verifier_func' field names
- Add comprehensive logging for debugging
- Remove broken _execute_verifier_local method

Tests:
- Update all verifier tests to mock Fleet SDK Task.verify_detailed()
- Add tests for various edge cases (no verifier, no orch, exceptions)
- Fix fixture to avoid asyncio.run() conflicts with pytest-asyncio

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add retry with exponential backoff (3 attempts, 1s/2s/4s delays)
- Log errors instead of silently swallowing exceptions
- Log warning when some clients fail but others succeed
- Log error after all retries exhausted

This fixes silent failures when MCP connections are flaky, which caused
'no tools found' errors in SkyRL training.
call_tool now retries with exponential backoff (3 attempts, 1s/2s/4s)
on connection errors, similar to list_tools.

ValueError (tool not found) is not retried.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds exponential backoff retry (3 attempts, 2s base delay) around
fleet.make() to handle transient Fleet API errors like health check
failures that can occur during instance provisioning.

Only retries on transient errors (health check, timeout, connection).
Permanent errors are raised immediately.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add Toolathlon-style context management tools for long trajectories:
- check_context: Check visible/total turn counts
- manage_context: Drop old turns to free up context space
- search_history: Search all history (including dropped)
- search_tool_output: Search truncated tool output
- view_tool_output: Paginate through truncated output

The ContextManager class can be used by any training framework that
maintains chat_history. It tracks full history and handles truncated
tool outputs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Computer-use tasks require MCP-enabled container images (e.g., famazon:mcp0.0.7)
which have scrot installed for screenshots and the MCP server with 'computer' tool
for mouse/keyboard control.
Previously, tools were only fetched for tool_use modality due to a
restrictive condition. This caused computer_use tasks to fail with
"no tools found in observation" because the computer tool (mouse,
keyboard, screenshot) was never fetched.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When task_modality is computer_use, filter tools to only include
the 'computer' tool. This prevents the model from using API tools
when it should be using mouse/keyboard control.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Deniz and others added 27 commits March 3, 2026 19:49
AsyncFleet.make() is not truly non-blocking — diagnostics confirmed the
event loop stays healthy while make() blocks internally. Switch to running
sync Fleet.make() in a thread pool via asyncio.to_thread() to guarantee
non-blocking behavior. Also removes the separate sync Fleet.instance()
call since we already have a sync env handle from make().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… loop

Three sync calls were blocking the event loop during concurrent trajectories:

1. self._orch.reset() — sync HTTP POST to manager API, now uses reset_async()
   which runs in a thread pool
2. fleet_task.verify_detailed() — sync verifier execution, now wrapped in
   asyncio.to_thread()
3. self._fleet_env.close() — sync instance termination, now has close_async()

Also reverts AsyncFleet.make() (was correctly async all along — diagnostics
confirmed the event loop stays healthy during make() calls).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds create_trace_job() and upload_trace() to fleet_env module.
These allow SkyRL eval to send conversation traces (with screenshots)
to the Fleet API for viewing in the Fleet UI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r UI rendering

Images were being passed as OpenAI format (type: "image_url") but Fleet's
ingest API expects (type: "image", mime_type, data). The API then uploads
base64 to S3 and the UI renders them full-size via OpinionatedImage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The ingest API determines session status from score presence, not the
status field. Without score, all sessions stay as in_progress.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New HintGenerator module that produces concise hints from failed rollout
data to rescue GRPO signal on hard tasks. Three modes:
- Option B: LLM call with verifier code + tool errors + chat_history
- Option C: LLM call with verifier code only (cacheable per-task)
- Option D: LLM call synthesizing tool errors + verifier failure

FleetTaskEnv changes:
- Accumulate tool errors during step_async()
- Capture verifier error details in _compute_reward()
- New properties: verifier_code, tool_errors, verifier_error
- New reset_for_hint_async(hint) for hinted rollouts (reuses
  provisioned instance, resets DB to seed, appends hint to prompt)
- compute_hint_reward() utility: R = (1 - raw_score) * hint_score

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: Add Logfire error tracking to fleet env
When `partial_reward=True`, failed verifier runs compute a fractional
score from the error/success accumulators instead of binary 0/1.
Passing tasks are unaffected. Off by default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SkyRL can end trajectories early (context overflow, its own max_turns)
without OpenEnv knowing. Previously the model got 0 reward even if the
task was completed. Now close_async()/close() run the verifier when
step_async() never computed a reward, storing the result in
self.final_reward for SkyRL to read.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: Run verifier at close() for orphaned rollouts
Carlisle tasks (354 total, 8 in eval) require models to call
submit_final_answer to submit results, but this tool is a
harness-level synthetic injected by the orchestrator's SessionWorkflow,
not an MCP tool. OpenEnv connects directly to MCP servers, so the tool
was missing — causing 0% scores across all carlisle tasks in training.

Changes:
- Inject submit_final_answer into tool list when prompt references it
- Intercept calls locally (not routed to MCP), store the answer
- Pass final_answer to verifier via Fleet SDK's verify_detailed()
- Run verifier in close()/close_async() for orphaned rollouts
- Add unit tests for the synthetic tool

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: Add submit_final_answer synthetic tool for carlisle tasks
Runs k × m rollouts of generated tasks on Fleet environments.
Given (prompt, verifier_code, env_key), creates FleetTaskEnv instances,
runs agent loops with model inference, and returns structured results
for reward computation (learnability variance + model separation).

Used as the inner loop of the task-scaling RL pipeline in SkyRL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of calling Anthropic directly and running a local agent loop,
the evaluator now:
1. Imports the generated task via fleet.import_task()
2. Creates a harness job via fleet.create_job()
3. Polls for completion
4. Extracts per-session verifier scores from job sessions

Uses real Fleet model IDs (claude-sonnet-4.5, claude-opus-4.5) instead
of the broken weak/strong mapping that required ANTHROPIC_API_KEY.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fleet harness POST /v1/jobs requires model IDs in 'provider/model'
format (e.g., 'anthropic/claude-sonnet-4.5'), not just the model name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The sync time.sleep() in _poll_job blocked the asyncio event loop,
preventing trajectory timeouts from cancelling evaluations. Using
asyncio.sleep() allows the event loop to properly handle cancellations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fleet returns session model IDs without provider prefix (e.g., "claude-sonnet-4.5")
while we configure them with prefix ("anthropic/claude-sonnet-4.5"). Added
_match_model_id() to normalize and match by bare model name, so scores land
in the correct results_per_model bucket.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: Add TaskEvaluator for task generation inner loop
Add _tool_errors, _verifier_stdout, _verifier_error to FleetTaskEnv so
SkyRL can build hints from failed rollout feedback without an LLM call.

- Tool errors accumulated in step_async() on MCP errors and exceptions
- Verifier stdout/error captured in _compute_reward() after verifier runs
- Verifier exceptions also captured in _verifier_error (not just failures)
- All feedback properties reset in reset_async() to prevent cross-episode leakage
- Properties: verifier_stdout, verifier_error, tool_errors_list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Expose verifier feedback properties for hint generation
Add describe_db() and query_db() methods (sync + async) to
FleetEnvClient, delegating to the Fleet SDK's SQLiteResource.
This enables querying provisioned environment databases (seed/current)
from outside the container via HTTP, which is needed for task
generation workflows where the model explores DB data before
designing tasks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
describe_db_async/query_db_async were wrapping sync describe_db/query_db
in asyncio.to_thread, but when _fleet_env is an AsyncEnv (from
AsyncFleet.make), .db().describe() and .query() are async coroutines.
Now detects whether the resource method is a coroutine and awaits
directly instead of using to_thread.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the Fleet MCP server returns {"base64_image": null} (failed screenshot),
_extract_tool_result would create an image_url block with url=None, causing
downstream AttributeError in SkyRL's extract_images_from_conversation.

Now returns a text error message instead of propagating the null.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 29, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 29, 2026

Greptile Summary

This PR introduces the fleet_env integration module for OpenEnv — FleetEnvClient, FleetMCPTools, FleetTaskEnv, and supporting utilities — and applies a targeted bug fix: _extract_tool_result now null-checks parsed["base64_image"] before constructing an OpenAI image_url block, preventing a downstream AttributeError in SkyRL's extract_images_from_conversation when Fleet MCP returns {"base64_image": null} for a failed screenshot.

Key findings:

  • The null-guard fix itself is correct (lines 149–154 of fleet_mcp_client.py). The new if data_url is not None branch returns a descriptive string instead of {"url": None}, cleanly breaking the crash path.
  • Three tests will fail at runtime (test_fleet_env_from_fleet_returns_orchestrator_and_tools, test_fleet_env_reset_uses_http_manager_base_url, test_fleet_env_step_rejects_tool_actions in test_fleet_env.py). All three call FleetEnvClient.from_fleet() without the three required positional arguments data_key, data_version, and image_type. Python raises TypeError before the test body executes. The TestFleetEnvClientDbQuery block in the same file shows the correct call pattern.
  • No regression test for the fixed code path. The TestFleetMCPClientExtractToolResult class covers eight scenarios but not the null base64_image case that motivated this PR.

Alignment notes:

  • Client/server separation is maintained: FleetEnvClient.step() actively rejects ListToolsAction/CallToolAction, enforcing the MCP-only channel for agent actions.
  • Rewards are computed entirely inside the environment (_compute_reward via the Fleet SDK verifier), consistent with the "rewards inside environment" principle.
  • No RFC-breaking changes detected; this is additive integration code.

Confidence Score: 4/5

Safe to merge after fixing the three broken test call-sites; the core logic is sound.

One P1 issue remains: three tests in test_fleet_env.py call from_fleet() without required arguments and will raise TypeError, meaning those tests never validate their intended behavior. The core bug fix in fleet_mcp_client.py is correct and all other code is well-structured. Score is 4 rather than 5 because the failing tests should be fixed before merge.

tests/envs/test_fleet_env.py lines 79–109 — three tests use an incomplete from_fleet() signature.

Important Files Changed

Filename Overview
src/envs/fleet_env/fleet_mcp_client.py Core bug fix: adds null guard for base64_image in _extract_tool_result, correctly returning a text error string instead of {"url": None} that would crash downstream .startswith() calls.
tests/envs/test_fleet_env.py Three tests call FleetEnvClient.from_fleet() without required positional args data_key, data_version, image_type — these will fail with TypeError. Also missing a regression test for the null base64_image fix.
src/envs/fleet_env/mcp_tools.py New file: FleetMCPTools agent-facing wrapper with retry logic for list_tools and call_tool over multiple MCP endpoints with deduplication.
src/envs/fleet_env/client.py New file: FleetEnvClient HTTP orchestrator with sync/async provisioning, DB query helpers, and clean separation that rejects MCP tool actions via step().
src/envs/fleet_env/task_env.py New file: FleetTaskEnv Gymnasium-compatible task wrapper with async provisioning, modality-based tool filtering, verifier execution, and rollout telemetry.
tests/envs/test_fleet_task_env.py New test file for FleetTaskEnv; tests use proper mocking and cover init, specs, and rollout lifecycle.

Sequence Diagram

sequenceDiagram
    participant Agent
    participant FleetMCPTools
    participant FleetMCPClient
    participant FleetMCP as Fleet MCP Server

    Agent->>FleetMCPTools: call_tool("computer", {action: "screenshot"})
    FleetMCPTools->>FleetMCPClient: call_tool("computer", args)
    FleetMCPClient->>FleetMCP: session.call_tool(name, arguments)
    FleetMCP-->>FleetMCPClient: CallToolResult(content=[TextContent(text='{"base64_image": null}')])
    FleetMCPClient->>FleetMCPClient: _extract_tool_result(result)
    Note over FleetMCPClient: json.loads → parsed["base64_image"] is None<br/>NEW: returns "Screenshot capture failed (null image)"<br/>OLD: returned [{"type":"image_url","image_url":{"url":None}}]
    FleetMCPClient-->>FleetMCPTools: "Screenshot capture failed (null image)"
    FleetMCPTools-->>Agent: str (safe — no crash downstream)
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: tests/envs/test_fleet_env.py
Line: 79-85

Comment:
**`from_fleet` called without required positional arguments**

`FleetEnvClient.from_fleet()` requires three additional positional parameters — `data_key: str`, `data_version: str`, and `image_type: str` — beyond `api_key` and `env_key`. None of these three carry default values. Calling the method while omitting them will raise `TypeError` before any assertion runs, failing the test.

The same problem exists on lines 88–98 and 101–109.

The fix is to pass placeholder values for all three required parameters. The `TestFleetEnvClientDbQuery` block (lines ~488–494) already shows the correct calling convention with `data_key`, `data_version`, and `image_type` filled in.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: tests/envs/test_fleet_env.py
Line: 343-391

Comment:
**No regression test for the null `base64_image` fix**

The PR's stated purpose is to prevent the crash when `{"base64_image": null}` is returned by Fleet MCP, but the test class only exercises the happy path (a valid data URL in `test_extract_base64_image_json_format`). There is no test asserting that `_extract_tool_result` returns a plain string error when `base64_image` is `null`.

A companion test alongside the existing JSON-format tests would guard against this exact regression. The test should construct a `TextContent` whose `.text` is `'{"base64_image": null}'`, call `_extract_tool_result`, and assert the result is a `str` (not a `list` or `dict` with `url: None`).

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix: handle null base64_image from Fleet..." | Re-trigger Greptile

Comment on lines +79 to +85
@pytest.mark.usefixtures("fake_requests_session", "fake_fleet_module")
def test_fleet_env_from_fleet_returns_orchestrator_and_tools():
from envs.fleet_env import FleetEnvClient, FleetMCPTools

orch, tools = FleetEnvClient.from_fleet(api_key="k", env_key="e")
assert isinstance(orch, FleetEnvClient)
assert isinstance(tools, FleetMCPTools)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 from_fleet called without required positional arguments

FleetEnvClient.from_fleet() requires three additional positional parameters — data_key: str, data_version: str, and image_type: str — beyond api_key and env_key. None of these three carry default values. Calling the method while omitting them will raise TypeError before any assertion runs, failing the test.

The same problem exists on lines 88–98 and 101–109.

The fix is to pass placeholder values for all three required parameters. The TestFleetEnvClientDbQuery block (lines ~488–494) already shows the correct calling convention with data_key, data_version, and image_type filled in.

Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/envs/test_fleet_env.py
Line: 79-85

Comment:
**`from_fleet` called without required positional arguments**

`FleetEnvClient.from_fleet()` requires three additional positional parameters — `data_key: str`, `data_version: str`, and `image_type: str` — beyond `api_key` and `env_key`. None of these three carry default values. Calling the method while omitting them will raise `TypeError` before any assertion runs, failing the test.

The same problem exists on lines 88–98 and 101–109.

The fix is to pass placeholder values for all three required parameters. The `TestFleetEnvClientDbQuery` block (lines ~488–494) already shows the correct calling convention with `data_key`, `data_version`, and `image_type` filled in.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +343 to +391
def test_extract_base64_image_json_format(self):
"""Should convert Fleet MCP's base64_image JSON format to OpenAI format."""
from envs.fleet_env.fleet_mcp_client import FleetMCPClient

client = FleetMCPClient(url="http://test", api_key="test")

# Fleet MCP returns screenshot as JSON text with base64_image key
class _TextContent:
type = "text"
text = '{"base64_image": "data:image/jpeg;base64,/9j/4AAQSkZJRg..."}'

class _Result:
content = [_TextContent()]
isError = False
structuredContent = None

result = client._extract_tool_result(_Result())

# Should be converted to OpenAI-compatible format
assert isinstance(result, list)
assert len(result) == 1
assert result[0]["type"] == "image_url"
assert (
result[0]["image_url"]["url"] == "data:image/jpeg;base64,/9j/4AAQSkZJRg..."
)

def test_extract_base64_image_preserves_other_json(self):
"""Should preserve normal JSON responses that don't have base64_image."""
from envs.fleet_env.fleet_mcp_client import FleetMCPClient

client = FleetMCPClient(url="http://test", api_key="test")

# Normal JSON response without base64_image
class _TextContent:
type = "text"
text = '{"status": "success", "data": [1, 2, 3]}'

class _Result:
content = [_TextContent()]
isError = False
structuredContent = None

result = client._extract_tool_result(_Result())

# Should return parsed dict as-is
assert isinstance(result, dict)
assert result["status"] == "success"
assert result["data"] == [1, 2, 3]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 No regression test for the null base64_image fix

The PR's stated purpose is to prevent the crash when {"base64_image": null} is returned by Fleet MCP, but the test class only exercises the happy path (a valid data URL in test_extract_base64_image_json_format). There is no test asserting that _extract_tool_result returns a plain string error when base64_image is null.

A companion test alongside the existing JSON-format tests would guard against this exact regression. The test should construct a TextContent whose .text is '{"base64_image": null}', call _extract_tool_result, and assert the result is a str (not a list or dict with url: None).

Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/envs/test_fleet_env.py
Line: 343-391

Comment:
**No regression test for the null `base64_image` fix**

The PR's stated purpose is to prevent the crash when `{"base64_image": null}` is returned by Fleet MCP, but the test class only exercises the happy path (a valid data URL in `test_extract_base64_image_json_format`). There is no test asserting that `_extract_tool_result` returns a plain string error when `base64_image` is `null`.

A companion test alongside the existing JSON-format tests would guard against this exact regression. The test should construct a `TextContent` whose `.text` is `'{"base64_image": null}'`, call `_extract_tool_result`, and assert the result is a `str` (not a `list` or `dict` with `url: None`).

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant