feat(agents): agent-encode PoseStamped + position nav skills #2000

Open

douglasswng wants to merge 1 commit into dimensionalOS:dev from douglasswng:feat/agent-encode-posestamped-1693

Conversation

@douglasswng

Problem

LLM agents can't act on raw PoseStamped data. When a skill or memory query returns a pose (e.g. memory2 returning the location of an interesting place), the agent has no way to render it for reasoning or hand it back to a navigation skill. This is the missing layer between perception/memory output and agent reasoning/action.

Closes #1693

Solution

Two pieces, designed so the encode output and skill inputs share the same keys (agent can pipe the dict straight through):

  1. PoseStamped.agent_encode() — returns a flat dict (frame_id, x, y, z, yaw_deg). Position rounded to 3 dp, yaw to 1 dp degrees, so the agent sees human-scale numbers instead of float noise.

  2. Three new skills on NavigationSkillContainer:

    • navigate_to_position(x, y, yaw_deg, frame_id) — go to an absolute pose.
    • rotate_toward_position(x, y, frame_id) — face a target without translating; yaw computed from current odom.
    • current_pose() — return the robot's pose so the agent can reason about relative geometry (e.g. distance, bearing) without us baking that into the encode.
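The encode itself is not shown in this excerpt. A minimal sketch of what it could look like, using stand-in dataclasses for dimos' types — the key names and rounding rules come from this PR, while the quaternion-to-yaw extraction is an assumption (dimos may already expose a yaw helper):

```python
import math
from dataclasses import dataclass

@dataclass
class Vec3:
    x: float
    y: float
    z: float

@dataclass
class Quat:
    x: float
    y: float
    z: float
    w: float

@dataclass
class PoseStamped:  # stand-in for dimos' PoseStamped
    frame_id: str
    position: Vec3
    orientation: Quat

    def agent_encode(self) -> dict:
        # Standard ZYX yaw extraction from a quaternion (assumed helper;
        # dimos' real implementation may differ).
        q = self.orientation
        yaw_rad = math.atan2(2.0 * (q.w * q.z + q.x * q.y),
                             1.0 - 2.0 * (q.y * q.y + q.z * q.z))
        return {
            "frame_id": self.frame_id,
            "x": round(self.position.x, 3),  # 3 dp keeps metres human-scale
            "y": round(self.position.y, 3),
            "z": round(self.position.z, 3),
            "yaw_deg": round(math.degrees(yaw_rad), 1),  # 1 dp degrees
        }
```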

Also extracted a _make_position_goal helper now used by the three new skills and the existing _get_goal_pose_from_result.

Why absolute, not body-relative. MCP auto-renders skill returns via result.agent_encode() with no args (mcp_server.py:130-132). Making the output body-relative requires an observer somewhere — and since the call site is MCP, that means coupling a transport layer to live odometry (per-tool subscriptions, hidden state between skill return and agent input). Absolute coords + a current_pose skill avoids that: the agent composes the relative view itself, the rendering layer stays context-free.
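To make "the agent composes the relative view itself" concrete, here is the arithmetic an agent (or a downstream helper) could do with two absolute agent_encode()-style dicts — illustrative only, not code from this PR:

```python
import math

def relative_view(current: dict, target: dict) -> dict:
    """Derive body-relative distance and bearing from two absolute poses,
    as the agent would after calling current_pose()."""
    assert current["frame_id"] == target["frame_id"], "compare poses in one frame"
    dx, dy = target["x"] - current["x"], target["y"] - current["y"]
    bearing = math.degrees(math.atan2(dy, dx)) - current["yaw_deg"]
    return {
        "distance_m": round(math.hypot(dx, dy), 3),
        # bearing relative to the robot's current heading, wrapped to [-180, 180)
        "bearing_deg": round((bearing + 180.0) % 360.0 - 180.0, 1),
    }
```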

Breaking Changes

None.

How to Test

```shell
uv run pytest dimos/msgs/geometry_msgs/test_PoseStamped.py \
              dimos/agents/skills/test_navigation.py -v
```

Covers agent_encode (fields + rounding), each new skill (happy path), and the no-odom branches of rotate_toward_position / current_pose.

Not tested end-to-end: I don't have an OpenAI API key or local GPU, so I couldn't run an agentic blueprint and watch the LLM actually invoke these skills. Unit tests cover the skill internals; the agent-facing contract (docstring → schema, return-string interpretability) is untested. Happy to coordinate on a smoke test — e.g., spin up unitree-go2-agentic (or any blueprint with McpClient) and prompt the agent to fetch its current pose and navigate to (1, 0).

Contributor License Agreement

  • I have read and approved the CLA.

@greptile-apps

greptile-apps Bot commented May 7, 2026

Greptile Summary

This PR introduces PoseStamped.agent_encode() — a flat dict representation designed for LLM consumption — and three new NavigationSkillContainer skills (navigate_to_position, rotate_toward_position, current_pose) that close the loop between memory/perception output and agent-driven navigation. A module-level _make_position_goal helper is extracted and reused across the new and existing goal-construction sites.

  • agent_encode() returns frame_id, x, y, z, and yaw_deg (position rounded to 3 dp, yaw to 1 dp), intentionally matching the keyword arguments of navigate_to_position so the agent can pipe the dict straight through.
  • rotate_toward_position computes heading from the robot's odometry to a target point, but does not validate that self._latest_odom.frame_id matches the frame_id parameter; mismatched frames silently corrupt both the yaw calculation and the goal position label sent to the navigation stack.
  • Unit tests cover happy paths and no-odom branches for all three skills, but the frame-mismatch scenario is not exercised.
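A toy round-trip showing the shared-key contract: only the key names come from the PR; the pipe_through helper and the skill body here are illustrative stand-ins.

```python
import inspect

def pipe_through(encoded: dict, skill):
    """Forward an agent_encode()-style dict into a skill whose keyword
    arguments share the same names, dropping keys the skill doesn't take
    (hypothetical helper, not in the PR)."""
    accepted = inspect.signature(skill).parameters
    return skill(**{k: v for k, v in encoded.items() if k in accepted})

# Toy skill with the PR's navigate_to_position signature:
def navigate_to_position(x, y, yaw_deg, frame_id="map"):
    return f"Navigating to ({x}, {y}) yaw {yaw_deg} in {frame_id}"

encoded = {"frame_id": "map", "x": 1.2, "y": 0.4, "z": 0.0, "yaw_deg": 90.0}
result = pipe_through(encoded, navigate_to_position)  # z is filtered out
```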

Confidence Score: 3/5

Merging is risky until the frame-consistency assumption in rotate_toward_position is explicitly validated or documented.

The rotate_toward_position skill subtracts odometry coordinates from caller-supplied target coordinates and stamps the result with the caller's frame_id, with no check that both coordinate sets share the same reference frame. In a standard ROS deployment where /odom publishes in the "odom" frame and the navigation stack operates in "map", every call with the default frame_id="map" produces a geometrically incorrect heading and sends a misframed goal to the navigation stack — silently, with no error or warning.

dimos/agents/skills/navigation.py — specifically the rotate_toward_position implementation around line 298.

Important Files Changed

| Filename | Overview |
| --- | --- |
| dimos/agents/skills/navigation.py | Adds three new navigation skills and a _make_position_goal helper; rotate_toward_position silently mixes coordinates from potentially different frames when computing yaw. |
| dimos/msgs/geometry_msgs/PoseStamped.py | Adds agent_encode() returning a flat dict of rounded position and yaw values; straightforward and consistent with the __str__ formatting pattern already in place. |
| dimos/agents/skills/test_navigation.py | Adds MockedPositionNavSkill harness and unit tests for all three new skills including no-odom branches; no test covers the frame-mismatch scenario in rotate_toward_position. |
| dimos/msgs/geometry_msgs/test_PoseStamped.py | Adds two tests for agent_encode covering field presence and rounding behaviour; well-targeted and correct. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Agent
    participant MCP as MCP Server
    participant NS as NavigationSkillContainer
    participant Nav as NavigationInterfaceSpec

    Agent->>MCP: call current_pose()
    MCP->>NS: current_pose()
    NS->>NS: _latest_odom.agent_encode()
    NS-->>MCP: "Pose: (x, y, z) Yaw: d deg Frame: f"
    MCP-->>Agent: pose string

    Agent->>MCP: call navigate_to_position(x, y, yaw_deg, frame_id)
    MCP->>NS: navigate_to_position(x, y, yaw_deg, frame_id)
    NS->>NS: _make_position_goal(x, y, radians(yaw_deg), frame_id)
    NS->>Nav: set_goal(PoseStamped)
    NS-->>MCP: success string
    MCP-->>Agent: success string

    Agent->>MCP: call rotate_toward_position(x, y, frame_id)
    MCP->>NS: rotate_toward_position(x, y, frame_id)
    NS->>NS: read _latest_odom (cur_x, cur_y)
    NS->>NS: "yaw = atan2(y-cur_y, x-cur_x) frame unchecked"
    NS->>NS: _make_position_goal(cur_x, cur_y, yaw, frame_id)
    NS->>Nav: set_goal(PoseStamped)
    NS-->>MCP: success string
    MCP-->>Agent: success string
```

Reviews (1): Last reviewed commit: "feat(agents): agent-encode PoseStamped +..."

Comment on lines +298 to +306
```python
if self._latest_odom is None:
    return "No odometry data received yet, cannot rotate."

cur_x = self._latest_odom.position.x
cur_y = self._latest_odom.position.y
yaw_rad = math.atan2(y - cur_y, x - cur_x)  # heading from robot toward target

goal = _make_position_goal(cur_x, cur_y, yaw_rad, frame_id)
return self._navigate_to(goal, f"Rotating to face ({x:.2f}, {y:.2f}) in {frame_id}")
```
Contributor

P1 Frame mismatch silently corrupts yaw and goal position

yaw_rad = math.atan2(y - cur_y, x - cur_x) subtracts self._latest_odom.position (which may be in the "odom" frame) from (x, y) (which the caller asserts is in frame_id, defaulting to "map"). When the two frames differ — common in ROS setups where the /odom topic publishes in the "odom" frame while navigation uses "map" — the difference vector is geometrically meaningless, producing a wrong heading. Additionally, _make_position_goal(cur_x, cur_y, yaw_rad, frame_id) stamps odom-frame coordinates with frame_id, so the navigation stack receives a pose with a mismatched header frame. There is no guard or warning when self._latest_odom.frame_id != frame_id.
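One possible shape for the missing guard — refuse to mix frames rather than silently subtracting across them. This is a sketch only: the attribute names are taken from the PR snippet, and the alias table is hypothetical (the author's reply below notes that in dimos, odom publishes frame_id="world" with an identity world↔map TF, so a bare string-equality check would misfire; a real fix would consult the TF tree).

```python
# Hypothetical frame-equivalence table; a real implementation would
# query the TF tree instead of hardcoding aliases.
FRAME_ALIASES = {"map": {"map", "world"}}

def frames_compatible(odom_frame: str, goal_frame: str) -> bool:
    """Return True if odometry coordinates can be treated as goal-frame
    coordinates without a transform (illustrative guard, not in the PR)."""
    return odom_frame == goal_frame or odom_frame in FRAME_ALIASES.get(goal_frame, set())
```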

Author

In dimos, map↔world is published as an identity TF (dimos/navigation/rosnav.py:187-194), and existing skills follow the same implicit-frame convention — tag_location reads odom (frame "world") and stamps the result "map" (navigation.py:162-166), and _get_goal_pose_from_result hardcodes "map" (navigation.py:357). A frame_id string-equality check would actively break the default Go2 flow (odom publishes frame_id="world", default skill arg is "map") without catching any real geometric mismatch. Proper handling needs a TF-tree lookup applied uniformly across all nav skills — separate PR.

