feat(agents): agent-encode PoseStamped + position nav skills #2000
douglasswng wants to merge 1 commit into dimensionalOS:dev from
Conversation
Greptile Summary

This PR introduces `PoseStamped.agent_encode()` and three position-based navigation skills (`navigate_to_position`, `rotate_toward_position`, `current_pose`) on `NavigationSkillContainer`.
Confidence Score: 3/5. Merging is risky until the frame-consistency assumption in dimos/agents/skills/navigation.py (specifically the `rotate_toward_position` yaw computation) is addressed.

Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Agent
    participant MCP as MCP Server
    participant NS as NavigationSkillContainer
    participant Nav as NavigationInterfaceSpec
    Agent->>MCP: call current_pose()
    MCP->>NS: current_pose()
    NS->>NS: _latest_odom.agent_encode()
    NS-->>MCP: "Pose: (x, y, z) Yaw: d deg Frame: f"
    MCP-->>Agent: pose string
    Agent->>MCP: call navigate_to_position(x, y, yaw_deg, frame_id)
    MCP->>NS: navigate_to_position(x, y, yaw_deg, frame_id)
    NS->>NS: _make_position_goal(x, y, radians(yaw_deg), frame_id)
    NS->>Nav: set_goal(PoseStamped)
    NS-->>MCP: success string
    MCP-->>Agent: success string
    Agent->>MCP: call rotate_toward_position(x, y, frame_id)
    MCP->>NS: rotate_toward_position(x, y, frame_id)
    NS->>NS: read _latest_odom (cur_x, cur_y)
    NS->>NS: "yaw = atan2(y-cur_y, x-cur_x) frame unchecked"
    NS->>NS: _make_position_goal(cur_x, cur_y, yaw, frame_id)
    NS->>Nav: set_goal(PoseStamped)
    NS-->>MCP: success string
    MCP-->>Agent: success string
```
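The flow in the diagram can be sketched in plain Python. Everything below is a stand-in for illustration: `Pose`, the `goals` list, and the simplified signatures are assumptions, not the real dimos types, but the control flow mirrors the three skills and the no-odom guard shown in the diagram and diff.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pose:
    # Stand-in for the odometry pose; names mirror the diff, not the real dimos types.
    x: float = 0.0
    y: float = 0.0
    yaw: float = 0.0
    frame_id: str = "odom"


@dataclass
class NavigationSkillContainer:
    """Sketch of the three skills above (all types are stand-ins)."""
    _latest_odom: Optional[Pose] = None
    goals: list = field(default_factory=list)  # stands in for Nav.set_goal(PoseStamped)

    def current_pose(self) -> str:
        if self._latest_odom is None:
            return "No odometry data received yet."
        o = self._latest_odom
        return f"Pose: ({o.x}, {o.y}) Yaw: {math.degrees(o.yaw):.1f} deg Frame: {o.frame_id}"

    def navigate_to_position(self, x: float, y: float, yaw_deg: float,
                             frame_id: str = "map") -> str:
        self.goals.append((x, y, math.radians(yaw_deg), frame_id))
        return f"Navigating to ({x:.2f}, {y:.2f}) in {frame_id}"

    def rotate_toward_position(self, x: float, y: float, frame_id: str = "map") -> str:
        if self._latest_odom is None:
            return "No odometry data received yet, cannot rotate."
        o = self._latest_odom
        yaw = math.atan2(y - o.y, x - o.x)  # frame of (x, y) vs. the odom frame is unchecked
        self.goals.append((o.x, o.y, yaw, frame_id))  # keep position, change heading only
        return f"Rotating to face ({x:.2f}, {y:.2f}) in {frame_id}"


ns = NavigationSkillContainer(_latest_odom=Pose(1.0, 0.0))
print(ns.current_pose())                        # Pose: (1.0, 0.0) Yaw: 0.0 deg Frame: odom
print(ns.navigate_to_position(2.0, 0.0, 90.0))
print(ns.rotate_toward_position(1.0, 3.0))      # faces straight +y: atan2(3, 0) = pi/2
```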
Reviews (1): Last reviewed commit: "feat(agents): agent-encode PoseStamped +..."
```python
if self._latest_odom is None:
    return "No odometry data received yet, cannot rotate."

cur_x = self._latest_odom.position.x
cur_y = self._latest_odom.position.y
yaw_rad = math.atan2(y - cur_y, x - cur_x)

goal = _make_position_goal(cur_x, cur_y, yaw_rad, frame_id)
return self._navigate_to(goal, f"Rotating to face ({x:.2f}, {y:.2f}) in {frame_id}")
```
Frame mismatch silently corrupts yaw and goal position
`yaw_rad = math.atan2(y - cur_y, x - cur_x)` subtracts `self._latest_odom.position` (which may be in the "odom" frame) from `(x, y)` (which the caller asserts is in `frame_id`, defaulting to "map"). When the two frames differ, as is common in ROS setups where the /odom topic publishes in the "odom" frame while navigation uses "map", the difference vector is geometrically meaningless, producing a wrong heading. Additionally, `_make_position_goal(cur_x, cur_y, yaw_rad, frame_id)` stamps odom-frame coordinates with `frame_id`, so the navigation stack receives a pose with a mismatched header frame. There is no guard or warning when `self._latest_odom.frame_id != frame_id`.
In dimos, map↔world is published as an identity TF (dimos/navigation/rosnav.py:187-194), and existing skills follow the same implicit-frame convention: `tag_location` reads odom (frame "world") and stamps the result "map" (navigation.py:162-166), and `_get_goal_pose_from_result` hardcodes "map" (navigation.py:357). A `frame_id` string-equality check would actively break the default Go2 flow (odom publishes `frame_id="world"`, while the default skill arg is "map") without catching any real geometric mismatch. Proper handling needs a TF-tree lookup applied uniformly across all nav skills; that belongs in a separate PR.
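To make the hazard concrete, here is a small numeric sketch. All coordinates are invented for illustration; assume a map→odom transform that is a pure translation of (5, 0).

```python
import math

# Invented example: the map->odom transform is a pure (5, 0) translation.
# The robot sits at map (6, 0), so odometry reports it at (1, 0) in "odom".
target_map = (6.0, 3.0)   # caller's target, expressed in "map"
robot_odom = (1.0, 0.0)   # what _latest_odom.position would hold ("odom" frame)
robot_map = (6.0, 0.0)    # the same physical point after a proper TF lookup

# Mixing frames, as the current code does:
naive_yaw = math.atan2(target_map[1] - robot_odom[1], target_map[0] - robot_odom[0])
# Consistent frames:
true_yaw = math.atan2(target_map[1] - robot_map[1], target_map[0] - robot_map[0])

print(round(math.degrees(naive_yaw), 1))  # 31.0 -> geometrically meaningless heading
print(round(math.degrees(true_yaw), 1))   # 90.0 -> robot should face straight along +y
```

The naive difference vector points 59 degrees away from the true bearing here, so the robot would settle facing well off-target while reporting success.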
Problem
LLM agents can't act on raw `PoseStamped` data. When a skill or memory query returns a pose (e.g. memory2 returning the location of an interesting place), the agent has no way to render it for reasoning or hand it back to a navigation skill. This is the missing layer between perception/memory output and agent reasoning/action.

Closes #1693
Solution
Two pieces, designed so the encode output and skill inputs share the same keys (the agent can pipe the dict straight through):

- `PoseStamped.agent_encode()`: returns a flat dict (`frame_id`, `x`, `y`, `z`, `yaw_deg`). Position rounded to 3 dp, yaw to 1 dp degrees, so the agent sees human-scale numbers instead of float noise.
- Three new skills on `NavigationSkillContainer`:
  - `navigate_to_position(x, y, yaw_deg, frame_id)`: go to an absolute pose.
  - `rotate_toward_position(x, y, frame_id)`: face a target without translating; yaw computed from current odom.
  - `current_pose()`: return the robot's pose so the agent can reason about relative geometry (e.g. distance, bearing) without us baking that into the encode.

Also extracted a `_make_position_goal` helper, now used by the three new skills and the existing `_get_goal_pose_from_result`.

Why absolute, not body-relative: MCP auto-renders skill returns via `result.agent_encode()` with no args (mcp_server.py:130-132). Making the output body-relative requires an observer somewhere, and since the call site is MCP, that means coupling a transport layer to live odometry (per-tool subscriptions, hidden state between skill return and agent input). Absolute coords plus a `current_pose` skill avoid that: the agent composes the relative view itself, and the rendering layer stays context-free.

Breaking Changes
None.
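For illustration, here is a hedged sketch of the `agent_encode` contract described in the Solution. The minimal class and its quaternion field names are assumptions for the sketch, not the real dimos `PoseStamped`; only the output keys and rounding rules come from the PR description.

```python
import math
from dataclasses import dataclass


@dataclass
class PoseStamped:
    # Minimal stand-in; the real dimos message has a different layout.
    frame_id: str
    x: float
    y: float
    z: float
    qx: float
    qy: float
    qz: float
    qw: float

    def agent_encode(self) -> dict:
        # Yaw (Z rotation) extracted from the quaternion, then rounded to
        # human-scale numbers: position to 3 dp, yaw to 1 dp degrees.
        yaw = math.atan2(2 * (self.qw * self.qz + self.qx * self.qy),
                         1 - 2 * (self.qy ** 2 + self.qz ** 2))
        return {
            "frame_id": self.frame_id,
            "x": round(self.x, 3),
            "y": round(self.y, 3),
            "z": round(self.z, 3),
            "yaw_deg": round(math.degrees(yaw), 1),
        }


# A pose facing +y (90 deg yaw) with float-noise coordinates:
p = PoseStamped("map", 1.23456, -0.98765, 0.0, 0.0, 0.0, 0.7071068, 0.7071068)
print(p.agent_encode())  # {'frame_id': 'map', 'x': 1.235, 'y': -0.988, 'z': 0.0, 'yaw_deg': 90.0}
```

Because the skill inputs use the same keys, the agent can feed this dict straight into `navigate_to_position` without renaming anything.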
How to Test
```shell
uv run pytest dimos/msgs/geometry_msgs/test_PoseStamped.py \
    dimos/agents/skills/test_navigation.py -v
```

Covers `agent_encode` (fields + rounding), each new skill (happy path), and the no-odom branches of `rotate_toward_position` / `current_pose`.

Not tested end-to-end: I don't have an OpenAI API key or local GPU, so I couldn't run an agentic blueprint and watch the LLM actually invoke these skills. Unit tests cover the skill internals; the agent-facing contract (docstring → schema, return-string interpretability) is untested. Happy to coordinate on a smoke test, e.g. spin up unitree-go2-agentic (or any blueprint with `McpClient`) and prompt the agent to fetch its current pose and navigate to (1, 0).