feat(agents): agent-encode PoseStamped + position nav skills #2000
douglasswng wants to merge 1 commit into dimensionalOS:dev from
Conversation
Greptile Summary

This PR introduces `PoseStamped.agent_encode()` and three position-based navigation skills (`navigate_to_position`, `rotate_toward_position`, `current_pose`) on `NavigationSkillContainer`.
Confidence Score: 3/5. Merging is risky until the frame-consistency assumption in dimos/agents/skills/navigation.py (specifically the `rotate_toward_position` yaw computation) is addressed.

Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Agent
    participant MCP as MCP Server
    participant NS as NavigationSkillContainer
    participant Nav as NavigationInterfaceSpec
    Agent->>MCP: call current_pose()
    MCP->>NS: current_pose()
    NS->>NS: _latest_odom.agent_encode()
    NS-->>MCP: "Pose: (x, y, z) Yaw: d deg Frame: f"
    MCP-->>Agent: pose string
    Agent->>MCP: call navigate_to_position(x, y, yaw_deg, frame_id)
    MCP->>NS: navigate_to_position(x, y, yaw_deg, frame_id)
    NS->>NS: _make_position_goal(x, y, radians(yaw_deg), frame_id)
    NS->>Nav: set_goal(PoseStamped)
    NS-->>MCP: success string
    MCP-->>Agent: success string
    Agent->>MCP: call rotate_toward_position(x, y, frame_id)
    MCP->>NS: rotate_toward_position(x, y, frame_id)
    NS->>NS: read _latest_odom (cur_x, cur_y)
    NS->>NS: "yaw = atan2(y-cur_y, x-cur_x) frame unchecked"
    NS->>NS: _make_position_goal(cur_x, cur_y, yaw, frame_id)
    NS->>Nav: set_goal(PoseStamped)
    NS-->>MCP: success string
    MCP-->>Agent: success string
```
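The flow in the diagram can be sketched in plain Python. Everything below is a stand-in for illustration: `Pose`, the `goals` list, and the simplified signatures are assumptions, not the real dimos types, but the control flow mirrors the three skills and the no-odom guard shown in the diagram and diff.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pose:
    # Stand-in for the odometry pose; names mirror the diff, not the real dimos types.
    x: float = 0.0
    y: float = 0.0
    yaw: float = 0.0
    frame_id: str = "odom"


@dataclass
class NavigationSkillContainer:
    """Sketch of the three skills above (all types are stand-ins)."""
    _latest_odom: Optional[Pose] = None
    goals: list = field(default_factory=list)  # stands in for Nav.set_goal(PoseStamped)

    def current_pose(self) -> str:
        if self._latest_odom is None:
            return "No odometry data received yet."
        o = self._latest_odom
        return f"Pose: ({o.x}, {o.y}) Yaw: {math.degrees(o.yaw):.1f} deg Frame: {o.frame_id}"

    def navigate_to_position(self, x: float, y: float, yaw_deg: float,
                             frame_id: str = "map") -> str:
        self.goals.append((x, y, math.radians(yaw_deg), frame_id))
        return f"Navigating to ({x:.2f}, {y:.2f}) in {frame_id}"

    def rotate_toward_position(self, x: float, y: float, frame_id: str = "map") -> str:
        if self._latest_odom is None:
            return "No odometry data received yet, cannot rotate."
        o = self._latest_odom
        yaw = math.atan2(y - o.y, x - o.x)  # frame of (x, y) vs. the odom frame is unchecked
        self.goals.append((o.x, o.y, yaw, frame_id))  # keep position, change heading only
        return f"Rotating to face ({x:.2f}, {y:.2f}) in {frame_id}"


ns = NavigationSkillContainer(_latest_odom=Pose(1.0, 0.0))
print(ns.current_pose())                        # Pose: (1.0, 0.0) Yaw: 0.0 deg Frame: odom
print(ns.navigate_to_position(2.0, 0.0, 90.0))
print(ns.rotate_toward_position(1.0, 3.0))      # faces straight +y: atan2(3, 0) = pi/2
```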
Reviews (1): Last reviewed commit: "feat(agents): agent-encode PoseStamped +..."
```python
if self._latest_odom is None:
    return "No odometry data received yet, cannot rotate."

cur_x = self._latest_odom.position.x
cur_y = self._latest_odom.position.y
yaw_rad = math.atan2(y - cur_y, x - cur_x)

goal = _make_position_goal(cur_x, cur_y, yaw_rad, frame_id)
return self._navigate_to(goal, f"Rotating to face ({x:.2f}, {y:.2f}) in {frame_id}")
```
Frame mismatch silently corrupts yaw and goal position
`yaw_rad = math.atan2(y - cur_y, x - cur_x)` subtracts `self._latest_odom.position` (which may be in the "odom" frame) from `(x, y)` (which the caller asserts is in `frame_id`, defaulting to "map"). When the two frames differ, as is common in ROS setups where the /odom topic publishes in the "odom" frame while navigation uses "map", the difference vector is geometrically meaningless, producing a wrong heading. Additionally, `_make_position_goal(cur_x, cur_y, yaw_rad, frame_id)` stamps odom-frame coordinates with `frame_id`, so the navigation stack receives a pose with a mismatched header frame. There is no guard or warning when `self._latest_odom.frame_id != frame_id`.
In dimos, map↔world is published as an identity TF (dimos/navigation/rosnav.py:187-194), and existing skills follow the same implicit-frame convention: `tag_location` reads odom (frame "world") and stamps the result "map" (navigation.py:162-166), and `_get_goal_pose_from_result` hardcodes "map" (navigation.py:357). A `frame_id` string-equality check would actively break the default Go2 flow (odom publishes `frame_id="world"`, while the default skill arg is "map") without catching any real geometric mismatch. Proper handling needs a TF-tree lookup applied uniformly across all nav skills; that belongs in a separate PR.
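To make the hazard concrete, here is a small numeric sketch. All coordinates are invented for illustration; assume a map→odom transform that is a pure translation of (5, 0).

```python
import math

# Invented example: the map->odom transform is a pure (5, 0) translation.
# The robot sits at map (6, 0), so odometry reports it at (1, 0) in "odom".
target_map = (6.0, 3.0)   # caller's target, expressed in "map"
robot_odom = (1.0, 0.0)   # what _latest_odom.position would hold ("odom" frame)
robot_map = (6.0, 0.0)    # the same physical point after a proper TF lookup

# Mixing frames, as the current code does:
naive_yaw = math.atan2(target_map[1] - robot_odom[1], target_map[0] - robot_odom[0])
# Consistent frames:
true_yaw = math.atan2(target_map[1] - robot_map[1], target_map[0] - robot_map[0])

print(round(math.degrees(naive_yaw), 1))  # 31.0 -> geometrically meaningless heading
print(round(math.degrees(true_yaw), 1))   # 90.0 -> robot should face straight along +y
```

The naive difference vector points 59 degrees away from the true bearing here, so the robot would settle facing well off-target while reporting success.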
Problem
LLM agents can't act on raw `PoseStamped` data. When a skill or memory query returns a pose (e.g. memory2 returning the location of an interesting place), the agent has no way to render it for reasoning or hand it back to a navigation skill. This is the missing layer between perception/memory output and agent reasoning/action.

Closes #1693
Solution
Two pieces, designed so the encode output and skill inputs share the same keys (the agent can pipe the dict straight through):

- `PoseStamped.agent_encode()`: returns a flat dict (`frame_id`, `x`, `y`, `z`, `yaw_deg`). Position rounded to 3 dp, yaw to 1 dp degrees, so the agent sees human-scale numbers instead of float noise.
- Three new skills on `NavigationSkillContainer`:
  - `navigate_to_position(x, y, yaw_deg, frame_id)`: go to an absolute pose.
  - `rotate_toward_position(x, y, frame_id)`: face a target without translating; yaw computed from current odom.
  - `current_pose()`: return the robot's pose so the agent can reason about relative geometry (e.g. distance, bearing) without us baking that into the encode.

Also extracted a `_make_position_goal` helper, now used by the three new skills and the existing `_get_goal_pose_from_result`.

Why absolute, not body-relative: MCP auto-renders skill returns via `result.agent_encode()` with no args (mcp_server.py:130-132). Making the output body-relative requires an observer somewhere, and since the call site is MCP, that means coupling a transport layer to live odometry (per-tool subscriptions, hidden state between skill return and agent input). Absolute coords plus a `current_pose` skill avoid that: the agent composes the relative view itself, and the rendering layer stays context-free.

Breaking Changes
None.
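For illustration, here is a hedged sketch of the `agent_encode` contract described in the Solution. The minimal class and its quaternion field names are assumptions for the sketch, not the real dimos `PoseStamped`; only the output keys and rounding rules come from the PR description.

```python
import math
from dataclasses import dataclass


@dataclass
class PoseStamped:
    # Minimal stand-in; the real dimos message has a different layout.
    frame_id: str
    x: float
    y: float
    z: float
    qx: float
    qy: float
    qz: float
    qw: float

    def agent_encode(self) -> dict:
        # Yaw (Z rotation) extracted from the quaternion, then rounded to
        # human-scale numbers: position to 3 dp, yaw to 1 dp degrees.
        yaw = math.atan2(2 * (self.qw * self.qz + self.qx * self.qy),
                         1 - 2 * (self.qy ** 2 + self.qz ** 2))
        return {
            "frame_id": self.frame_id,
            "x": round(self.x, 3),
            "y": round(self.y, 3),
            "z": round(self.z, 3),
            "yaw_deg": round(math.degrees(yaw), 1),
        }


# A pose facing +y (90 deg yaw) with float-noise coordinates:
p = PoseStamped("map", 1.23456, -0.98765, 0.0, 0.0, 0.0, 0.7071068, 0.7071068)
print(p.agent_encode())  # {'frame_id': 'map', 'x': 1.235, 'y': -0.988, 'z': 0.0, 'yaw_deg': 90.0}
```

Because the skill inputs use the same keys, the agent can feed this dict straight into `navigate_to_position` without renaming anything.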
How to Test
```shell
uv run pytest dimos/msgs/geometry_msgs/test_PoseStamped.py \
    dimos/agents/skills/test_navigation.py -v
```

Covers `agent_encode` (fields + rounding), each new skill (happy path), and the no-odom branches of `rotate_toward_position` / `current_pose`.

Not tested end-to-end: I don't have an OpenAI API key or local GPU, so I couldn't run an agentic blueprint and watch the LLM actually invoke these skills. Unit tests cover the skill internals; the agent-facing contract (docstring → schema, return-string interpretability) is untested. Happy to coordinate on a smoke test, e.g. spin up unitree-go2-agentic (or any blueprint with `McpClient`) and prompt the agent to fetch its current pose and navigate to (1, 0).