Skip to content

Fix topic statistics for IPC subscriptions#3130

Merged
jmachowinski merged 5 commits intoros2:rollingfrom
botsandus:fix-topic-stats-when-using-ipc
Apr 20, 2026
Merged

Fix topic statistics for IPC subscriptions#3130
jmachowinski merged 5 commits intoros2:rollingfrom
botsandus:fix-topic-stats-when-using-ipc

Conversation

@jayyoung
Copy link
Copy Markdown
Contributor

Continuing on from the discussion here on the original fix: #2913 (comment) which addresses #2911 we are opening this PR at @fujitatomoya request.

Description

The root issue: We discovered that when use_intra_process_comms is enabled, messages are delivered via SubscriptionIntraProcess::execute_impl() which previously never called the topic statistics handler. This causes all statistics to report NaN values, making it impossible for us distinguish a healthy IPC subscription from an unhealthy one.

This fix: Here we add a type-erased StatsHandlerFn function to SubscriptionIntraProcess and connect it up from the Subscription via a lambda. Using the lambda rather than a pointer to SubscriptionTopicStatistics avoids a circular include chain we discovered that could occur if subscription_topic_statistics.hpp was included directly from subscription_intra_process.hpp.

In the code, we set source_timestamp to the receive time so that message_age reports 0ms rather than an un-initialised value. Our reasoning is that: IPC delivery has no or little transport latency, so near-zero age is OK and expected. As discussed in the original fix PR, there is a one-time warning emit to inform users that message_age is not meaningful for IPC subscriptions.

IN summary this PR is a revised implementation of the original PR #2913 by @roman-y-wu, with the following differences:

  • Uses std::function type erasure to avoid the circular include issue we found
  • Sets source_timestamp to prevent message_age reporting an invalid large value (uninitialised timestamp)

From the original PR, we kept the one-time RCLCPP_WARN_ONCE log message to ensures users are not silently surprised by message_age reading ~0ms when using IPC subscriptions.

Fixes #2911

Is this user-facing behavior change?

Yeah it is for us, and addresses an issue we find in production. With this fix topic statistics now report valid values for subscriptions using IPC. Previously all statistics were NaN when IPC was enabled, and this behaviour bit us.

Did you use Generative AI?

Not on our side. This PR is a small adaptation of a older fix that already existed: #2913

Additional Information

This fix was first validated as a backport onto ROS 2 Kilted (rclcpp == 29.5.6) and has been running in production at Dexory across a fleet of 126 autonomous warehouse robots for over one month with no issues. Our robots use IPC-enabled lidar pipelines and rely on topic stats to monitor sensor health. message_period statistics are now correct and our monitoring correctly detects legitimate sensor failures with this fix.

Comment thread rclcpp/include/rclcpp/experimental/subscription_intra_process.hpp Outdated
Comment thread rclcpp/include/rclcpp/experimental/subscription_intra_process.hpp
@tonynajjar
Copy link
Copy Markdown
Contributor

Could you also add a test that verifies that with IPC enabled the stats handler fires and reports a non-nan age?

@tonynajjar
Copy link
Copy Markdown
Contributor

@fujitatomoya we'd love to get this merged in before the freeze of rclcpp, I believe on the 20th April. Anything else you would like modified?

If use_intra_process_comms is enabled, messages are delivered via
SubscriptionIntraProcess::execute_impl() which previously never called
the topic statistics handler, this causes all statistics to report NaN values.

The fix here puts a type-erased StatsHandlerFn in SubscriptionIntraProcess
and wires it up from the Subscription via a  lambda. Using std::function avoids
a circular include chain that would occur if subscription_topic_statistics.hpp
were included directly from subscription_intra_process.hpp via
publisher.hpp/callback_group.hpp.

The source_timestamp is set to the receive time so that message_age reports
0ms rather than an un-initialised value. IPC delivery has little/no expected transport
latency, by definition so near-zero age is expected.

Fixes ros2#2911

Signed-off-by: jayyoung <jay.young@gmail.com>
…drop now variable.

Signed-off-by: jayyoung <jay.young@gmail.com>
Signed-off-by: jayyoung <jay.young@gmail.com>
collected when use_intra_process_comms=true.

Signed-off-by: jayyoung <jay.young@gmail.com>
@jayyoung jayyoung force-pushed the fix-topic-stats-when-using-ipc branch from 7df6693 to cdf6632 Compare April 16, 2026 14:47
@jayyoung
Copy link
Copy Markdown
Contributor Author

Could you also add a test that verifies that with IPC enabled the stats handler fires and reports a non-nan age?

Added a little one cdf6632

Copy link
Copy Markdown
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, test would be ideal.

Comment thread rclcpp/include/rclcpp/experimental/subscription_intra_process.hpp
@tonynajjar
Copy link
Copy Markdown
Contributor

@fujitatomoya a test was added, anything else to test?

@fujitatomoya
Copy link
Copy Markdown
Collaborator

Pulls: #3130
Gist: https://gist.githubusercontent.com/fujitatomoya/72662fc102cdd964a64583c3201db0d6/raw/d39d13ab88b54886116348d2fb6418264a592531/ros2.repos
BUILD args: --packages-above-and-dependencies rclcpp
TEST args: --packages-above rclcpp
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/18997

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

Signed-off-by: jayyoung <jay.young@gmail.com>
@jayyoung
Copy link
Copy Markdown
Contributor Author

jayyoung commented Apr 20, 2026

just added 175ce2f which fixes the linting failure @fujitatomoya, thanks! can you re-trigger CI please?

@tonynajjar
Copy link
Copy Markdown
Contributor

tonynajjar commented Apr 20, 2026

@jmachowinski I'm asking you since you are roughly in our timezone, could you please relaunch CI so we can try to get this in before the freeze? Thanks!

@jmachowinski
Copy link
Copy Markdown
Collaborator

Pulls: #3130
Gist: https://gist.githubusercontent.com/jmachowinski/a744ece36c05d299e61f7521a2fcb409/raw/d39d13ab88b54886116348d2fb6418264a592531/ros2.repos
BUILD args:
TEST args:
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/19012

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@jmachowinski
Copy link
Copy Markdown
Collaborator

@tonynajjar I don't remember Munich to be in a different time zone than Bremen ;-P

@tonynajjar
Copy link
Copy Markdown
Contributor

tonynajjar commented Apr 20, 2026

I was including my colleague in the UK in that sentence 😆 Thank you!

@jmachowinski
Copy link
Copy Markdown
Collaborator

@tonynajjar @jayyoung Just ignore the windows build error for now, we are cleaning up the build failures from the MSVC bump.

@sloretz
Copy link
Copy Markdown
Contributor

sloretz commented Apr 20, 2026

Windows rerun - no connext, building up to rclcpp, testing just rclcpp Build Status

@jmachowinski jmachowinski merged commit 59b7c5f into ros2:rolling Apr 20, 2026
3 of 4 checks passed
@tonynajjar
Copy link
Copy Markdown
Contributor

Thank you for the push everyone! Happy freeze

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

use_intra_process_comms bypasses topic statistics computation

5 participants