When capability-based aggregation matching is used (i.e. no query_configs entry exists for a query), find_compatible_aggregation selects an aggregation based on metric, statistic type, window size, labels, and spatial filter — but it does not check how many historical windows the selected aggregation is actually retaining.
CleanupPolicy and num_aggregates_to_retain live on AggregationReference (part of the query_configs path), not on AggregationConfig. Since capability matching only inspects AggregationConfig, it has no visibility into retention.
Failure example
Suppose a CircularBuffer aggregation is configured to retain 3 windows of 5 minutes each (15 minutes of history). A query arrives for sum_over_time(cpu[30m]) (6 × 5-minute windows). Capability matching sees:
- metric: cpu ✓
- type: Sum ✓
- window: 300s tumbling, and 1,800,000 ms is divisible by 300,000 ms ✓
- labels and spatial filter: match ✓
So it routes to this aggregation. At execution time, the store only has 3 windows, not 6. The query returns a partial or incorrect result with no routing-level error — the failure is silent.
The query_configs path avoids this because num_aggregates_to_retain is set explicitly per AggregationReference, giving the operator direct control over how far back each query can reach.
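The arithmetic in this example can be reproduced with a minimal sketch. The AggregationConfig class and the capability_matches and required_windows helpers below are hypothetical, simplified stand-ins written for illustration and are not the project's actual types or matching code. Labels and spatial filter are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregationConfig:
    """Hypothetical stand-in holding only fields that capability matching inspects."""
    metric: str
    statistic: str
    window_size_ms: int


def capability_matches(cfg, metric, statistic, data_range_ms):
    """Mimic the checks from the example (labels and spatial filter omitted)."""
    return (
        cfg.metric == metric
        and cfg.statistic == statistic
        and data_range_ms % cfg.window_size_ms == 0
    )


def required_windows(data_range_ms, window_size_ms):
    """Integer form of ceil(data_range_ms / window_size_ms)."""
    return (data_range_ms + window_size_ms - 1) // window_size_ms


cfg = AggregationConfig(metric="cpu", statistic="Sum", window_size_ms=300_000)
retained = 3                      # windows the CircularBuffer actually keeps
data_range_ms = 30 * 60 * 1000    # sum_over_time(cpu[30m])

matched = capability_matches(cfg, "cpu", "Sum", data_range_ms)
needed = required_windows(data_range_ms, cfg.window_size_ms)
print(matched, needed, retained)  # prints: True 6 3
```

The match succeeds even though 6 windows are needed and only 3 are retained, because nothing in the matched fields carries retention information.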
Proposed fix
Before committing to a candidate aggregation, check that the store holds at least ceil(data_range_ms / window_size_ms) recent windows for it. This requires either:
- passing a store handle into find_compatible_aggregation, or
- exposing a retained_window_count(aggregation_id) query on the store and calling it from the engine after capability matching selects a candidate.
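A rough sketch of the second option follows. Only the name retained_window_count comes from the proposal above. The Store protocol, InMemoryStore, has_enough_history, and select_with_retention_check are assumptions introduced for illustration, and candidates are modelled as plain (aggregation_id, window_size_ms) pairs rather than the real config type.

```python
from typing import Dict, Iterable, Optional, Protocol, Tuple


class Store(Protocol):
    """Hypothetical store interface exposing the proposed query."""
    def retained_window_count(self, aggregation_id: str) -> int: ...


class InMemoryStore:
    """Toy store for the sketch: maps aggregation id to retained window count."""
    def __init__(self, counts: Dict[str, int]) -> None:
        self._counts = dict(counts)

    def retained_window_count(self, aggregation_id: str) -> int:
        # Unknown aggregations are treated as retaining nothing.
        return self._counts.get(aggregation_id, 0)


def has_enough_history(store: Store, aggregation_id: str,
                       data_range_ms: int, window_size_ms: int) -> bool:
    """True if the store holds at least ceil(data_range_ms / window_size_ms) windows."""
    needed = (data_range_ms + window_size_ms - 1) // window_size_ms
    return store.retained_window_count(aggregation_id) >= needed


def select_with_retention_check(
    candidates: Iterable[Tuple[str, int]], store: Store, data_range_ms: int
) -> Optional[str]:
    """Return the first capability-matched candidate that also has enough history."""
    for aggregation_id, window_size_ms in candidates:
        if has_enough_history(store, aggregation_id, data_range_ms, window_size_ms):
            return aggregation_id
    return None  # caller can surface a routing-level error instead of a silent miss


store = InMemoryStore({"cpu_sum_5m_short": 3, "cpu_sum_5m_long": 12})
candidates = [("cpu_sum_5m_short", 300_000), ("cpu_sum_5m_long", 300_000)]

chosen_30m = select_with_retention_check(candidates, store, 1_800_000)
chosen_15m = select_with_retention_check(candidates, store, 900_000)
```

Here the 30-minute query skips the 3-window aggregation and lands on the 12-window one, while a 15-minute query can still use the shorter buffer. Returning None when no candidate qualifies gives the engine a place to raise an explicit error.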