feat: precompute job registration API and sketch query executor for e2e datapath #237

@zzylol

Context

This issue tracks everything ASAPQuery needs to implement so the e2e precomputation data path works.


Required API surface

POST /api/v1/precompute/jobs

Register a new precompute job. The controller sends:

{
  "query":       "quantile_over_time(0.99, latency{service=\"web\"}[5m])",
  "granularity": "1m",
  "source":      "backend-collector:4317",
  "sketch_type": "ddsketch",
  "store_path":  "precomputed/latency/p99/5m"
}

Response:

{
  "job_id":     "job-abc123",
  "status":     "created",
  "created_at": "2026-03-26T00:00:00Z"
}
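
A minimal sketch of how the registration handler might validate the request body and build the response shown above. The helper names (`parse_duration`, `validate_job_request`, `build_job_response`), the accepted duration units, and the random `job_id` format are assumptions for illustration; the issue only shows `"1m"`/`"5m"` durations and a single example `job_id`.

```python
import re
import uuid
from datetime import datetime, timezone

# Fields taken from the example request body in this issue.
REQUIRED_FIELDS = ("query", "granularity", "source", "sketch_type", "store_path")

# Assumption: durations look like "30s", "1m", "2h".
DURATION_RE = re.compile(r"(\d+)([smh])")
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a duration such as "1m" into seconds."""
    match = DURATION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def validate_job_request(body: dict) -> dict:
    """Reject bodies with missing fields and normalise the granularity."""
    missing = [name for name in REQUIRED_FIELDS if name not in body]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    job = dict(body)
    job["granularity_seconds"] = parse_duration(body["granularity"])
    return job


def build_job_response(now=None) -> dict:
    """Build a response with the three fields shown in the issue."""
    now = now or datetime.now(timezone.utc)
    return {
        "job_id": f"job-{uuid.uuid4().hex[:6]}",  # hypothetical id scheme
        "status": "created",
        "created_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

A caller would typically run `validate_job_request` first and map a `ValueError` to a client-error response; the exact status code for invalid bodies is not specified in this issue.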

DELETE /api/v1/precompute/jobs/{id}

Deregister and stop a job. Called by the controller on re-plan and rollback. Must return 204 No Content on success.

GET /api/v1/precompute/jobs

List active jobs and their status. Used for operator visibility and health checks.
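
The three endpoints could sit on top of a registry along the following lines. This is an in-memory sketch only: the issue requires jobs to survive a restart, so a real registry would need durable storage. The class name `JobRegistry`, the sequential id scheme, and the 404 for an unknown id are assumptions; only the 204 on successful delete comes from the issue.

```python
import itertools


class JobRegistry:
    """In-memory stand-in for the job registry (no persistence)."""

    def __init__(self):
        self._jobs = {}
        self._ids = itertools.count(1)

    def register(self, spec: dict) -> str:
        """Backs POST /api/v1/precompute/jobs."""
        job_id = f"job-{next(self._ids)}"  # hypothetical id scheme
        self._jobs[job_id] = {"spec": dict(spec), "status": "created"}
        return job_id

    def deregister(self, job_id: str) -> int:
        """Backs DELETE /api/v1/precompute/jobs/{id}; returns an HTTP status."""
        if self._jobs.pop(job_id, None) is None:
            return 404  # assumption: the issue does not define this case
        return 204  # required by the issue on success

    def list_jobs(self) -> list:
        """Backs GET /api/v1/precompute/jobs."""
        return [
            {"job_id": job_id, "status": job["status"]}
            for job_id, job in self._jobs.items()
        ]
```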


Required internal components

| Component | Responsibility |
| --- | --- |
| Job registry | Persist active jobs (survive restart); expose GET /jobs for status |
| Sketch puller | On each granularity tick, fetch accumulated sketch blobs from source (edge collector address) |
| Query executor | Evaluate query_expr using sketch-native functions (table below) |
| Result cache | Store scalar/vector result keyed by store_path; serve zero-latency on cache hit |
| Scheduler | Trigger each job at its granularity interval; honour valid_until TTL on re-plan |
| Query router | When a user PromQL/SQL query matches a registered store_path, short-circuit to cache instead of DB scan |
| Metrics endpoint | Expose /metrics (Prometheus) so the controller can observe job health and feed it back into planning decisions |
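
To illustrate how the result cache and query router could interact, here is a small sketch. The issue does not define how a user query is matched to a `store_path`, so this assumes an exact string match against the registered query expression; `ResultCache`, `route_query`, and the `scan_db` callback are hypothetical names.

```python
_MISSING = object()  # sentinel so cached values such as 0 or None still count as hits


class ResultCache:
    """Results keyed by store_path."""

    def __init__(self):
        self._results = {}

    def put(self, store_path: str, value) -> None:
        self._results[store_path] = value

    def get(self, store_path: str):
        return self._results.get(store_path, _MISSING)


def route_query(query: str, path_by_query: dict, cache: ResultCache, scan_db):
    """Return (result, origin), where origin is "cache" or "db".

    Assumption: a query matches a job only when its text equals the
    registered query expression exactly.
    """
    store_path = path_by_query.get(query)
    if store_path is not None:
        cached = cache.get(store_path)
        if cached is not _MISSING:
            return cached, "cache"
    # No registered job, or the job has not produced a result yet.
    return scan_db(query), "db"
```

Exact matching is the simplest possible rule; a real router would probably need to normalise or parse the query before comparing.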

Query executor — sketch function mapping

The query_expr strings sent by the controller use these function names:

| Function | Sketch type | Semantics |
| --- | --- | --- |
| quantile_over_time(φ, metric[window]) | DDSketch | Query DDSketch for quantile φ over the window |
| count_distinct_over_time(metric[window]) | HLL | Return HLL cardinality estimate for the window |
| top_k_over_time(k, metric[window]) | CountSketch / CountMinSketch | Return top-k heavy hitters from the sketch |
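
One way the executor might check that a job's function and sketch type agree with the mapping above is sketched below. The lowercase identifiers `"hll"`, `"countsketch"`, and `"countminsketch"` are assumptions (the issue only shows `"ddsketch"` as a `sketch_type` value), and the regex extracts just the leading function name rather than parsing the full expression.

```python
import re

# Function name -> sketch_type values allowed for it (identifiers partly assumed).
SKETCH_FOR_FUNCTION = {
    "quantile_over_time": {"ddsketch"},
    "count_distinct_over_time": {"hll"},
    "top_k_over_time": {"countsketch", "countminsketch"},
}

# Matches the identifier before the first opening parenthesis.
CALL_RE = re.compile(r"\s*([A-Za-z_]\w*)\s*\(")


def function_name(query: str) -> str:
    """Extract the outermost function name from a query expression."""
    match = CALL_RE.match(query)
    if match is None:
        raise ValueError(f"not a function call: {query!r}")
    return match.group(1)


def check_sketch_type(query: str, sketch_type: str) -> str:
    """Return the function name if it is supported and fits sketch_type."""
    name = function_name(query)
    allowed = SKETCH_FOR_FUNCTION.get(name)
    if allowed is None:
        raise ValueError(f"unsupported function: {name}")
    if sketch_type.lower() not in allowed:
        raise ValueError(f"{name} cannot be evaluated on sketch type {sketch_type!r}")
    return name
```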

Feedback metrics (Prometheus /metrics)

The controller scrapes these to observe actual precompute performance and adjust future planning decisions:

| Metric name | Type | Description |
| --- | --- | --- |
| asapquery_precompute_job_duration_seconds | Histogram | Wall time per precompute job cycle |
| asapquery_precompute_cache_hits_total | Counter | Queries served from precomputed cache |
| asapquery_precompute_cache_misses_total | Counter | Queries that fell through to DB scan |
| asapquery_precompute_sketch_pull_errors_total | Counter | Failed sketch fetches from source |
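
As a rough illustration of what `/metrics` would emit, the sketch below renders a counter and a histogram in the Prometheus text exposition format using only the standard library. A real service would more likely use an existing Prometheus client library; the `Counter` and `Histogram` classes here are hand-rolled stand-ins, and the bucket bounds are assumptions, since the issue does not specify any.

```python
import bisect


class Counter:
    def __init__(self, name: str, help_text: str):
        self.name, self.help_text, self.value = name, help_text, 0

    def inc(self, amount: int = 1) -> None:
        self.value += amount

    def render(self) -> list:
        return [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
            f"{self.name} {self.value}",
        ]


class Histogram:
    def __init__(self, name: str, help_text: str, bounds):
        self.name, self.help_text = name, help_text
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is the +Inf bucket
        self.total = 0.0

    def observe(self, value: float) -> None:
        # bisect_left places value == bound in that bound's bucket (le is inclusive).
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value

    def render(self) -> list:
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} histogram",
        ]
        running = 0  # Prometheus buckets are cumulative
        for bound, count in zip(self.bounds, self.counts):
            running += count
            lines.append(f'{self.name}_bucket{{le="{bound}"}} {running}')
        running += self.counts[-1]
        lines.append(f'{self.name}_bucket{{le="+Inf"}} {running}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {running}")
        return lines


def render_metrics(metrics) -> str:
    """Join all metric families into one /metrics response body."""
    return "\n".join(line for m in metrics for line in m.render()) + "\n"
```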

Acceptance criteria

  • POST /api/v1/precompute/jobs accepts the request body and returns job_id
  • DELETE /api/v1/precompute/jobs/{id} deregisters a job and stops its scheduler
  • Registered jobs execute at granularity and cache results under store_path
  • quantile_over_time, count_distinct_over_time, top_k_over_time resolve against sketch data from source
  • Matching user queries are served from cache (zero DB scan)
  • /metrics exposes the four feedback metrics listed above
  • Integration test: start ASAPQuery with a controller assigning a job where repeat_every < latency_sla; assert the job is registered, executed, and the result is served from cache on the next matching query
