feat: Redis cache + pre-warm for dashboard summary endpoints (Phase C of #20) #28
Five new endpoints for the remaining dashboard charts that fetch raw data:
- GET /v1/conferences/duration-summary
Returns conference counts bucketed by duration range (< 1m, 1-3m, etc.)
- GET /v1/conferences/participant-count-summary
Returns distribution of conferences by participant count
- GET /v1/issues/summary
Returns issue counts grouped by code with titles
- GET /v1/issues/gum-summary
Returns getusermedia_error issue counts grouped by error name
Also adds three new filter params to /v1/conferences for click-to-detail
modals on these charts:
- duration_gte, duration_lt (for duration chart)
- issue_code (for most-common-issues chart)
All endpoints accept appId, created_at_gte, created_at_lte and handle
both Python's native ISO format and JavaScript's toISOString() Z suffix.
Phases 2 and 3 of peermetrics#20 — eliminates the need for the dashboard to
download all conferences (~38MB) and all issues (~73MB).
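The dual-format timestamp handling above can be sketched as follows. This is a minimal illustration, not the actual view code: the helper name is hypothetical, and note that `datetime.fromisoformat` only accepts the trailing `Z` natively from Python 3.11 onward, so older interpreters need the normalization shown here.

```python
from datetime import datetime

def parse_created_at(value: str) -> datetime:
    """Parse both Python-native ISO strings and JS toISOString() output."""
    # JavaScript's Date.toISOString() emits a trailing "Z" that
    # fromisoformat() rejects before Python 3.11; rewrite it as an
    # explicit UTC offset so both formats parse identically.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)
```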
…ermetrics#20)

Adds three new aggregation endpoints that let the dashboard stop downloading full /connections and /sessions payloads to build charts client-side:
- GET /v1/connections/summary: relay vs direct connection counts (replaces the Relayed-connections pie chart's client-side reduce)
- GET /v1/connections/setup-time-summary: connection setup-time buckets with per-bucket conference_ids for click-to-detail
- GET /v1/sessions/summary: browsers, OS, country, and city/geo aggregates (powers the Browsers, OS, and Map charts in one roundtrip)

Also accepts `conference_ids=a,b,c` on /conferences so the setup-time chart can page through matched conferences on click.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
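Parsing the comma-separated `conference_ids` parameter described above could be as simple as the following sketch (the helper name is hypothetical, not the actual implementation):

```python
def parse_conference_ids(raw: str) -> list[str]:
    """Split "a,b,c" into ["a", "b", "c"].

    Tolerates stray whitespace and the empty segments produced by
    trailing or doubled commas in hand-built query strings.
    """
    return [part for part in (p.strip() for p in raw.split(",")) if part]
```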
…se C of peermetrics#20)

With Phases 0-5 merged, every dashboard chart reads from a server-side aggregation endpoint. The SQL is fast with indexes, but the same ~8 queries run on every page load, and the heavy ones (sessions.summary, connections.setup_time_summary) still cost 400-800ms on a live tenant.

Adds a thin caching layer in front of each summary view:
- `app/summary_cache.py`: `cached_json(endpoint, request, compute)` hashes (endpoint + filter params) into a short key, reads Redis, falls through to `compute()` on a miss, and writes back with a 60s TTL. Redis failures are tolerated (settings already has IGNORE_EXCEPTIONS).
- Each of the eight summary views moves its existing compute body into a local `compute()` closure and returns through the helper. No change to the JSON shape, query logic, or error handling.
- `manage.py prewarm_summaries`: scheduled command that iterates apps with recent traffic (default: any conference in the last 2 days) and runs every summary view with the 30d-window filters the dashboard sends by default. Intended to run every ~30s as an ECS scheduled task so first visitors never see a cold miss.

Measured locally against a 7-day Production clone (~18k conferences / 38k sessions / 38k connections):

| endpoint | cold | warm | speedup |
|---|---|---|---|
| conferences/summary | 391ms | 12ms | 33x |
| sessions/summary | 748ms | 11ms | 68x |
| connections/setup_time_summary | 373ms | 11ms | 34x |
| conferences/participant_count_summary | 216ms | 7ms | 31x |
| issues/gum_summary | 107ms | 6ms | 18x |
| connections/summary | 57ms | 6ms | 9.5x |
| issues/summary | 45ms | 86ms | noise; both <100ms |
| conferences/duration_summary | 19ms | 8ms | 2.3x |

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
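The `cached_json` flow described above can be sketched like this. It is an illustration under assumptions, not the helper in `app/summary_cache.py`: the key prefix, hash truncation, parameter shape (a plain dict rather than a request object), and the use of `setex` are all guesses at plausible details.

```python
import hashlib
import json

def cached_json(endpoint, params, compute, redis_client, ttl=60):
    """Read-through JSON cache: Redis hit, else compute() and write back."""
    # Hash (endpoint + sorted filter params) into a short, stable key.
    raw = endpoint + "?" + json.dumps(params, sort_keys=True)
    key = "summary:" + hashlib.sha1(raw.encode()).hexdigest()[:16]
    try:
        hit = redis_client.get(key)
        if hit is not None:
            return json.loads(hit)
    except Exception:
        pass  # tolerate a failing Redis read: fall through to compute()
    result = compute()
    try:
        redis_client.setex(key, ttl, json.dumps(result))
    except Exception:
        pass  # tolerate a failing write; the next request recomputes
    return result
```

Sorting the params before hashing makes `?a=1&b=2` and `?b=2&a=1` share one cache entry, which is what lets the pre-warm command populate exactly the keys the dashboard will request.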
Some additional feedback:

- P1 (real bug): GET /v1/conferences?issue_code=... can return the same conference multiple times when several issues share that code, which breaks pagination and counts for dashboard drilldowns (each page should contain unique conferences, aligned with the aggregated chart semantics).
- P2 (policy/correctness): issue_code should not match soft-deleted issues. Per the existing BaseModel / API patterns, issue_code should only consider active issues, i.e. add issues__is_active=True whenever issue_code is applied.
- P2/P3 (hardening): GET /v1/issues/gum-summary walks Issue.data with .get(). If data is not a dict (null is fine; bad legacy JSON is not), the view can 500. Skip non-dict rows and keep aggregating the rest.

Suggested fix direction:
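A sketch of that direction, Django-style. The queryset relation name `issues__code` and the `error_name` key inside `Issue.data` are assumptions about the schema, not confirmed names:

```python
def filter_by_issue_code(conferences_qs, issue_code):
    # Join only active issues, then deduplicate so each conference
    # appears once even when several of its issues share the code.
    return conferences_qs.filter(
        issues__code=issue_code,
        issues__is_active=True,  # exclude soft-deleted issues
    ).distinct()

def aggregate_gum_errors(issues):
    # Count getusermedia errors by name, skipping rows whose `data`
    # is not a dict (bad legacy JSON) instead of raising a 500.
    counts = {}
    for issue in issues:
        if not isinstance(issue.data, dict):
            continue
        name = issue.data.get("error_name") or "unknown"
        counts[name] = counts.get(name, 0) + 1
    return counts
```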
Follow-ups: regression tests for the three bullets above; a one-line README note under the private /conferences docs covering issue_code and "one row per conference". No migrations required (behavior-only).
Filtering conferences by issue code joins issues, so one conference can show up many times (e.g. a camera issue that happened 5 times would be counted 5 times and repeated in the dashboard) and mess up pagination/counts. Deduplicate and only match active issues, as elsewhere. The GUM chart reads Issue.data as a dict; if a row isn't one, the handler can crash, and skipping those rows is better than failing the whole request.
- Unit-test cache key rules, hit/miss, TTL override, and soft-fail on get/set errors.
- Smoke-test prewarm_summaries for zero apps and one recent app (8 views).

Made-with: Cursor
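The soft-fail case in the first bullet might be exercised like this. The sketch is self-contained: `cached_json_stub` merely mirrors the helper's get/compute/set flow rather than importing the real `app/summary_cache.py`:

```python
import json

class BrokenRedis:
    """Stub whose every call raises, to exercise the soft-fail path."""
    def get(self, key):
        raise ConnectionError("redis down")
    def setex(self, key, ttl, value):
        raise ConnectionError("redis down")

def cached_json_stub(key, compute, redis_client, ttl=60):
    # Mirrors the helper's flow: read, fall through on error, write back.
    try:
        hit = redis_client.get(key)
        if hit is not None:
            return json.loads(hit)
    except Exception:
        pass
    result = compute()
    try:
        redis_client.setex(key, ttl, json.dumps(result))
    except Exception:
        pass
    return result

def test_soft_fail_on_redis_errors():
    calls = []
    def compute():
        calls.append(1)
        return {"total": 7}
    # Both the failing read and the failing write are swallowed,
    # and the computed value still reaches the caller.
    assert cached_json_stub("k", compute, BrokenRedis()) == {"total": 7}
    assert len(calls) == 1
```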
Ready for one last run, @Boanerges1996. If you confirm it is still working for you, we can merge all the changes here and in the previous PRs we used as a base.
Summary
Stacked on top of #26 and #27 — PR diff will show all Phase 2-5 + Phase C commits until those upstream PRs merge, then rebase down to just the cache commit (`6b2a3a6`).
Phases 0-5 moved dashboard aggregation to SQL. Phase C caches the results: one Redis entry per (endpoint + filter params), 60s TTL, pre-warmed so first visitors don't pay cold-query cost.
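The pre-warm loop might be sketched like this. The endpoint paths are taken from the benchmark table, but the `last_conference_at` attribute and `run_view` callable are stand-ins for the real command's internals:

```python
from datetime import datetime, timedelta, timezone

# The eight summary views the dashboard hits (paths from the benchmark).
SUMMARY_ENDPOINTS = [
    "conferences/summary",
    "conferences/duration-summary",
    "conferences/participant-count-summary",
    "connections/summary",
    "connections/setup-time-summary",
    "sessions/summary",
    "issues/summary",
    "issues/gum-summary",
]

def prewarm(apps, run_view, recent_days=2):
    """Warm every summary view for apps with recent traffic."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=recent_days)
    warmed = 0
    for app in apps:
        last = app.last_conference_at
        if last is None or last < cutoff:
            continue  # skip apps with no conference in the window
        for endpoint in SUMMARY_ENDPOINTS:
            run_view(app.id, endpoint)  # populates the Redis entry
            warmed += 1
    return warmed
```

Because warming goes through the same views as real requests, the cache keys match exactly what the dashboard's default 30d-window filters will ask for.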
Local benchmark (7-day Production clone, ~18k conferences / 38k sessions / 38k connections)
Total warm dashboard cost ≈ 150ms across all 8 endpoints (vs ~2s cold).
Test plan
Follow-ups (not in this PR)
🤖 Generated with Claude Code