Commit Graph

1 Commit

Author SHA1 Message Date
marc bfff0eeec0 Ingest Dashboard: 5min TTL cache + days->hours normalisation
Dashboard reloads on multi-day windows could take 30-60s and sometimes
returned HTTP 502 ("internal Scalyr error") when the SDL window was
expressed in days. Two-part fix:

1. In-process async TTL cache (services/async_cache.py)
   - 5 min TTL on top-sources, by-event-type, daily-volume.
   - Single-flight lock per cache key (no thundering herd).
   - Optional ?nocache=1 query param to force a refresh.
   - New endpoints: GET /api/ingest/cache-stats, DELETE /api/ingest/cache.

2. Normalise days -> hours upstream of the PowerQuery
   - SDL is unstable on day-scale windows for large group-by counts on
     busy tenants but stable on the equivalent hour-scale window.
   - top-sources?days=1 used to return 502; it now succeeds.
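
A minimal sketch of the days -> hours normalisation in part 2, not the code from this commit. The names normalise_window and to_relative_start, the 24-hour default, and the "<N>h" relative-time string are assumptions made for illustration; the only thing taken from the message above is that a day-scale window is rewritten as the equivalent hour-scale window before the query is built.

```python
from typing import Optional


def normalise_window(days: Optional[int] = None, hours: Optional[int] = None) -> int:
    """Return the query window in hours, whichever unit the caller used."""
    if days is not None and hours is not None:
        raise ValueError("pass either days or hours, not both")
    if days is not None:
        if days <= 0:
            raise ValueError("days must be positive")
        return days * 24  # same window, expressed in hours
    if hours is not None:
        if hours <= 0:
            raise ValueError("hours must be positive")
        return hours
    return 24  # assumed default window; the real default is not stated


def to_relative_start(window_hours: int) -> str:
    """Format the window as an hour-scale relative start time (assumed format)."""
    return f"{window_hours}h"
```

With this shape, a request for days=7 would be sent upstream as a 168-hour window, so the query layer never sees a day-scale unit.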

Observed timings on a busy tenant:
  top-sources?days=7  cold ~55s -> warm ~13ms (~4300x)
  top-sources?days=1  was 502   -> ~4ms (cold) / ~1.4ms (warm)
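
The cold/warm gap in these timings is what an in-process TTL cache with a single-flight lock per key would be expected to produce. The sketch below illustrates that technique only; it is not the contents of services/async_cache.py. The class name AsyncTTLCache, its methods, and the stats fields are hypothetical. The 5-minute TTL, the per-key lock, and the nocache bypass come from the message above.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Hashable, Optional, Tuple


class AsyncTTLCache:
    """Hypothetical async TTL cache with one lock per cache key."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self._ttl = ttl_seconds
        # key -> (expiry timestamp on the monotonic clock, cached value)
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}
        self._locks: Dict[Hashable, asyncio.Lock] = {}
        self.hits = 0
        self.misses = 0

    def _fresh(self, key: Hashable) -> Optional[Tuple[float, Any]]:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry
        return None

    async def get_or_load(
        self,
        key: Hashable,
        loader: Callable[[], Awaitable[Any]],
        nocache: bool = False,
    ) -> Any:
        if not nocache:
            entry = self._fresh(key)
            if entry is not None:
                self.hits += 1
                return entry[1]
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            # Re-check after waiting: a concurrent caller may have filled the
            # entry, so only one upstream query runs per key (single-flight).
            if not nocache:
                entry = self._fresh(key)
                if entry is not None:
                    self.hits += 1
                    return entry[1]
            self.misses += 1
            value = await loader()
            self._entries[key] = (time.monotonic() + self._ttl, value)
            return value

    def clear(self) -> int:
        """Drop all entries and return how many were removed."""
        removed = len(self._entries)
        self._entries.clear()
        return removed

    def stats(self) -> Dict[str, int]:
        return {"entries": len(self._entries), "hits": self.hits, "misses": self.misses}
```

In this sketch a handler for ?nocache=1 would pass nocache=True, and handlers for the cache-stats and cache-clear endpoints would call stats() and clear(). A forced refresh still takes the per-key lock, so concurrent refreshes of the same key are serialised instead of running in parallel.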
2026-05-22 21:36:42 +02:00