Adds an asyncio background task that re-runs the heavy Ingest Dashboard
queries every ~4 min (just under the 5 min TTL) so the in-process cache
is always populated. First user hit on any dashboard widget then returns
from cache (single-digit ms) instead of waiting 30-60s for SDL.
Components:
- backend/services/prewarmer.py: standalone module, opt-in via
INGEST_PREWARM=1; configurable windows via INGEST_PREWARM_HOURS /
INGEST_PREWARM_DAYS / INGEST_PREWARM_DAILY_VOLUME_DAYS and interval
via INGEST_PREWARM_INTERVAL_SECONDS. Logs through the uvicorn logger
so cycles are visible in 'docker logs'.
- backend/main.py: spawn the task on FastAPI startup.
- docker-compose.yml: forward INGEST_PREWARM* env vars to the
backend service (default off).
Measured on Purple AI tenant: …
Off by default (INGEST_PREWARM=0) so non-opt-in
users see no behaviour change.
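A minimal sketch of the prewarm loop described above, not the actual contents of backend/services/prewarmer.py. The function names, the 240s default, the `max_cycles` parameter (added only so the loop can be exercised in a test) and the "uvicorn.error" logger name are assumptions for illustration.

```python
import asyncio
import logging
import os

# Assumption: uvicorn's "uvicorn.error" logger is the one that reaches docker logs.
logger = logging.getLogger("uvicorn.error")


def prewarm_enabled() -> bool:
    # Opt-in: anything other than INGEST_PREWARM=1 leaves the task off.
    return os.environ.get("INGEST_PREWARM", "0") == "1"


def prewarm_interval() -> float:
    # Assumed default of 240s (~4 min), just under the 5 min cache TTL.
    return float(os.environ.get("INGEST_PREWARM_INTERVAL_SECONDS", "240"))


async def prewarm_loop(warmers, interval, max_cycles=None):
    """Await every warmer coroutine function, sleep, then repeat."""
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        for warm in warmers:
            try:
                await warm()
            except Exception:
                # One failing query must not stop the whole loop.
                logger.exception("prewarm: %s failed", getattr(warm, "__name__", warm))
        cycle += 1
        logger.info("prewarm: cycle %d complete", cycle)
        await asyncio.sleep(interval)
    return cycle
```

On startup the task would be spawned only when `prewarm_enabled()` is true, for example with `asyncio.create_task(prewarm_loop(warmers, prewarm_interval()))`.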
Dashboard reloads on multi-day windows could take 30-60s and sometimes
returned HTTP 502 ('internal Scalyr error') when the SDL window was
expressed in days. Two-part fix:
1. In-process async TTL cache (services/async_cache.py)
- 5 min TTL on top-sources, by-event-type, daily-volume.
- Single-flight lock per cache key (no thundering herd).
- Optional ?nocache=1 query param to force a refresh.
- New endpoints: GET /api/ingest/cache-stats, DELETE /api/ingest/cache.
2. Normalise days -> hours upstream of the PowerQuery
- SDL is unstable on day-scale windows for large group-by counts on
this tenant but stable on the equivalent hour-scale window.
- top-sources?days=1 used to 502; now works.
Measured on Purple AI tenant:
top-sources?days=7 cold 55.7s -> warm 13ms (~4300x)
… -> 4ms (cold) / 1.4ms (warm)
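A rough sketch of a TTL cache with a single-flight lock per key, in the spirit of part 1 of the fix. This is not the real services/async_cache.py; the class and method names are hypothetical, and the `nocache` flag stands in for the ?nocache=1 query param.

```python
import asyncio
import time


class AsyncTTLCache:
    def __init__(self, ttl_seconds=300.0):
        self.ttl = ttl_seconds
        self._values = {}  # key -> (expires_at, value)
        self._locks = {}   # key -> asyncio.Lock

    def _fresh(self, key):
        hit = self._values.get(key)
        if hit and hit[0] > time.monotonic():
            return hit
        return None

    async def get_or_compute(self, key, compute, nocache=False):
        hit = None if nocache else self._fresh(key)
        if hit:
            return hit[1]
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            # Re-check: another waiter may have filled the entry while we queued.
            hit = None if nocache else self._fresh(key)
            if hit:
                return hit[1]
            value = await compute()
            self._values[key] = (time.monotonic() + self.ttl, value)
            return value

    def clear(self):
        # Roughly what a DELETE on the cache endpoint would call.
        self._values.clear()
```

Concurrent callers for the same key queue on one lock, so only the first runs the expensive query and the rest read its result.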
STAR rules sometimes label tactics with non-canonical names (e.g. 'Stealth',
'Defense Impairment') which were counted as distinct tactics on top of the
14 canonical ATT&CK Enterprise ones, producing percentages > 100%
(observed 15/14 = 107.1% on Purple AI tenant).
Fix in get_health_score():
- Restrict covered_tactics to the 14 canonical ATT&CK Enterprise tactics.
- Map known STAR aliases ('Stealth', 'Defense Impairment') -> 'Defense Evasion'.
- Derive TOTAL_TACTICS from the canonical set (single source of truth).
Result: tactics_covered = 14, mitre_pct = 100.0 (was 15 / 107.1).
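The normalisation can be sketched as follows. This is an illustration under assumptions, not the body of get_health_score(): the helper names are made up, and only the two aliases named above are mapped.

```python
# The 14 ATT&CK Enterprise tactics, in kill-chain order.
CANONICAL_TACTICS = (
    "Reconnaissance", "Resource Development", "Initial Access", "Execution",
    "Persistence", "Privilege Escalation", "Defense Evasion", "Credential Access",
    "Discovery", "Lateral Movement", "Collection", "Command and Control",
    "Exfiltration", "Impact",
)

# Known STAR aliases folded into a canonical tactic.
TACTIC_ALIASES = {
    "Stealth": "Defense Evasion",
    "Defense Impairment": "Defense Evasion",
}

# Derived, so the denominator can never drift from the canonical set.
TOTAL_TACTICS = len(CANONICAL_TACTICS)


def covered_tactics(raw_names):
    """Map aliases, then keep only canonical tactic names."""
    canonical = set(CANONICAL_TACTICS)
    covered = set()
    for name in raw_names:
        name = TACTIC_ALIASES.get(name, name)
        if name in canonical:
            covered.add(name)
    return covered


def mitre_pct(raw_names):
    return round(100.0 * len(covered_tactics(raw_names)) / TOTAL_TACTICS, 1)
```

Because unknown labels are dropped and aliases collapse onto an existing tactic, the percentage is bounded at 100.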
SDL /logParsers/ also returns UEBA analytics tables, saved searches and
dashboard configs. They're not valid Test Runner inputs and pollute the
dropdown. Filter list_parser_files in two tiers:
1) Name denylist (ueba_*, searches, *_baselines_*, *_features_*,
*_scores_*, bsi-*, *-overview, smoke/test tables).
2) Content scan: file must contain attributes:/patterns:/formats:/
patternRefs:/rewrites:/parser: in first 4 KB.
Result: 97 files -> 41 real parsers, with no false positives or false negatives.
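A sketch of the two-tier filter, assuming file names and contents are already in memory. The function names are hypothetical, the smoke/test table patterns are left out because their exact form is not given, and the 4 KB limit is approximated by slicing 4096 characters.

```python
import fnmatch

# Tier 1: names that are never parsers (smoke/test patterns omitted here).
DENY_GLOBS = (
    "ueba_*", "searches", "*_baselines_*", "*_features_*",
    "*_scores_*", "bsi-*", "*-overview",
)

# Tier 2: at least one of these keys must appear near the top of the file.
PARSER_MARKERS = (
    "attributes:", "patterns:", "formats:",
    "patternRefs:", "rewrites:", "parser:",
)
SCAN_CHARS = 4096


def is_denied(name):
    return any(fnmatch.fnmatchcase(name, glob) for glob in DENY_GLOBS)


def looks_like_parser(content):
    head = content[:SCAN_CHARS]
    return any(marker in head for marker in PARSER_MARKERS)


def filter_parser_files(files):
    """files: mapping of file name -> file content. Returns the names to keep."""
    return [
        name for name, content in files.items()
        if not is_denied(name) and looks_like_parser(content)
    ]
```

Running the cheap name check first means the content scan only touches files that survive the denylist.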
Before re-creating ActiveSource rows, snapshot existing parser_detected
values. When writing new rows, take max(new, previous) so a source that
was once confirmed as parsed (event.type present in the data lake) never
loses its Covered status due to a sampling gap, partial query result, or
SDL PowerQuery timeout during Sync All.
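The snapshot-and-max step could look like this. It is a sketch with plain dicts standing in for the ActiveSource rows; the function name is hypothetical.

```python
def merge_parser_detected(previous, fresh):
    """previous / fresh: mapping of source name -> parser_detected count.

    For every freshly synced source, keep the larger of the new count and
    the count snapshotted before the rows were re-created.
    """
    return {
        name: max(count, previous.get(name, 0))
        for name, count in fresh.items()
    }
```

A source whose fresh query came back as 0 because of a timeout keeps its earlier non-zero value and therefore its Covered status.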
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extract product label from rule data_sources in coverage.py via new
_product_from_data_sources() helper (prefers non-SentinelOne entries
so product-specific rules get a meaningful label)
- Coverage Map detections column: rules now grouped by product with
collapsible chevron headers showing fired/silent counts
- Threat Coverage Rule Firing Status: collapsible product group headers
with active/silent summary; shows all 2066 rules across 30 products
- Threat Coverage Dependency Map: collapsible product groups, at-risk
products sorted first with risk count in header
- Ingest Dashboard: fix source name truncation — table cells now wrap
with break-all and title tooltip; bar chart labels extended to 16
chars with ellipsis and full-name tooltip on hover
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each card shows tactic name, technique count, and rule badge in the header.
Clicking the header toggles the technique chips with an animated chevron.
The existing '+N more' expander still works within the expanded card.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sources without detection rules no longer show stages 5-6 as failures:
- Backend: has_detection_rules flag added per source; progress (pct) calculated
over 4 core stages for sources with no rules; detection stages marked na:true
- Frontend: pipeline splits into two sections —
'With Detection Coverage' (6-stage, full pipeline)
'Parser Only' (4-stage, stages 5-6 shown as — N/A)
Each section has its own Show/Hide completed toggle
- Collapsed by default; Show Pipeline toggle reveals both sections
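A sketch of the backend side of this change, assuming each source carries six ordered stage results. The function name and the shape of the returned dict are illustrative; only has_detection_rules, pct and na come from the description above.

```python
CORE_STAGES = 4
DETECTION_STAGES = 2


def pipeline_status(stage_results, has_detection_rules):
    """stage_results: six booleans, one per pipeline stage, in order."""
    core = list(stage_results[:CORE_STAGES])
    detection = list(stage_results[CORE_STAGES:CORE_STAGES + DETECTION_STAGES])

    stages = [{"done": bool(done), "na": False} for done in core]
    if has_detection_rules:
        stages += [{"done": bool(done), "na": False} for done in detection]
        counted = core + detection
    else:
        # Stages 5-6 do not apply: mark them N/A and leave them out of pct.
        stages += [{"done": False, "na": True} for _ in detection]
        counted = core

    pct = round(100 * sum(bool(done) for done in counted) / len(counted))
    return {"has_detection_rules": has_detection_rules, "pct": pct, "stages": stages}
```

A parser-only source with all four core stages done reports 100 instead of being stuck at 67.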
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Shows summary stats (Fully Onboarded / In Progress / Not Started) immediately
on page load; table is hidden until user clicks 'Show Pipeline'. Keeps the
Onboarding page scannable without scrolling past a large table to reach the
prompt template.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MITRE fix:
- S1 platform-rules API returns rule["mitre"] = [{tactic, techniques:[{id,title}]}]
not the flat field names we were checking — updated _extract_mitre to handle
this as the primary path, keeping flat field fallback for STAR rules
- generatedAlerts field on each platform rule stored in raw JSON during import
Firing status fix:
- sync-rule-firing now reads generatedAlerts from ParsedRule.raw as fast path
(instant, no SDL PowerQuery needed) since it's returned directly by the
platform-rules API on every library sync
- SDL PowerQuery retained as fallback for rules imported from detections.json
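A sketch of the nested-first extraction with a flat fallback. It is not the actual _extract_mitre; it assumes each nested tactic is a plain string and uses only the flat field names tactic and mitreTechniques as the fallback.

```python
def extract_mitre(rule):
    """Return (tactics, technique_ids) from one rule dict."""
    tactics, techniques = [], []

    nested = rule.get("mitre")
    if isinstance(nested, list) and nested:
        # Primary path: [{tactic, techniques: [{id, title}]}]
        for entry in nested:
            tactic = entry.get("tactic")
            if tactic and tactic not in tactics:
                tactics.append(tactic)
            for technique in entry.get("techniques") or []:
                technique_id = technique.get("id")
                if technique_id and technique_id not in techniques:
                    techniques.append(technique_id)
        return tactics, techniques

    # Fallback: flat fields, as assumed here for STAR rules.
    tactic = rule.get("tactic")
    if tactic:
        tactics.append(tactic)
    for technique_id in rule.get("mitreTechniques") or []:
        if technique_id not in techniques:
            techniques.append(technique_id)
    return tactics, techniques
```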
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MITRE ATT&CK heatmap:
- _extract_mitre() helper extracts tactics/techniques from S1 API rules
handling multiple field name conventions (tactic, mitreTechniques, etc.)
- _import_from_api_rules and _import_detections now store tactics/techniques
in raw JSON alongside data_sources
- GET /api/coverage/mitre returns tactic/technique breakdown ordered by
ATT&CK kill chain with coverage stats
- New "Threat Coverage" tab in frontend: stat cards (total rules, MITRE
mapped, tactics covered, techniques covered), tactic cards grid with
left-border color coding and technique chips with "+N more" expander
Detection rule firing status:
- RuleFiringCache table tracks alert_count per rule_name
- POST /api/coverage/sync-rule-firing queries SDL PowerQuery with 3
field-name patterns to find rule firing data; upserts into cache
- GET /api/coverage/rule-firing-cache returns cache sorted by alert count
- /map now includes alert_count per rule and firing_cache_populated flag
- Coverage map Detections column: when cache populated, shows alert count
in green or ⚠ amber for rules that have never fired
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- s1_client: configurable PowerQuery timeout via SDL_PQ_TIMEOUT env var
(default 600s, was hardcoded 120s) with separate connect/read timeouts
via httpx.Timeout; retry on ReadTimeout via SDL_PQ_TIMEOUT_RETRIES;
clearer error messages that include a query snippet and parse non-JSON responses
- ingest: fix simulate-filter SDL syntax (== → =, drop leading | on base
expression, surface PowerQuery error field, cleaner empty-filter fallback)
- docker-compose: pass SDL_PQ_TIMEOUT and SDL_PQ_TIMEOUT_RETRIES through
to backend container with sensible defaults
Not taken from PR #2:
- .gitignore parsers/* change — would untrack the 7 committed parser files
- s1_client/quality/coverage changes already present in main from prior work
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Key changes:
- Unlabelled event banner: shows count only after Sample Events is clicked; uses broad SDL filter expression; time window synced to sync-days dropdown
- Parser Quality: new "Attributes Missing" subsection listing all parsers without dataSource.name regardless of event volume
- Coverage map: filter buttons (All / Complete Parser / Attributes Missing); stat card renamed to "Incomplete Parser"; stub count excluded from sync when no active sources
- Sync All button: runs SDL parser sync → library sync → live sources sync in sequence
- Reset now clears ActiveSource table and resets unlabelled count cache
- run_powerquery: configurable max_count param (default 1000, 50M for count queries)
- _DS_NAME_RE: supports both quoted and unquoted dataSource.name keys in parser files
- Full modern UI redesign: slate palette, gradient cards, ring borders, pill nav, colored stat accents
- Updated 7 tracked parser files synced from SDL
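For the _DS_NAME_RE change listed above, a pattern accepting both quoted and unquoted keys might look like the sketch below. The real regex is not shown in this log, so this pattern, and the assumption that the value is always quoted, are illustrative only.

```python
import re

# Optional quote, the literal key, optional quote, a colon, then a quoted value.
DS_NAME_RE = re.compile(r'''["']?dataSource\.name["']?\s*:\s*["']([^"']+)["']''')


def data_source_name(parser_text):
    """Return the dataSource.name value from a parser file, or None."""
    match = DS_NAME_RE.search(parser_text)
    return match.group(1) if match else None
```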
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fetch detection library rules from platform-rules API at startup (falls
back to extracted.json); adds Sync Detection Library button for refresh
- Parser column simplified to ✓ Parsed / ✗ Not Parsed
- Detection counts now use library rules only (exclude custom STAR rules)
- Add close-match suggestions for dataSource.name mismatches (e.g. CloudTrail
→ AWS CloudTrail, Microsoft 365 Collaboration → Microsoft O365)
- Exclude SentinelOne Ranger AD from coverage map (native S1 source)
- Add success feedback banners to Load SDL Parsers and Sync Library buttons
- Remove rule_counts.json manual override; extracted.json is source of truth
- Remove Load Detections button; rules auto-import on backend startup
- Add get_account_id() and get_platform_rules() to s1_client
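The close-match suggestion above could be approximated with the standard library's difflib. This is a sketch, not the matching logic actually used; the function name and the 0.6 cutoff are assumptions.

```python
import difflib


def suggest_source_name(name, known_names, cutoff=0.6):
    """Return the closest known dataSource.name, or None if nothing is close."""
    matches = difflib.get_close_matches(name, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With "CloudTrail" as input and "AWS CloudTrail" among the known names, the similarity ratio is about 0.83, so it clears the cutoff.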
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Coverage Map:
- New "Detection Fields Missing" column shows dotted-path SDL fields that
associated STAR rules reference but the parser does not provide
- Only dotted field paths (src.ip, winEventLog.channel) are considered;
single-word correlation variables and metadata tokens are excluded
- Schema fields always present in events (dataSource.name, event.type etc)
are excluded from the missing list
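The missing-field rule can be sketched as a set difference. The function name is hypothetical, and ALWAYS_PRESENT lists only the two schema fields named above; the real exclusion list is presumably longer.

```python
# Schema fields assumed to be present on every event.
ALWAYS_PRESENT = {"dataSource.name", "event.type"}


def missing_detection_fields(rule_fields, parser_fields):
    """Dotted-path fields the rules reference but the parser does not provide."""
    provided = set(parser_fields)
    missing = {
        field for field in rule_fields
        if "." in field                  # skip single-word variables and tokens
        and field not in ALWAYS_PRESENT
        and field not in provided
    }
    return sorted(missing)
```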
Settings:
- New STAR_LIBRARY_ONLY field (select: true/false) controls whether
Load Library STAR Rules filters to @sentinelone.com creators or loads all
- Rendered as a dropdown in the Settings form with a hint description
- saveSettings now always persists select field values (not just non-empty)
- load-star-rules reads STAR_LIBRARY_ONLY env var as its default
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
load-star-rules now defaults to library_only=true, filtering rules where
the creator email ends in @sentinelone.com. Custom tenant rules are excluded
by default. Pass ?library_only=false to load all rules.
Button label updated to "Load Library STAR Rules" to make intent clear.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Filters are now: All | Custom Parser | Default Parser Only | No Parser
- Custom Parser: covered sources with a loaded SDL parser file
- Default Parser Only: covered via event.type detection in data lake
but no custom parser file — built-in or cloud-managed parser running
- No Parser: parser_needed sources (no parser found at all)
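The three filter buckets reduce to a small classification. This is a sketch only; the function and argument names are hypothetical.

```python
def parser_filter_bucket(status, has_parser_file):
    """Map a coverage-map row to one of the three filter labels."""
    if status == "parser_needed":
        return "No Parser"
    # Covered: either a loaded SDL parser file, or event.type seen in the data lake.
    return "Custom Parser" if has_parser_file else "Default Parser Only"
```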
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers setup, architecture, all five pages (Coverage Map, Ingest Dashboard,
Parser Quality, Onboarding, Settings), expected results for each tool,
rebuild commands, and project layout.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Selecting a source triggers a 20-event sample; actual field names from the
log are merged with SDL schema defaults (log fields first) and pre-filled
into the fields input. Falls back to SDL defaults if no events found.
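The merge order can be sketched like this, assuming both inputs are plain lists of field names. The function name is hypothetical.

```python
def merge_field_names(log_fields, schema_defaults):
    """Log fields first, then schema defaults, without duplicates."""
    # dict.fromkeys keeps first-seen order, so sampled fields stay in front.
    return list(dict.fromkeys([*log_fields, *schema_defaults]))
```

With an empty sample the result is just the defaults, which matches the fallback described above.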
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Message column is pinned last, shows 80 chars with tooltip for the full
value, and has a ⎘ copy button that flashes ✓ on success. Other field
cells are unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Live Event Sampler and Field Population Rate now load sources from the
coverage map on page render instead of free-text inputs. Sources are sorted
by event count (busiest first) and show event totals. Falls back to a hint
message if no sources have been synced yet.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- sync-sources now runs a parallel PowerQuery checking for event.type
population per source; count stored in new active_sources.parser_detected
- Coverage map marks a source as covered if parser_detected > 0, even
without a matching local parser file (handles built-in/cloud parsers)
- UI parser cell shows "Parsed (N typed events detected)" for data-lake-
detected parsers vs named local parser files
- Runtime ALTER TABLE migration adds parser_detected column to existing DBs
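A sketch of an idempotent runtime migration, assuming a SQLite database. The database engine and the INTEGER NOT NULL DEFAULT 0 column definition are assumptions; only the table and column names come from the notes above.

```python
import sqlite3


def ensure_parser_detected_column(conn):
    """Add active_sources.parser_detected if missing. Returns True if added."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(active_sources)")}
    if "parser_detected" in columns:
        return False
    conn.execute(
        "ALTER TABLE active_sources "
        "ADD COLUMN parser_detected INTEGER NOT NULL DEFAULT 0"
    )
    conn.commit()
    return True
```

Checking the column list first makes the migration safe to run on every startup.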
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New /api/quality router with three endpoints:
sample-events: pull raw events from a source via PowerQuery
field-population: measure % of events with each SDL field populated;
surfaces dataSource.name correctly (100% when filtered by it) and
returns fields_seen_in_sample so you can see what IS being extracted
test-parser: converts SDL $field=pattern$ format strings to Python
named-group regex and tests against a pasted raw log line
- New "Parser Quality" nav item and page with all three tools
- Home page card added for Parser Quality
- Field population UI shows per-field colour-coded progress bars plus
a chip list of fields actually present in the sample
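A sketch of how the test-parser conversion might work. This is not the endpoint's real code: the placeholder grammar, the NAMED_PATTERNS table and the \S+ default are all assumptions. Group names are generated (f0, f1, ...) because dotted field names are not valid Python group names.

```python
import re

# Hypothetical pattern library; a real SDL parser defines its own patterns.
NAMED_PATTERNS = {
    "ipv4": r"\d{1,3}(?:\.\d{1,3}){3}",
    "number": r"\d+",
    "word": r"\w+",
}

# Matches $field$ or $field=pattern$.
_PLACEHOLDER = re.compile(r"\$([A-Za-z_][\w.]*)(?:=([A-Za-z_]\w*))?\$")


def format_to_regex(fmt):
    """Return (compiled regex, mapping of group name -> SDL field name)."""
    parts, groups, pos = [], {}, 0
    for index, match in enumerate(_PLACEHOLDER.finditer(fmt)):
        parts.append(re.escape(fmt[pos:match.start()]))  # literal text between fields
        group = f"f{index}"
        groups[group] = match.group(1)
        pattern = NAMED_PATTERNS.get(match.group(2) or "", r"\S+")
        parts.append(f"(?P<{group}>{pattern})")
        pos = match.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile("".join(parts)), groups


def apply_format(fmt, line):
    """Return extracted fields for a raw log line, or None if it does not match."""
    regex, groups = format_to_regex(fmt)
    match = regex.search(line)
    if not match:
        return None
    return {groups[name]: value for name, value in match.groupdict().items()}
```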
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When the 1h time filter is active the volume chart now renders the top-sources
data as a by-source bar chart (up to 12 sources) with the gradient fill and a
"Events by Source (Last 1h)" heading. Chart labels are auto-detected as dates
or source names so truncation is applied correctly for both modes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Coverage map: replace filename fuzzy-match with exact dataSource.name
lookup read directly from parser file attributes; grok/dottedJson parsers
now flagged as "parser_needed" with format type shown in the UI
- Bar chart: SVG linearGradient (light purple → deep violet) replaces flat fill
- Ingest dashboard: add 1h button (first option) backed by new backend
hours= query param on /api/ingest/top-sources; daily-volume chart shows
informational message when in 1h mode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously showed field-level coverage (rule fields vs parser fields).
Now shows per-dataSource.name coverage: is a parser loaded for each
active ingest source?
- New ActiveSource DB model stores live sources from SDL
- New POST /api/coverage/sync-sources endpoint runs PowerQuery to fetch
current dataSource.names and their event counts, stores in DB
- GET /api/coverage/map now returns per-source status:
covered = a loaded parser matches this source name
parser_needed = source is ingesting but no parser is loaded
- Parser matching uses fuzzy substring (handles "palo"→"Palo Alto Networks Firewall")
- Coverage table shows: source name, 7d event count, status, matched parser + field count, STAR rules
- Frontend: new "Sync Live Sources" button, updated stats cards, updated filter tabs
- Removed field-level view (was confusing — parser_needed on a field ≠ missing parser for a source)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add event count label on top of each bar (e.g. 220 or 1.2k)
- Add Y-axis grid lines and tick labels so scale is readable
- Label shows MM/DD date format for compact display
- Chart heading now reads "events ingested per day" to clarify
these are individual daily counts, not cumulative totals
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a collapsible "How does this work?" panel explaining:
- What the simulator does (live PowerQuery count → GB projection)
- When to use it (after spotting a noisy source in Top Sources)
- How to fill in Source name (copy from dataSource.name column)
- What Event type does (optional narrowing)
- How the GB estimate is calculated
- Warning that it is read-only — no filters are applied automatically
Also updates Source name placeholder to show a concrete example.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- daily-volume: run per-day PowerQueries in parallel with asyncio.gather
instead of sequentially with sleeps — 3 days now completes in ~16s vs 140s+
- Default view changed from 7d to 3d; day buttons updated to [3, 5, 7]
- igLoad: fire daily-volume and top-sources simultaneously with Promise.allSettled
so both panels load in parallel rather than one after the other
- Each panel shows "Querying data lake…" spinner while loading
- Each panel renders independently — one failure doesn't block the other
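The parallel per-day fan-out on the backend can be sketched with asyncio.gather. The function name and the `count_events_for_day` callable are placeholders for whatever runs the per-day PowerQuery, and mapping a failed day to None is an assumption.

```python
import asyncio
from datetime import date, timedelta


async def daily_volume(days, count_events_for_day, today=None):
    """Run one count query per day concurrently; oldest day first."""
    today = today or date.today()
    wanted = [today - timedelta(days=offset) for offset in range(days - 1, -1, -1)]
    counts = await asyncio.gather(
        *(count_events_for_day(day) for day in wanted),
        return_exceptions=True,  # one failed day should not sink the others
    )
    return [
        {"date": day.isoformat(), "count": None if isinstance(count, Exception) else count}
        for day, count in zip(wanted, counts)
    ]
```

Total latency is then roughly that of the slowest single-day query rather than the sum of all of them.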
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Sidebar: ⚙ Settings link pinned to bottom of nav
- Settings page: view all config keys (secrets masked), edit and save directly to .env
- Show/hide toggle for secret fields (tokens, keys)
- First-time setup banner with cp .env.example .env instructions when .env is missing
- Manual setup section with step-by-step terminal commands and where to find each credential
- New .env.example template with comments for all required variables
- Backend: GET/POST /api/settings/config router reads/writes mounted .env file
- docker-compose: mounts .env into backend container at /app/.env for write access
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>