Commit Graph

22 Commits

Author SHA1 Message Date
Mick b494c751aa Revert "Preserve parser_detected across syncs to prevent coverage regression"
This reverts commit 21c8644443.
2026-05-22 12:08:56 -04:00
Mick 21c8644443 Preserve parser_detected across syncs to prevent coverage regression
Before re-creating ActiveSource rows, snapshot existing parser_detected
values. When writing new rows, take max(new, previous) so a source that
was once confirmed as parsed (event.type present in the data lake) never
loses its Covered status due to a sampling gap, partial query result, or
SDL PowerQuery timeout during Sync All.
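
The max(new, previous) merge described above could look roughly like the following. This is a minimal sketch, not the committed code: the function name and the plain-dict inputs are hypothetical stand-ins for the ActiveSource snapshot. Note that b494c751aa reverts this commit, so the sketch illustrates the idea only.

```python
def merge_parser_detected(previous: dict[str, int], fresh: dict[str, int]) -> dict[str, int]:
    """Return parser_detected per source, never lower than the snapshotted value.

    `previous` is the snapshot taken before rows are re-created;
    `fresh` is what the current sync observed (hypothetical shapes).
    """
    return {
        name: max(count, previous.get(name, 0))
        for name, count in fresh.items()
    }
```

A source whose fresh count drops to 0 because of a sampling gap keeps its earlier non-zero value and therefore its Covered status.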

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 12:07:03 -04:00
Mick 7620d1fcc8 Add product grouping to rule displays across coverage and threat pages
- Extract product label from rule data_sources in coverage.py via new
  _product_from_data_sources() helper (prefers non-SentinelOne entries
  so product-specific rules get a meaningful label)
- Coverage Map detections column: rules now grouped by product with
  collapsible chevron headers showing fired/silent counts
- Threat Coverage Rule Firing Status: collapsible product group headers
  with active/silent summary; shows all 2066 rules across 30 products
- Threat Coverage Dependency Map: collapsible product groups, at-risk
  products sorted first with risk count in header
- Ingest Dashboard: fix source name truncation — table cells now wrap
  with break-all and title tooltip; bar chart labels extended to 16
  chars with ellipsis and full-name tooltip on hover
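
A rough sketch of the "prefer non-SentinelOne entries" rule for the product label. The real _product_from_data_sources() helper is not shown in this log, so the prefix check and the "Unknown" fallback are assumptions.

```python
def product_from_data_sources(data_sources: list[str]) -> str:
    """Pick a product label, preferring entries that are not SentinelOne itself."""
    for name in data_sources:
        # Assumption: native entries are recognised by a "sentinelone" prefix.
        if not name.lower().startswith("sentinelone"):
            return name
    # Only SentinelOne entries (or none at all): fall back to the first one.
    return data_sources[0] if data_sources else "Unknown"
```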

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 11:56:27 -04:00
Mick 800d3c545a Split onboarding pipeline into detection-mapped vs parser-only groups
Sources without detection rules no longer show stages 5-6 as failures:
- Backend: has_detection_rules flag added per source; progress (pct) calculated
  over 4 core stages for sources with no rules; detection stages marked na:true
- Frontend: pipeline splits into two sections —
    'With Detection Coverage' (6-stage, full pipeline)
    'Parser Only' (4-stage, stages 5-6 shown as — N/A)
  Each section has its own Show/Hide completed toggle
- Collapsed by default; Show Pipeline toggle reveals both sections
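
The backend rule above (progress over 4 core stages when a source has no rules, detection stages marked N/A) can be sketched as follows. Stage keys and the return shape are hypothetical; only the 4-versus-6 stage split comes from the commit.

```python
CORE_STAGES = ["data_received", "parser_file", "parser_active", "source_labeled"]
DETECTION_STAGES = ["detection_rules", "rules_firing"]

def onboarding_progress(done: set[str], has_detection_rules: bool) -> dict:
    """Compute pct over the stages that apply to this source."""
    stages = CORE_STAGES + (DETECTION_STAGES if has_detection_rules else [])
    completed = sum(1 for stage in stages if stage in done)
    return {
        "pct": round(100 * completed / len(stages)),
        # Detection stages are reported as not applicable, not as failures.
        "na_stages": [] if has_detection_rules else list(DETECTION_STAGES),
    }
```

With two core stages done, a parser-only source reports 50%, while a detection-mapped source reports 33%.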

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 11:26:26 -04:00
Mick d0299e0f23 Add health score, coverage trends, dependency map, PowerQuery playground, onboarding tracker
Tenant Health Score:
- CoverageSnapshot table stores daily health metrics (parser %, MITRE %, firing %)
- _compute_health() weighted formula: 40% parser coverage + 35% MITRE + 25% firing
  (reweighted 55/45 when firing cache empty)
- GET /api/coverage/health returns score + delta vs previous snapshot
- GET /api/coverage/snapshots returns chronological history for sparklines
- POST /api/coverage/snapshot for manual recording
- Auto-snapshot recorded at end of every sync-sources call
- Overview dashboard: prominent health score card with color coding, component
  breakdown, delta indicator, and inline SVG sparkline (last 30 points)
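
The weighted formula above might be written along these lines. This is a sketch under one stated assumption: the commit gives the fallback weights only as "55/45", so assigning 55% to parser coverage and 45% to MITRE is a guess. The function name and the use of None for an empty firing cache are also hypothetical.

```python
from typing import Optional

def compute_health(parser_pct: float, mitre_pct: float,
                   firing_pct: Optional[float]) -> float:
    """Weighted health score on a 0-100 scale."""
    if firing_pct is None:
        # Firing cache empty: reweight over the two remaining components.
        # Assumption: 55% parser coverage, 45% MITRE.
        return round(0.55 * parser_pct + 0.45 * mitre_pct, 1)
    return round(0.40 * parser_pct + 0.35 * mitre_pct + 0.25 * firing_pct, 1)
```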

Rule Dependency Map:
- GET /api/coverage/dependency-map flips the coverage map — rule → required sources
- Each source flagged healthy/inactive/no_parser; at_risk = any source missing
- New section on Threat Coverage tab with at-risk filter toggle
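
The inversion described above (rule to required sources, with an at_risk flag) can be sketched like this. Input shapes are hypothetical, and treating an unknown source as "inactive" and any non-healthy status as "missing" are both assumptions.

```python
def build_dependency_map(rules: dict[str, list[str]],
                         source_status: dict[str, str]) -> list[dict]:
    """Map each rule to its required sources and flag rules at risk.

    `source_status` values are one of "healthy", "inactive", "no_parser".
    """
    result = []
    for rule, sources in rules.items():
        deps = [
            # Assumption: a source absent from the status table is inactive.
            {"source": s, "status": source_status.get(s, "inactive")}
            for s in sources
        ]
        at_risk = any(d["status"] != "healthy" for d in deps)
        result.append({"rule": rule, "sources": deps, "at_risk": at_risk})
    return result
```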

PowerQuery Playground:
- New query.py router: GET /presets (7 curated queries) + POST /run
- New Query nav tab with time-range pills, preset buttons, localStorage history,
  monospace textarea, auto-column results table, client-side CSV export

Onboarding Tracker:
- GET /api/coverage/onboarding-status returns per-source pipeline progress
  across 6 stages: Data Received → Parser File → Parser Active → Source
  Labeled → Detection Rules → Rules Firing
- New section on Onboarding tab with emoji stage dots, progress bars,
  collapsed completed sources with show/hide toggle

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 11:09:43 -04:00
Mick 7b4eceefb8 Fix MITRE extraction to use actual S1 API structure + use generatedAlerts for firing status
MITRE fix:
- S1 platform-rules API returns rule["mitre"] = [{tactic, techniques:[{id,title}]}]
  not the flat field names we were checking — updated _extract_mitre to handle
  this as the primary path, keeping flat field fallback for STAR rules
- generatedAlerts field on each platform rule stored in raw JSON during import
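
A sketch of the nested-first extraction with flat fallback described above. The nested shape follows the commit text; the flat field names come from the earlier heatmap commit, but their types (string tactic, list of technique IDs) are assumptions, and the function is not the actual _extract_mitre.

```python
def extract_mitre(rule: dict) -> tuple[list[str], list[str]]:
    """Return (tactics, technique_ids) from a rule dict."""
    tactics: list[str] = []
    techniques: list[str] = []
    # Primary path: rule["mitre"] = [{tactic, techniques: [{id, title}]}]
    for entry in rule.get("mitre") or []:
        tactic = entry.get("tactic")
        if tactic and tactic not in tactics:
            tactics.append(tactic)
        for tech in entry.get("techniques") or []:
            tech_id = tech.get("id")
            if tech_id and tech_id not in techniques:
                techniques.append(tech_id)
    if not tactics and not techniques:
        # Fallback for flat fields (assumed shapes, e.g. STAR rules).
        if rule.get("tactic"):
            tactics.append(rule["tactic"])
        techniques.extend(rule.get("mitreTechniques") or [])
    return tactics, techniques
```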

Firing status fix:
- sync-rule-firing now reads generatedAlerts from ParsedRule.raw as fast path
  (instant, no SDL PowerQuery needed) since it's returned directly by the
  platform-rules API on every library sync
- SDL PowerQuery retained as fallback for rules imported from detections.json

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:42:48 -04:00
Mick 7922de315e Add MITRE ATT&CK heatmap and detection rule firing status
MITRE ATT&CK heatmap:
- _extract_mitre() helper extracts tactics/techniques from S1 API rules
  handling multiple field name conventions (tactic, mitreTechniques, etc.)
- _import_from_api_rules and _import_detections now store tactics/techniques
  in raw JSON alongside data_sources
- GET /api/coverage/mitre returns tactic/technique breakdown ordered by
  ATT&CK kill chain with coverage stats
- New "Threat Coverage" tab in frontend: stat cards (total rules, MITRE
  mapped, tactics covered, techniques covered), tactic cards grid with
  left-border color coding and technique chips with "+N more" expander

Detection rule firing status:
- RuleFiringCache table tracks alert_count per rule_name
- POST /api/coverage/sync-rule-firing queries SDL PowerQuery with 3
  field-name patterns to find rule firing data; upserts into cache
- GET /api/coverage/rule-firing-cache returns cache sorted by alert count
- /map now includes alert_count per rule and firing_cache_populated flag
- Coverage map Detections column: when cache populated, shows alert count
  in green or ⚠ amber for rules that have never fired

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:25:45 -04:00
Mick 2c40bf81ee Cherry-pick improvements from PR #2 (marcredhat)
- s1_client: configurable PowerQuery timeout via SDL_PQ_TIMEOUT env var
  (default 600s, was hardcoded 120s) with separate connect/read timeouts
  via httpx.Timeout; retry on ReadTimeout via SDL_PQ_TIMEOUT_RETRIES;
  better error messages include query snippet and parse non-JSON responses
- ingest: fix simulate-filter SDL syntax (== → =, drop leading | on base
  expression, surface PowerQuery error field, cleaner empty-filter fallback)
- docker-compose: pass SDL_PQ_TIMEOUT and SDL_PQ_TIMEOUT_RETRIES through
  to backend container with sensible defaults
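
The timeout and retry settings above could be read and applied roughly as follows. This sketch uses only the standard library so it stays self-contained: the 10-second connect timeout, the default of 1 retry, and both function names are assumptions, and the built-in TimeoutError stands in for the httpx ReadTimeout the commit mentions.

```python
import os

def pq_timeout_settings(env=os.environ) -> dict:
    """Read PowerQuery timeout settings, defaulting the read timeout to 600s."""
    return {
        "connect": 10.0,  # assumption: short, fixed connect timeout
        "read": float(env.get("SDL_PQ_TIMEOUT", "600")),
        "retries": int(env.get("SDL_PQ_TIMEOUT_RETRIES", "1")),  # assumed default
    }

def run_with_retries(call, retries: int, retry_on=TimeoutError):
    """Invoke call(), retrying up to `retries` times on a timeout exception."""
    for attempt in range(retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the timeout to the caller
```

In a client built on httpx, the connect and read values would presumably be passed to httpx.Timeout and `retry_on` set to httpx.ReadTimeout.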

Not taken from PR #2:
- .gitignore parsers/* change — would untrack the 7 committed parser files
- s1_client/quality/coverage changes already present in main from prior work

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:11:42 -04:00
Mick c5a4f796a0 Add unlabelled event detection, stub parser quality, Sync All, and modern UI redesign
Key changes:
- Unlabelled event banner: shows count only after Sample Events is clicked; uses broad SDL filter expression; time window synced to sync-days dropdown
- Parser Quality: new "Attributes Missing" subsection listing all parsers without dataSource.name regardless of event volume
- Coverage map: filter buttons (All / Complete Parser / Attributes Missing); stat card renamed to "Incomplete Parser"; stub count excluded from sync when no active sources
- Sync All button: runs SDL parser sync → library sync → live sources sync in sequence
- Reset now clears ActiveSource table and resets unlabelled count cache
- run_powerquery: configurable max_count param (default 1000, 50M for count queries)
- _DS_NAME_RE: supports both quoted and unquoted dataSource.name keys in parser files
- Full modern UI redesign: slate palette, gradient cards, ring borders, pill nav, colored stat accents
- Updated 7 tracked parser files synced from SDL
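
A pattern accepting both quoted and unquoted dataSource.name keys might look like the one below. It is an illustrative guess, not the actual _DS_NAME_RE, and it assumes the value itself is always quoted and contains no quote characters.

```python
import re

# Key may be bare or wrapped in single/double quotes; the value must be quoted.
DS_NAME_RE = re.compile(r'''["']?dataSource\.name["']?\s*:\s*["']([^"']+)["']''')

def ds_name_from_parser(text: str):
    """Return the dataSource.name declared in a parser file, or None."""
    match = DS_NAME_RE.search(text)
    return match.group(1) if match else None
```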

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:00:21 -04:00
Mick 0013adbe7e Merge pull request #1 from marcredhat/fix/json-parser-and-pq-syntax
Fix Parser Test Runner JSON mode, Filter Simulator PQ syntax, and parser dropdown
2026-05-20 15:25:39 -04:00
Mick 6cd9da82da Auto-load detection library from S1 API, improve coverage map accuracy
- Fetch detection library rules from platform-rules API at startup (falls
  back to extracted.json); adds Sync Detection Library button for refresh
- Parser column simplified to ✓ Parsed / ✗ Not Parsed
- Detection counts now use library rules only (exclude custom STAR rules)
- Add close-match suggestions for dataSource.name mismatches (e.g. CloudTrail
  → AWS CloudTrail, Microsoft 365 Collaboration → Microsoft O365)
- Exclude SentinelOne Ranger AD from coverage map (native S1 source)
- Add success feedback banners to Load SDL Parsers and Sync Library buttons
- Remove rule_counts.json manual override; extracted.json is source of truth
- Remove Load Detections button; rules auto-import on backend startup
- Add get_account_id() and get_platform_rules() to s1_client
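
One way to produce close-match suggestions for mismatched dataSource.name values is the standard library's difflib. The commit does not say which technique it uses, so this is a swapped-in illustration; the function name and the 0.6 cutoff are assumptions.

```python
import difflib

def suggest_source_name(name: str, known: list[str], cutoff: float = 0.6):
    """Suggest the closest known dataSource.name, ignoring case, or None."""
    lowered = {k.lower(): k for k in known}
    hits = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None
```

For example, `suggest_source_name("CloudTrail", ["AWS CloudTrail", "Okta"])` yields "AWS CloudTrail".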

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 15:14:10 -04:00
marc 8dbd38f3bb Fix Parser Test Runner JSON mode, Filter Simulator PQ syntax, dropdown source
- backend/routers/quality.py
 * Add GET /api/quality/parsers (lists actual files in /app/parsers)
 * Support SDL JSON auto-extract parsers ($=json{parse=json}$)
 * Apply parser rewrite blocks with correct $0/$N backref translation
 * Accept single JSON / JSON array / NDJSON in test-parser body
 * Flatten JSON inside 'message' for Field Population coverage
- backend/routers/ingest.py
 * Rewrite simulate-filter PowerQuery to valid SDL syntax
 * Correct field name: src.name -> dataSource.name
- frontend/index.html
 * Parser dropdown loads from /api/quality/parsers
 * Add 'Last 7d' lookback option
 * Render JSON-mode test results with badges + payload counter
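
Accepting a single JSON object, a JSON array, or NDJSON in one request body can be sketched as below. The function name is hypothetical, and the fallback order (try whole-body JSON first, then line by line) is an assumption about how the test-parser endpoint behaves.

```python
import json

def parse_events(body: str) -> list:
    """Normalise single JSON, a JSON array, or NDJSON into a list of events."""
    text = body.strip()
    if not text:
        return []
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not one JSON document: treat each non-empty line as its own event.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]
```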
2026-05-20 19:40:24 +02:00
Mick 6e137438b1 Add Detection Fields Missing column + STAR_LIBRARY_ONLY setting
Coverage Map:
- New "Detection Fields Missing" column shows dotted-path SDL fields that
  associated STAR rules reference but the parser does not provide
- Only dotted field paths (src.ip, winEventLog.channel) are considered;
  single-word correlation variables and metadata tokens are excluded
- Schema fields always present in events (dataSource.name, event.type, etc.)
  are excluded from the missing list
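
The dotted-path filtering above can be approximated with a regular expression. This is a sketch: the pattern, the always-present set (limited to the two fields the commit names), and the function name are assumptions. It also does not skip quoted string literals, so a value such as a hostname inside quotes would be picked up as a field.

```python
import re

# At least two identifier segments joined by dots, e.g. src.ip
DOTTED_FIELD_RE = re.compile(r"\b[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+\b")
ALWAYS_PRESENT = {"dataSource.name", "event.type"}

def missing_detection_fields(rule_query: str, parser_fields: set[str]) -> list[str]:
    """List dotted fields a rule references that the parser does not provide."""
    referenced = set(DOTTED_FIELD_RE.findall(rule_query))
    return sorted(referenced - ALWAYS_PRESENT - parser_fields)
```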

Settings:
- New STAR_LIBRARY_ONLY field (select: true/false) controls whether
  Load Library STAR Rules filters to @sentinelone.com creators or loads all
- Rendered as a dropdown in the Settings form with a hint description
- saveSettings now always persists select field values (not just non-empty)
- load-star-rules reads STAR_LIBRARY_ONLY env var as its default

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 15:46:05 -04:00
Mick a50fd35934 Filter STAR rules to Library only (creator @sentinelone.com)
load-star-rules now defaults to library_only=true, filtering rules where
the creator email ends in @sentinelone.com. Custom tenant rules are excluded
by default. Pass ?library_only=false to load all rules.
Button label updated to "Load Library STAR Rules" to make intent clear.
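
The library_only filter described above amounts to a suffix check on the creator email. A minimal sketch, assuming each rule is a dict with a hypothetical "creator" key:

```python
def filter_library_rules(rules: list[dict], library_only: bool = True) -> list[dict]:
    """Keep only rules created by @sentinelone.com unless library_only is False."""
    if not library_only:
        return list(rules)
    return [
        rule for rule in rules
        if str(rule.get("creator", "")).lower().endswith("@sentinelone.com")
    ]
```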

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 15:42:09 -04:00
Mick 1b07a59991 Use parsed event detection in data lake as coverage signal
- sync-sources now runs a parallel PowerQuery checking for event.type
  population per source; count stored in new active_sources.parser_detected
- Coverage map marks a source as covered if parser_detected > 0, even
  without a matching local parser file (handles built-in/cloud parsers)
- UI parser cell shows "Parsed (N typed events detected)" for data-lake-
  detected parsers vs named local parser files
- Runtime ALTER TABLE migration adds parser_detected column to existing DBs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 13:06:29 -04:00
Mick 81e3656c46 Fix coverage map matching: three-tier lookup for parser-to-source mapping
1. Exact dataSource.name match
2. Normalized substring on parser's dataSource.name attribute
3. Normalized substring on parser filename (catches files with wrong ds name)

Fixes CloudTrail (filename aws_cloudtrail-latest matches "cloudtrail") and
Palo Alto Networks Firewall (ds name "Palo Alto Networks" matches via substring).
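
The three tiers above can be sketched as follows. The parser record shape ("filename" and "ds_name" keys), the normalisation (lowercase, alphanumerics only), and checking substrings in both directions are all assumptions.

```python
import re

def _norm(value: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())

def match_parser(source_name: str, parsers: list[dict]):
    """Return the first parser matching source_name, or None."""
    # Tier 1: exact dataSource.name match.
    for parser in parsers:
        if parser.get("ds_name") == source_name:
            return parser
    target = _norm(source_name)
    # Tier 2: normalized substring on the parser's dataSource.name attribute.
    for parser in parsers:
        ds = _norm(parser.get("ds_name") or "")
        if ds and (ds in target or target in ds):
            return parser
    # Tier 3: normalized substring on the parser filename.
    for parser in parsers:
        filename = _norm(parser["filename"])
        if filename and (target in filename or filename in target):
            return parser
    return None
```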

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 12:56:51 -04:00
Mick 999c0f7b83 Add Parser Quality page: Live Event Sampler, Field Population Rate, Parser Test Runner
- New /api/quality router with three endpoints:
  sample-events: pull raw events from a source via PowerQuery
  field-population: measure % of events with each SDL field populated;
    surfaces dataSource.name correctly (100% when filtered by it) and
    returns fields_seen_in_sample so you can see what IS being extracted
  test-parser: converts SDL $field=pattern$ format strings to Python
    named-group regex and tests against a pasted raw log line
- New "Parser Quality" nav item and page with all three tools
- Home page card added for Parser Quality
- Field population UI shows per-field colour-coded progress bars plus
  a chip list of fields actually present in the sample

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 12:53:48 -04:00
Mick ac97196435 Improve coverage map matching, bar chart gradients, and add 1h time filter
- Coverage map: replace filename fuzzy-match with exact dataSource.name
  lookup read directly from parser file attributes; grok/dottedJson parsers
  now flagged as "parser_needed" with format type shown in the UI
- Bar chart: SVG linearGradient (light purple → deep violet) replaces flat fill
- Ingest dashboard: add 1h button (first option) backed by new backend
  hours= query param on /api/ingest/top-sources; daily-volume chart shows
  informational message when in 1h mode

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 12:43:10 -04:00
Mick f0bd56aee8 Rewrite coverage map as source-centric view
Previously showed field-level coverage (rule fields vs parser fields).
Now shows per-dataSource.name coverage: is a parser loaded for each
active ingest source?

- New ActiveSource DB model stores live sources from SDL
- New POST /api/coverage/sync-sources endpoint runs PowerQuery to fetch
  current dataSource.names and their event counts, stores in DB
- GET /api/coverage/map now returns per-source status:
    covered       = a loaded parser matches this source name
    parser_needed = source is ingesting but no parser is loaded
- Parser matching uses fuzzy substring (handles "palo"→"Palo Alto Networks Firewall")
- Coverage table shows: source name, 7d event count, status, matched parser + field count, STAR rules
- Frontend: new "Sync Live Sources" button, updated stats cards, updated filter tabs
- Removed field-level view (was confusing — parser_needed on a field ≠ missing parser for a source)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 12:31:48 -04:00
Mick 735e364b71 Fix Ingest Dashboard timeout causing "Failed to fetch" errors
- daily-volume: run per-day PowerQueries in parallel with asyncio.gather
  instead of sequentially with sleeps — 3 days now completes in ~16s vs 140s+
- Default view changed from 7d to 3d; day buttons updated to [3, 5, 7]
- igLoad: fire daily-volume and top-sources simultaneously with Promise.allSettled
  so both panels load in parallel rather than one after the other
- Each panel shows "Querying data lake…" spinner while loading
- Each panel renders independently — one failure doesn't block the other
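
Running the per-day queries concurrently with asyncio.gather can be sketched as below. The `query_day` coroutine is a hypothetical stand-in for the real PowerQuery call, and returning None for a failed day (via return_exceptions=True) is an added choice, not something the commit states.

```python
import asyncio

async def daily_volume(days: int, query_day) -> list:
    """Run one query per day concurrently; a failed day yields None."""
    results = await asyncio.gather(
        *(query_day(day) for day in range(days)),
        return_exceptions=True,  # keep one failed day from cancelling the rest
    )
    return [None if isinstance(r, Exception) else r for r in results]
```

Total latency then tracks the slowest single query and not the sum of all of them, which is consistent with the drop from 140s+ to about 16s reported above.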

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 11:53:37 -04:00
Mick 2e55e21a77 Add Settings page with .env manager
- Sidebar: ⚙ Settings link pinned to bottom of nav
- Settings page: view all config keys (secrets masked), edit and save directly to .env
- Show/hide toggle for secret fields (tokens, keys)
- First-time setup banner with cp .env.example .env instructions when .env is missing
- Manual setup section with step-by-step terminal commands and where to find each credential
- New .env.example template with comments for all required variables
- Backend: GET/POST /api/settings/config router reads/writes mounted .env file
- docker-compose: mounts .env into backend container at /app/.env for write access

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 11:43:41 -04:00
Mick c182d837ee Initial commit: SIEM Toolkit for SentinelOne
Dockerized SecOps toolkit with:
- Coverage Map: STAR rule vs SDL parser field coverage analysis
- Ingest Dashboard: PowerQuery-powered event volume and source breakdown
- Onboarding Assistant: AI-guided log source onboarding with Claude
- Parser management via SDL MCP integration

Stack: FastAPI + PostgreSQL backend, nginx-served HTML frontend, Docker Compose.
PowerQuery runs via Scalyr XDR API (SDL_XDR_URL + SDL_LOG_READ_KEY).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 11:39:26 -04:00