marc 7c1687efce Sync upstream features; preserve fork KV scanner, parsers, verifier
Brought in 35 upstream commits (MITRE heatmap, health score, dependency map,
PowerQuery playground, onboarding tracker, product grouping, modern UI redesign).

Preserved fork additions:
  backend/routers/quality.py  KV scanner, pattern refs, JS keys, JSON mode,
                              /parsers + /sync-from-sdl endpoints
  parsers/                    96 OCSF + tenant parsers
  tools/stormshield-verify/   end-to-end ingest regression test
  .gitignore                  un-ignored parsers/*
  CHANGES.md, PATCHES.md
2026-05-22 18:19:52 +02:00


Changes vs upstream mickbrowns1/SIEM-Toolkit

All edits are confined to a handful of files; everything else is untouched.

backend/services/s1_client.py

PowerQuery client

  • All raised exceptions now include the request body, status, and query, so the UI never shows a blank "PowerQuery error: " message.
  • Non-JSON responses (HTML 5xx gateway pages) surface as a readable error string instead of crashing on resp.json().
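A minimal sketch of the error-surfacing pattern described above. It assumes the client already has the HTTP status, the raw response text, and the query string in hand. `PowerQueryError` and `parse_pq_response` are hypothetical names, not the toolkit's actual API.

```python
import json
from typing import Any


class PowerQueryError(RuntimeError):
    """Hypothetical error type whose message always carries request context."""


def parse_pq_response(status: int, body: str, query: str) -> dict[str, Any]:
    """Decode a PowerQuery response, or raise an error that is never blank."""
    snippet = body[:300].strip() or "<empty body>"
    if status >= 400:
        # Covers HTML 5xx gateway pages: report status + body instead of parsing.
        raise PowerQueryError(f"HTTP {status} for query {query!r}: {snippet}")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # A 2xx response that is not JSON: readable error instead of a crash.
        raise PowerQueryError(
            f"non-JSON response (HTTP {status}) for query {query!r}: {snippet}"
        ) from None
    if not isinstance(payload, dict):
        raise PowerQueryError(
            f"unexpected payload type {type(payload).__name__} for query {query!r}"
        )
    return payload
```

Checking the status before calling `json.loads` keeps gateway pages out of the JSON decoder entirely; the decode guard then only has to catch the rarer case of a successful status with a non-JSON body.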

Detection library: site-scope fallback (get_platform_rules)

  • Upstream hardcoded the account scope, which returns 403 for site-scoped API tokens. Added get_scope_for_platform_rules(), which probes /accounts first and then /sites, returning whichever scope the token can access.
  • get_account_id() now also reads accountId from the /sites payload as a fallback for site-scoped tokens.
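The probe-then-fallback logic can be sketched as below. This is an illustration only: `get_scope`, `account_id_from_sites`, and the injected `probe` callable are hypothetical, the paths are shortened to the `/accounts` and `/sites` names used above, and the `/sites` payload shape (`data.sites[].accountId`) is an assumption.

```python
from typing import Any, Callable, Optional

# Hypothetical: probe(path) -> (http_status, json_payload). Injected so the
# sketch contains no network code.
Probe = Callable[[str], tuple[int, Any]]


def get_scope(probe: Probe) -> Optional[str]:
    """Return 'account' or 'site', whichever the token can read, else None."""
    for path, scope in (("/accounts", "account"), ("/sites", "site")):
        status, _ = probe(path)
        if status == 200:
            return scope
    return None


def account_id_from_sites(payload: Any) -> Optional[str]:
    """Fallback for site-scoped tokens: read accountId from a /sites payload.

    The payload shape {"data": {"sites": [{"accountId": ...}]}} is assumed.
    """
    sites = (payload or {}).get("data", {}).get("sites", [])
    for site in sites:
        if site.get("accountId"):
            return str(site["accountId"])
    return None
```

Trying the broader scope first means account-scoped tokens keep upstream behaviour unchanged; only tokens that get a non-200 on `/accounts` fall through to the site scope.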

SDL parser sync helpers

  • list_sdl_parsers() — rewritten to use the real SDL Configuration File API (POST /api/listFiles with pathPrefix=/logParsers/). Previously it called a management-console path that returned 404.
  • get_sdl_parser() — rewritten to POST /api/getFile with {path}.
  • New _sdl_config_headers() helper that uses SDL_CONFIG_READ_KEY (a separate scope from SDL_LOG_READ_KEY).
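A rough sketch of how the two Configuration File API calls might be assembled with the standard library. Assumptions: the config read key is sent as a Bearer token (the real `_sdl_config_headers()` may authenticate differently), `https://sdl.example.invalid` is a placeholder host, and the helper names are hypothetical. The requests are only built here; sending one would be `urllib.request.urlopen(req)`.

```python
import json
import urllib.request

LOG_PARSER_PREFIX = "/logParsers/"


def sdl_config_headers(config_read_key: str) -> dict[str, str]:
    # Assumption: Bearer-token auth with the SDL_CONFIG_READ_KEY value.
    return {
        "Authorization": f"Bearer {config_read_key}",
        "Content-Type": "application/json",
    }


def _post(base_url: str, endpoint: str, key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the SDL config API."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}{endpoint}",
        data=json.dumps(body).encode(),
        headers=sdl_config_headers(key),
        method="POST",
    )


def build_list_files_request(base_url: str, key: str) -> urllib.request.Request:
    return _post(base_url, "/api/listFiles", key, {"pathPrefix": LOG_PARSER_PREFIX})


def build_get_file_request(base_url: str, key: str, path: str) -> urllib.request.Request:
    return _post(base_url, "/api/getFile", key, {"path": path})
```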

backend/routers/ingest.py

  • /api/ingest/simulate-filter:
    • Rebuilt the query into valid SDL syntax. For empty filter bodies it used to generate | group events=count() (a dangling pipe); it now uses a proper base expression and falls back to a dataSource.name!='' baseline.
    • Field name corrected from src.name → dataSource.name.
    • Surfaces both result["error"] and exception text so blank "PowerQuery error: " messages are gone.
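The two fixes above (no leading pipe, never-blank errors) can be sketched as follows. `build_simulate_query` and `describe_pq_failure` are hypothetical helpers; the baseline expression and the `group events=count()` tail are taken from the text above, but how the real endpoint assembles the base expression from the filter is assumed.

```python
from typing import Optional

BASELINE = "dataSource.name!=''"  # used when the filter body is empty


def build_simulate_query(filter_body: str) -> str:
    """Never emit a leading pipe: an empty body falls back to the baseline."""
    base = filter_body.strip() or BASELINE
    return f"{base} | group events=count()"


def describe_pq_failure(result: Optional[dict], exc: Optional[BaseException]) -> str:
    """Combine result['error'] and the exception text so the message is never blank."""
    parts = []
    if result and result.get("error"):
        parts.append(str(result["error"]))
    if exc is not None:
        parts.append(f"{type(exc).__name__}: {exc}")
    return "PowerQuery error: " + ("; ".join(parts) or "unknown error (no details returned)")
```

For example, `build_simulate_query("")` yields `dataSource.name!='' | group events=count()` rather than the invalid `| group events=count()`.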

backend/routers/quality.py

  • GET /api/quality/parsers: lists actual parser filenames in /app/parsers/ (drives the Test Runner dropdown).
  • New POST /api/quality/sync-from-sdl: downloads every parser file under /logParsers/ on the SDL tenant into /app/parsers/. After this call returns, the Parser Test Runner dropdown automatically reflects all tenant parsers (including custom OCSF parsers like Avelios-Medical-OCSF). Requires SDL_CONFIG_READ_KEY in .env.
  • _flatten_event: when a PowerQuery row only carries a JSON-stringified payload in message (i.e. the parser isn't applied at query time), parse and flatten that JSON inline so the Field Population tool can measure real coverage.
  • POST /api/quality/test-parser:
    • Detects SDL JSON-mode parsers ($=json{parse=json}$) and parses log lines as JSON.
    • Applies parser rewrites ([{input,output,match,replace}] blocks), translating $0/$N backreferences correctly ($0 was previously mangled into a null byte).
    • Accepts single JSON object, JSON array, or NDJSON multi-line input.
    • Returns mode badge data + per-payload counters for the UI.
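The quality.py behaviours above (JSON-mode detection, single/array/NDJSON input, inline flattening of a JSON `message`, and rewrite backreferences) can be approximated with the sketch below. All function names are hypothetical, and the rewrite semantics (`re.search` gate, then `re.sub` over the input field) are an assumption about how SDL applies `{input,output,match,replace}`, not a confirmed reading of the endpoint.

```python
import json
import re
from typing import Any

JSON_MODE_MARKER = "$=json{parse=json}$"


def is_json_mode(parser_text: str) -> bool:
    """True if the parser declares SDL JSON mode."""
    return JSON_MODE_MARKER in parser_text


def parse_payloads(text: str) -> list[Any]:
    """Accept one JSON object, a JSON array, or NDJSON (one object per line)."""
    text = text.strip()
    if not text:
        return []
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        # "Extra data" after the first object: treat the input as NDJSON.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return doc if isinstance(doc, list) else [doc]


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys; lists and scalars are leaf values."""
    if not isinstance(obj, dict):
        return {prefix: obj}
    out: dict[str, Any] = {}
    for key, value in obj.items():
        out.update(flatten(value, f"{prefix}.{key}" if prefix else str(key)))
    return out


def flatten_event(row: dict[str, Any]) -> dict[str, Any]:
    """Flatten a PowerQuery row; if `message` holds a JSON object string, expand it too."""
    flat = flatten(row)
    message = row.get("message")
    if isinstance(message, str) and message.lstrip().startswith("{"):
        try:
            payload = json.loads(message)
        except json.JSONDecodeError:
            return flat
        if isinstance(payload, dict):
            for key, value in flatten(payload).items():
                flat.setdefault(key, value)  # real columns win over embedded fields
    return flat


def translate_backrefs(replace: str) -> str:
    """Rewrite $0/$N as \\g<0>/\\g<N>; re.sub reads a bare \\0 as a NUL byte."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", replace)


def apply_rewrite(fields: dict[str, Any], rw: dict[str, str]) -> None:
    """Apply one {input, output, match, replace} rewrite in place (semantics assumed)."""
    value = fields.get(rw["input"])
    if value is None or not re.search(rw["match"], str(value)):
        return
    fields[rw["output"]] = re.sub(rw["match"], translate_backrefs(rw["replace"]), str(value))
```

Emitting `\g<N>` for every group, rather than `\N`, is what avoids the null-byte problem: in a Python replacement template `\0` is an octal escape, while `\g<0>` unambiguously means "the whole match".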

frontend/index.html

  • Parser Test Runner dropdown now loads from /api/quality/parsers instead of filtering the coverage map (which only contains "detected in data" placeholders).
  • Field Population and Sample Events: added a "Last 7d" lookback option.
  • Parser Test Runner UI: mode badge (JSON auto-extract vs regex format), payload counter for multi-line input, separate tables for extracted vs derived/rewritten fields.

docker-compose.yml

  • Pass SDL_CONFIG_READ_KEY through to the backend container.

.env.example / .gitignore

  • Document the new SDL_CONFIG_READ_KEY variable.
  • Broaden .gitignore so parsers/* (tenant-specific synced content) is not committed.

New helper scripts (tools/)

  • sync_sdl_parsers.py — pull all /logParsers/* from the tenant.
  • probe_pq_syntax.py — probe which PowerQuery syntaxes the tenant accepts.
  • probe_avelios{,_wide,_fields}.py — inspect a source's event presence, columns, and embedded JSON fields.
  • test_avelios_parser.py, test_avelios_multi.py — smoke-test the patched /api/quality/test-parser endpoint with single-line and multi-line input.
  • probe_simulate_filter.py — smoke-test the patched /api/ingest/simulate-filter endpoint with progressively larger windows.
  • probe_sync_from_sdl.py — call /api/quality/sync-from-sdl and verify that /api/quality/parsers then reflects the downloaded parsers.
  • sdl_config.example.json — template config (the toolkit's .env is separate from the SDL config used by these helper scripts).
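The check that probe_sync_from_sdl.py performs can be reduced to a small, testable core. In this sketch `post` and `get` are injected HTTP helpers, and both response shapes (a `files` list of SDL paths from the sync call, a plain list of filenames from `/api/quality/parsers`) are assumptions rather than the endpoints' documented output.

```python
from typing import Any, Callable


def check_sync(post: Callable[[str], Any], get: Callable[[str], Any]) -> list[str]:
    """Trigger a sync, then return parser names the sync reported but the list lacks."""
    synced = post("/api/quality/sync-from-sdl")  # assumed shape: {"files": ["/logParsers/x", ...]}
    listed = set(get("/api/quality/parsers"))    # assumed shape: ["x", ...]
    expected = [path.rsplit("/", 1)[-1] for path in synced.get("files", [])]
    return [name for name in expected if name not in listed]
```

An empty return value means every downloaded parser is visible to the Test Runner dropdown.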

New .env knobs

# PowerQuery transport tuning (both optional; defaults work for most tenants)
SDL_PQ_TIMEOUT=600              # PowerQuery read timeout in seconds (default 600)
SDL_PQ_TIMEOUT_RETRIES=1        # extra retries on ReadTimeout (default 1)

# Required for /api/quality/sync-from-sdl
SDL_CONFIG_READ_KEY=...         # Data Lake API key with Configuration Read scope
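One way the two transport knobs could be read and applied is sketched below, using the defaults stated above (600 s timeout, 1 extra retry). The helper names are hypothetical, and the standard-library `TimeoutError` stands in for the HTTP library's `ReadTimeout`, which is an assumption about how the client surfaces timeouts.

```python
import os
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


def pq_transport_settings(env: Mapping[str, str] = os.environ) -> tuple[float, int]:
    """Read the optional knobs, falling back to the documented defaults."""
    timeout = float(env.get("SDL_PQ_TIMEOUT", "600"))
    retries = int(env.get("SDL_PQ_TIMEOUT_RETRIES", "1"))
    return timeout, max(retries, 0)


def run_with_timeout_retries(call: Callable[[float], T], timeout: float, retries: int) -> T:
    """Invoke call(timeout); on a timeout, retry up to `retries` extra times."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    attempt = 0
    while True:
        try:
            return call(timeout)
        except TimeoutError:  # stand-in for the HTTP client's ReadTimeout
            if attempt >= retries:
                raise
            attempt += 1
```

With the defaults, a query gets at most two attempts of up to 600 s each before the timeout propagates to the caller.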