marc 7c1687efce Sync upstream features; preserve fork KV scanner, parsers, verifier
Brought in 35 upstream commits (MITRE heatmap, health score, dependency map,
PowerQuery playground, onboarding tracker, product grouping, modern UI redesign).

Preserved fork additions:
  backend/routers/quality.py  KV scanner, pattern refs, JS keys, JSON mode,
                              /parsers + /sync-from-sdl endpoints
  parsers/                    96 OCSF + tenant parsers
  tools/stormshield-verify/   end-to-end ingest regression test
  .gitignore                  un-ignored parsers/*
  CHANGES.md, PATCHES.md
2026-05-22 18:19:52 +02:00


# Changes vs upstream `mickbrowns1/SIEM-Toolkit`
All edits are confined to a handful of files; everything else is untouched.
## `backend/services/s1_client.py`
### PowerQuery client
- All raised exceptions now include the request body / status / query so the
UI never shows a blank `"PowerQuery error: "`.
- Non-JSON responses (HTML 5xx gateway pages) surface as a readable error
string instead of crashing on `resp.json()`.
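
A minimal sketch of the error-formatting idea described above, not the actual `s1_client.py` code. The function name `describe_pq_failure` and the `message` / `error` payload keys are assumptions made for illustration.

```python
import json


def describe_pq_failure(status: int, body: str, query: str) -> str:
    """Build a non-empty error string from a failed PowerQuery HTTP response.

    Hypothetical helper: the name and the payload keys are illustrative only.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        # HTML gateway pages (or empty bodies) land here instead of crashing.
        snippet = " ".join(body.split())[:200] or "<empty body>"
        return (
            f"PowerQuery error: HTTP {status}, non-JSON response: "
            f"{snippet} (query: {query})"
        )

    detail = None
    if isinstance(payload, dict):
        # Assumed key names; a real API may use different ones.
        detail = payload.get("message") or payload.get("error")
    if not detail:
        detail = json.dumps(payload)[:200]
    return f"PowerQuery error: HTTP {status}: {detail} (query: {query})"
```

Whatever the response looks like, the returned string always carries the status and the query, so the UI has something to show.
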
### Detection library: site-scope fallback (`get_platform_rules`)
- Upstream hardcoded **account scope**, which returns 403 for site-scoped API
tokens. Added `get_scope_for_platform_rules()`, which probes `/accounts`
first, then `/sites`, and returns whichever scope the token can access.
- `get_account_id()` now also reads `accountId` from the `/sites` payload as
a fallback for site-scoped tokens.
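
The probe order can be sketched roughly as follows. This is an illustration under assumptions, not the body of `get_scope_for_platform_rules()`: the `can_access` callback, the `pick_scope` name, and the `"account"` / `"site"` return values are all hypothetical.

```python
from typing import Callable, Optional


def pick_scope(can_access: Callable[[str], bool]) -> Optional[str]:
    """Return the first scope whose listing endpoint the token can read.

    `can_access` is a hypothetical callback that performs the HTTP request
    and returns True on success (for example, anything other than a 403).
    """
    # Probe account scope first, then fall back to site scope.
    for path, scope in (("/accounts", "account"), ("/sites", "site")):
        if can_access(path):
            return scope
    return None
```

Injecting the access check keeps the ordering logic testable without a live console.
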
### SDL parser sync helpers
- `list_sdl_parsers()` — rewritten to use the real **SDL Configuration File
API** (`POST /api/listFiles` with `pathPrefix=/logParsers/`). Previously
it hit a 404 path on the mgmt console.
- `get_sdl_parser()` — rewritten to `POST /api/getFile` with `{path}`.
- New `_sdl_config_headers()` helper that uses `SDL_CONFIG_READ_KEY` (a
separate scope from `SDL_LOG_READ_KEY`).
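
As an illustration of the request shape described above, the sketch below builds (but does not send) a `POST /api/listFiles` request. Sending `pathPrefix` in a JSON body and authenticating with a `Bearer` header are assumptions here, as is the function name; check the SDL API documentation for the exact contract.

```python
import json
import urllib.request


def build_list_parsers_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a request listing files under /logParsers/ (nothing is sent).

    Assumptions: JSON body carrying `pathPrefix`, Bearer-token auth header.
    """
    body = json.dumps({"pathPrefix": "/logParsers/"}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/listFiles",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

In the toolkit the key would come from `SDL_CONFIG_READ_KEY`; a caller would pass the built request to `urllib.request.urlopen`.
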
## `backend/routers/ingest.py`
- `/api/ingest/simulate-filter`:
  * Rebuilt the query into valid SDL syntax. For empty bodies it used to
    generate `| group events=count()` (dangling pipe); it now uses a proper
    base expression and falls back to a `dataSource.name!=''` baseline.
  * Field name corrected from `src.name` to `dataSource.name`.
  * Surfaces both `result["error"]` and the exception text, so blank
    `"PowerQuery error: "` messages are gone.
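
The dangling-pipe fix amounts to always having a base expression in front of the pipe. A minimal sketch, assuming the filter arrives as a plain string; `build_simulate_query` is a hypothetical name, not the router's actual function:

```python
def build_simulate_query(filter_expr: str) -> str:
    """Compose a count query that never starts with a bare pipe.

    Falls back to the `dataSource.name!=''` baseline when no filter is given.
    """
    base = filter_expr.strip() or "dataSource.name!=''"
    return f"{base} | group events=count()"
```

With an empty filter this yields `dataSource.name!='' | group events=count()` rather than a query that begins with `|`.
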
## `backend/routers/quality.py`
- `GET /api/quality/parsers`: lists actual parser filenames in
`/app/parsers/` (drives the Test Runner dropdown).
- **New `POST /api/quality/sync-from-sdl`**: downloads every parser file
under `/logParsers/` on the SDL tenant into `/app/parsers/`. After this
call returns, the Parser Test Runner dropdown automatically reflects all
tenant parsers (including custom OCSF parsers like
`Avelios-Medical-OCSF`). Requires `SDL_CONFIG_READ_KEY` in `.env`.
- `_flatten_event`: when a PowerQuery row only carries a JSON-stringified
payload in `message` (i.e. the parser isn't applied at query time), parse
and flatten that JSON inline so the Field Population tool can measure real
coverage.
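
The inline flattening in `_flatten_event` might look roughly like the sketch below. The dotted key format and the function names are assumptions. For simplicity this version flattens whenever `message` parses as a JSON object, which is looser than the condition described above.

```python
import json
from typing import Any, Dict


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys; lists are kept as leaf values."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def flatten_event(row: Dict[str, Any]) -> Dict[str, Any]:
    """Replace a JSON-stringified `message` with its flattened fields."""
    message = row.get("message")
    if isinstance(message, str) and message.lstrip().startswith("{"):
        try:
            payload = json.loads(message)
        except ValueError:
            payload = None  # not valid JSON after all; leave the row alone
        if isinstance(payload, dict):
            merged = {k: v for k, v in row.items() if k != "message"}
            merged.update(flatten(payload))
            return merged
    return dict(row)
```

Rows whose `message` is plain text pass through unchanged, so the same function can run over parsed and unparsed sources.
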
- `POST /api/quality/test-parser`:
  * Detects SDL JSON-mode parsers (`$=json{parse=json}$`) and parses log
    lines as JSON.
  * Applies parser `rewrites: [{input,output,match,replace}]` blocks with
    correct `$0/$N` backreference translation (`$0` was being mangled to a
    null byte).
  * Accepts single JSON object, JSON array, or NDJSON multi-line input.
  * Returns mode badge data + per-payload counters for the UI.
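
Two of the behaviours above can be sketched in a few lines. In Python's `re` module a replacement template of `\0` is read as an octal escape for the NUL character, which is one plausible way `$0` ends up as a null byte; `\g<0>` is the form that refers to the whole match. The helper names below are hypothetical and this is not the endpoint's actual code.

```python
import json
import re
from typing import Any, List

_BACKREF = re.compile(r"\$(\d+)")


def translate_backrefs(replacement: str) -> str:
    """Turn `$0` / `$N` references into Python's `\\g<N>` template form."""
    # Escape literal backslashes first so re.sub does not treat them as escapes.
    escaped = replacement.replace("\\", "\\\\")
    return _BACKREF.sub(lambda m: rf"\g<{m.group(1)}>", escaped)


def apply_rewrite(value: str, match: str, replace: str) -> str:
    """Apply one rewrite rule (its `match` / `replace` pair) to a field value."""
    return re.sub(match, translate_backrefs(replace), value)


def split_payloads(text: str) -> List[Any]:
    """Accept a single JSON object, a JSON array, or NDJSON (one doc per line)."""
    text = text.strip()
    if not text:
        return []
    try:
        parsed = json.loads(text)
    except ValueError:
        # Several documents in one input: treat each non-empty line as JSON.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return parsed if isinstance(parsed, list) else [parsed]
```

For example, `apply_rewrite("bob@corp", r"(\w+)@(\w+)", "$2:$1 [$0]")` returns `corp:bob [bob@corp]`.
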
## `frontend/index.html`
- Parser Test Runner dropdown now loads from `/api/quality/parsers` instead
of filtering the coverage map (which only has `detected in data`
placeholders).
- Field Population and Sample Events: added **Last 7d** lookback option.
- Parser Test Runner UI: mode badge (`JSON auto-extract` vs `regex format`),
payload counter for multi-line input, separate tables for extracted vs
derived/rewritten fields.
## `docker-compose.yml`
- Pass `SDL_CONFIG_READ_KEY` through to the backend container.
## `.env.example` / `.gitignore`
- Document the new `SDL_CONFIG_READ_KEY` variable.
- Broaden `.gitignore` so `parsers/*` (tenant-specific synced content) is
not committed.
## New helper scripts (`tools/`)
- `sync_sdl_parsers.py` — pull all `/logParsers/*` from the tenant.
- `probe_pq_syntax.py` — probe which PowerQuery syntaxes the tenant accepts.
- `probe_avelios{,_wide,_fields}.py` — inspect a source's event presence,
columns, and embedded JSON fields.
- `test_avelios_parser.py`, `test_avelios_multi.py` — smoke-test the patched
`/api/quality/test-parser` endpoint with single-line and multi-line input.
- `probe_simulate_filter.py` — smoke-test the patched
`/api/ingest/simulate-filter` endpoint with progressively larger windows.
- `probe_sync_from_sdl.py` — call `/api/quality/sync-from-sdl` and verify
that `/api/quality/parsers` then reflects the downloaded parsers.
- `sdl_config.example.json` — template config (the toolkit's `.env` is
separate from the SDL config used by these helper scripts).
## New `.env` knobs
```bash
# PowerQuery transport tuning (both optional; defaults work for most tenants)
SDL_PQ_TIMEOUT=600 # PowerQuery read timeout in seconds (default 600)
SDL_PQ_TIMEOUT_RETRIES=1 # extra retries on ReadTimeout (default 1)
# Required for /api/quality/sync-from-sdl
SDL_CONFIG_READ_KEY=... # Data Lake API key with Configuration Read scope
```
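
For illustration, the two optional knobs could be read with their documented defaults along these lines. The helper names and the fall-back-on-invalid-value behaviour are assumptions; the toolkit's own loading code may differ.

```python
import os
from typing import Dict, Mapping, Optional


def _int_env(name: str, default: int, env: Mapping[str, str]) -> int:
    """Read an integer setting, falling back to the default if unset or invalid."""
    raw = env.get(name, "").strip()
    try:
        return int(raw) if raw else default
    except ValueError:
        return default


def load_pq_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, int]:
    """Collect the PowerQuery transport settings (defaults: 600 s, 1 retry)."""
    env = os.environ if env is None else env
    return {
        "timeout": _int_env("SDL_PQ_TIMEOUT", 600, env),
        "timeout_retries": _int_env("SDL_PQ_TIMEOUT_RETRIES", 1, env),
    }
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in a test.
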