2026-05-20 23:44:53 +02:00


SIEM-Toolkit Patches & Helper Scripts

A drop-in patch set that fixes several issues in the upstream mickbrowns1/SIEM-Toolkit and adds helper scripts for syncing parsers from a SentinelOne SDL tenant and probing PowerQuery / event data.

What's fixed in the upstream code

| File | Fix |
| --- | --- |
| backend/routers/ingest.py | Filter Simulator PowerQuery rewritten: replaced the legacy count() as events and src.name field with valid SDL syntax, \| filter dataSource.name=='X' \| group events=count() |
| backend/routers/quality.py | New GET /api/quality/parsers endpoint lists the actual parser files; _flatten_event now JSON-parses nested message payloads so the Field Population tool reports real coverage (it was always 0% for sources where the parser isn't applied at query time) |
| backend/routers/quality.py (Parser Test Runner) | Detects the SDL JSON auto-extract format $=json{parse=json}$ and parses log lines as JSON; applies parser rewrites (input/output/match/replace blocks) with correct $0/$N backreference handling; accepts single JSON / JSON array / NDJSON input |
| frontend/index.html | Parser dropdown now loads from /api/quality/parsers (previously it filtered coverage/map, which only contains "detected in data" placeholders); added a Last 7d lookback to both Field Population and Sample Events; Test Runner UI now shows a mode badge (JSON auto-extract vs. regex format), a payload count for multi-line input, and separate tables for extracted vs. derived/rewritten fields |
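
The $0/$N backreference handling mentioned for the Parser Test Runner can be illustrated with a minimal Python sketch. This is not the code in quality.py: the function names are hypothetical, the rule is assumed to be a dict with the input/output/match/replace keys named above, and the sketch assumes that only the first match is replaced and that a non-matching rule leaves the event unchanged.

```python
import re

# Matches SDL-style backreferences such as $0, $1, $12.
_BACKREF = re.compile(r"\$(\d+)")


def to_python_replacement(sdl_replace: str) -> str:
    """Translate $0/$N into the \\g<N> form that Python's re.sub understands."""
    # Escape literal backslashes first so re.sub does not read them as escapes.
    escaped = sdl_replace.replace("\\", "\\\\")
    return _BACKREF.sub(lambda m: f"\\g<{m.group(1)}>", escaped)


def apply_rewrite(fields: dict, rule: dict) -> dict:
    """Apply one rewrite rule (hypothetical dict shape) to a dict of fields."""
    source = fields.get(rule["input"])
    if source is None:
        return fields
    pattern = re.compile(rule["match"])
    text = str(source)
    if pattern.search(text) is None:
        return fields  # assumption: no match means no output field is written
    out = dict(fields)
    # Assumption: only the first match is replaced.
    out[rule["output"]] = pattern.sub(
        to_python_replacement(rule["replace"]), text, count=1
    )
    return out
```

The translation step matters because Python treats `$1` as literal text in a replacement string, and `\g<0>` is the only way to refer to the whole match.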

What's NOT fixed in the upstream code (configuration)

The repo's docker-compose.yml interpolates S1_BASE_URL etc. from .env at compose-up time. A docker compose restart does NOT pick up .env changes — always use docker compose up -d --force-recreate backend.

S1_BASE_URL must be the per-tenant management console subdomain (e.g. usea1-XXXX.sentinelone.net), not the regional SDL/XDR endpoint. If you only know the XDR URL, you can probe candidates with curl:

TOKEN=$(jq -r .api_token < ~/.../mgmt-config.json)
for H in usea1-yourtenant usea1-purple usea1-partners; do
  printf "%-45s  %s\n" "$H" \
    "$(curl -s -o /dev/null -w '%{http_code}' \
       "https://$H.sentinelone.net/web/api/v2.1/cloud-detection/rules?limit=1" \
       -H "Authorization: ApiToken $TOKEN")"
done
# 200 = correct host

Contents

.
├── README.md                       (this file)
├── env.example                     template for the toolkit's .env
├── sdl_config.example.json         template for helper scripts' SDL config
├── patched-files/
│   ├── backend/routers/
│   │   ├── ingest.py               <- copy over upstream
│   │   └── quality.py              <- copy over upstream
│   └── frontend/
│       └── index.html              <- copy over upstream
└── scripts/
    ├── sync_sdl_parsers.py         pull all /logParsers/* from the tenant into ./parsers/
    ├── probe_pq_syntax.py          test what PowerQuery dialect the tenant accepts
    ├── probe_avelios.py            sample probe: find a source's events + columns
    ├── probe_avelios_wide.py       same, sweeping 1d/3d/7d
    ├── probe_avelios_fields.py     parse JSON `message` payloads & count fields
    ├── test_avelios_parser.py      hit /api/quality/test-parser with one JSON line
    └── test_avelios_multi.py       same, with multi-line NDJSON

Applying the patches

  1. Clone the upstream repo:
    git clone https://github.com/mickbrowns1/SIEM-Toolkit.git
    cd SIEM-Toolkit
    
  2. Overlay the patched files:
    PATCH=/path/to/this/dir
    cp "$PATCH"/patched-files/backend/routers/quality.py backend/routers/quality.py
    cp "$PATCH"/patched-files/backend/routers/ingest.py  backend/routers/ingest.py
    cp "$PATCH"/patched-files/frontend/index.html        frontend/index.html
    
  3. Configure:
    cp "$PATCH"/env.example .env
    $EDITOR .env                          # fill in your real values
    
  4. Start the stack:
    docker compose up -d --build
    open http://localhost:3001            # macOS; on Linux use xdg-open
    

Helper-script setup

The helper scripts read a small JSON config (separate from the toolkit's .env) containing your SDL log-read / config-read keys:

cp sdl_config.example.json scripts/sdl_config.json
$EDITOR scripts/sdl_config.json
# or set the env var
export SDL_CONFIG=/somewhere/sdl_config.json
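
As a rough illustration of the lookup described above, a helper script could resolve its config like this. The function names are hypothetical, and the precedence (SDL_CONFIG wins over the sdl_config.json next to the script) is an assumption; check the scripts themselves for the actual behavior and for the key names they expect.

```python
import json
import os
from pathlib import Path


def resolve_config_path(script_dir: Path, env=None) -> Path:
    """Return the config path: $SDL_CONFIG if set, else <script_dir>/sdl_config.json."""
    env = os.environ if env is None else env
    override = env.get("SDL_CONFIG")
    return Path(override) if override else script_dir / "sdl_config.json"


def load_config(script_dir: Path, env=None) -> dict:
    """Load the JSON config, failing early with a readable message."""
    path = resolve_config_path(script_dir, env)
    if not path.is_file():
        raise FileNotFoundError(
            f"SDL config not found at {path}; copy sdl_config.example.json "
            "there or set SDL_CONFIG"
        )
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

A script would typically call `load_config(Path(__file__).resolve().parent)`.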

Helper-script usage

Sync parsers from the SDL tenant into the toolkit's parsers/ dir

PARSERS_DIR=/path/to/SIEM-Toolkit/parsers \
  python3 scripts/sync_sdl_parsers.py

PARSERS_DIR defaults to ../parsers relative to the script.

Probe PowerQuery syntax compatibility on your tenant

python3 scripts/probe_pq_syntax.py

The output tells you which command shapes (| group ..., filter ..., count() as, etc.) work on the active deployment.

Inspect what a given source's events actually look like

python3 scripts/probe_avelios.py            # finds a source's name + 1-line sample
python3 scripts/probe_avelios_wide.py       # sweeps 1d/3d/7d top sources
python3 scripts/probe_avelios_fields.py     # if `message` is JSON, flatten & count fields

The scripts are named *_avelios for the original use case but work for any source — open the file and change the dataSource.name filter.
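
The "flatten & count fields" idea can be sketched as follows. This is an illustration only, not the code in probe_avelios_fields.py or in the patched _flatten_event: the function names are hypothetical, events are assumed to be plain dicts, and empty values are assumed not to count as populated.

```python
import json
from collections import Counter


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted paths; lists are kept as leaf values."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def flatten_event(event: dict) -> dict:
    """Flatten an event and, if `message` holds a JSON object, merge its fields in."""
    flat = flatten(event)
    message = event.get("message")
    if isinstance(message, str):
        try:
            payload = json.loads(message)
        except json.JSONDecodeError:
            payload = None  # plain-text message: nothing more to extract
        if isinstance(payload, dict):
            flat.update(flatten(payload))
    return flat


def field_population(events: list) -> dict:
    """Return the fraction of events in which each field is non-empty."""
    counts = Counter()
    for event in events:
        for name, value in flatten_event(event).items():
            if value not in (None, "", [], {}):
                counts[name] += 1
    total = len(events)
    return {name: n / total for name, n in counts.items()} if total else {}
```

With this approach, a source that only returns timestamp and message still yields per-field coverage as long as message is a JSON object.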

Smoke-test the patched Parser Test Runner endpoint

python3 scripts/test_avelios_parser.py      # single-line JSON
python3 scripts/test_avelios_multi.py       # multi-line NDJSON

These hit http://localhost:8001/api/quality/test-parser directly so you can verify the backend without using the UI.
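
The two smoke tests exercise the single-line and NDJSON input shapes. One way an endpoint could normalize single JSON, a JSON array, and NDJSON into a list of payloads is sketched below; the function name is hypothetical and the patched quality.py may do this differently.

```python
import json


def split_payloads(raw: str) -> list:
    """Normalize single JSON, a JSON array, or NDJSON into a list of payloads."""
    text = raw.strip()
    if not text:
        return []
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Not one JSON document: treat it as NDJSON, one document per
        # non-empty line. A malformed line still raises JSONDecodeError.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return parsed if isinstance(parsed, list) else [parsed]
```

Trying the whole input as one document first keeps pretty-printed (multi-line) single objects working, and only falls back to line-by-line parsing when that fails.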

Common pitfalls

  • Parser dropdown is empty → run sync_sdl_parsers.py. The upstream "Load SDL Parsers" button only indexes whatever already exists in parsers/.
  • Field Population shows 0% everywhere → the source's parser isn't being applied at query time, so PowerQuery returns just timestamp+message. This patch's _flatten_event parses JSON inside message. Also try widening the window (the new Last 7d option) — some sources are low-volume.
  • PowerQuery 400 "Unknown command [count]" → fixed in ingest.py. If you hit it elsewhere, the rule is: count() must appear inside a group, so SDL PowerQuery requires | group events=count(), never | count() as events.
  • STAR rules → 302 to /404 → S1_BASE_URL is pointed at the SDL/XDR URL instead of the management-console subdomain.
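
The PowerQuery rule from the pitfalls above can be captured in a small helper. This is a sketch with hypothetical function names, not the code in ingest.py; it only handles the two legacy shapes mentioned in this document (| count() as NAME and the src.name field), and it refuses source names containing a single quote rather than guessing at the escaping rules.

```python
import re

# Legacy shape: "| count() as events" (rejected by SDL with "Unknown command [count]").
_LEGACY_COUNT = re.compile(r"\|\s*count\(\)\s+as\s+(\w+)")
_LEGACY_FIELD = re.compile(r"\bsrc\.name\b")


def count_by_source_query(source_name: str) -> str:
    """Build the valid form: filter on dataSource.name, then count inside a group."""
    if "'" in source_name:
        raise ValueError("source names containing a single quote are not handled")
    return f"| filter dataSource.name=='{source_name}' | group events=count()"


def fix_legacy_query(query: str) -> str:
    """Rewrite the legacy count/field shapes into the form SDL accepts."""
    query = _LEGACY_COUNT.sub(r"| group \1=count()", query)
    return _LEGACY_FIELD.sub("dataSource.name", query)
```

Running `fix_legacy_query` over any saved queries is a quick way to find other places that would trigger the same 400.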

Verification

After applying patches and recreating containers:

curl http://localhost:8001/health
curl http://localhost:8001/api/quality/parsers | python3 -m json.tool   # count > 0
curl 'http://localhost:8001/api/ingest/top-sources?hours=24'           # real numbers
curl -X POST http://localhost:8001/api/coverage/load-star-rules        # not 502
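
The same four checks can be scripted with the Python standard library if you prefer one command over four. The sketch below uses only the endpoints listed above; the function names are hypothetical, the POST is sent with an empty body, and an unreachable backend is reported as None rather than raising.

```python
import urllib.error
import urllib.request

BASE = "http://localhost:8001"

# (method, path) pairs taken from the curl commands above.
CHECKS = [
    ("GET", "/health"),
    ("GET", "/api/quality/parsers"),
    ("GET", "/api/ingest/top-sources?hours=24"),
    ("POST", "/api/coverage/load-star-rules"),
]


def http_status(method: str, url: str, timeout: float = 10.0):
    """Return the HTTP status code, or None if the backend is unreachable."""
    data = b"" if method == "POST" else None
    req = urllib.request.Request(url, data=data, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses still carry a status code
    except urllib.error.URLError:
        return None


def run_checks(base: str = BASE, fetch=http_status) -> dict:
    """Run every check and return {path: status}."""
    return {path: fetch(method, base + path) for method, path in CHECKS}
```

Calling `run_checks()` against a running stack returns a dict of path to status; a 502 on load-star-rules points back at the S1_BASE_URL issue described under Common pitfalls. The `fetch` parameter exists so the logic can be exercised without a live backend.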

In the UI:

  • Coverage Map: shows parsers_loaded and rules_loaded > 0
  • Ingest → Filter Simulator: returns matched events + projected GB/month
  • Parser Quality → Parser Test Runner: dropdown lists all parsers
  • Parser Quality → Field Population: real coverage rates (not all 0%)