mirror of https://github.com/marcredhat/SIEM-toolkit-patched synced 2026-06-08 12:33:51 +00:00

Files

T

marc 4df8e844e5 Sigma -> SentinelOne PowerQuery pipeline

End-to-end workflow that turns SigmaHQ rules into SDL Scheduled
custom-detection rules:

1. SIEM-toolkit provides the coverage map to find what's thin --
   MITRE ATT&CK heatmap across all detection library rules, rule
   firing status (active vs never-fired).
2. Pick Sigma rules (https://github.com/SigmaHQ/sigma) that target
   those tactics.
3. Convert the Sigma rules to PowerQuery with
   pysigma-backend-sentinelone-pq.
4. Smoke-test against your tenant's /api/powerQuery, deploy via
   /web/api/v2.1/cloud-detection/rules as Scheduled PQ rules in Draft.
5. Re-running on a different tenant is just re-pointing the
   credentials -- the converted .pq bodies travel as-is.

Files:
  README_sigma_pipeline.md       full workflow doc
  recommend_sigma_imports.py     coverage-map reader -> rule shortlist
  probe_wel_schema.py            WEL parser field discovery
  convert_test_deploy_sigma.py   pick + convert + 3 variants + deploy
  fixup_rules_6_7.py             OriginalFileName pre-processor
  run_sigma_on_tenant.py         redeploy already-converted bodies
  verify_rule_exists_via_put.py  PUT-existence test (RBAC workaround)
  verify_deployed_sigma_rules.py RBAC visibility diagnostic
  tenant_config.example.json     credentials template (gitignored real one)

Each converted rule emits three PowerQuery variants:
  <stem>.pq          faithful (S1 DV schema)
  <stem>.relaxed.pq  drops endpoint.os + event.type clauses
  <stem>.wel.pq      rewritten onto microsoft_windows_eventlog-latest

All scripts read credentials from tenant_config.json (or the
SIEM_TOOLKIT_CONFIG env var), discover the target site_id at runtime,
and persist deployed rule IDs to deployed_rule_ids.json so the verify
scripts work without hardcoded IDs.

2026-05-28 12:29:37 +02:00

9.5 KiB

Raw Permalink Blame History

Sigma → SentinelOne PowerQuery pipeline

End-to-end workflow that turns SigmaHQ rules into SentinelOne SDL Scheduled custom-detection rules, starting from the coverage gaps the SIEM-toolkit identifies.

TL;DR

SIEM-toolkit provides the coverage map to find what's thin — MITRE ATT&CK heatmap across all detection library rules, rule firing status (active vs never-fired).
Pick Sigma rules (SigmaHQ/sigma) that target those tactics.
Convert the Sigma rules to PowerQuery with pysigma-backend-sentinelone-pq.
Smoke-test against your tenant's /api/powerQuery, deploy via /web/api/v2.1/cloud-detection/rules as Scheduled PQ rules in Draft.
Re-running on a different tenant is just re-pointing the credentials — the converted .pq bodies travel as-is.

Setup (once)

# 1. Tooling
python3 -m venv /tmp/sigma_venv
/tmp/sigma_venv/bin/pip install pysigma pysigma-backend-sentinelone-pq
brew install gh && gh auth login                # avoids GitHub rate limits

# 2. Credentials
cp tenant_config.example.json tenant_config.json
$EDITOR tenant_config.json                      # fill in 5 keys
# tenant_config.json is gitignored.

tenant_config.json shape:

{
  "S1_CONSOLE_URL":       "https://<region>-<tenant>.example",
  "S1_CONSOLE_API_TOKEN": "<S1 Mgmt API token>",
  "SDL_XDR_URL":          "https://xdr.<region>.example",
  "SDL_LOG_READ_KEY":     "<SDL Log Read scope>",
  "SDL_CONFIG_READ_KEY":  "<SDL Configuration Read scope>"
}

Optional environment overrides:

Variable	Default	Purpose
`SIEM_TOOLKIT_CONFIG`	`./tenant_config.json`	path to credentials
`SIGMA_OUT_DIR`	`/tmp/sigma_converted_v4`	where `.pq` artefacts land
`SIGMA_VENV_PY`	`/tmp/sigma_venv/bin/python3`	Python that hosts pysigma
`GH_BIN`	`gh`	GitHub CLI binary
`SITE_ID`	(auto-discovered)	force-deploy into a specific site
`DEPLOYED_IDS_FILE`	`./deployed_rule_ids.json`	input for verify scripts

The 5-step workflow

Step 1 — Find thin tactics

python3 recommend_sigma_imports.py

Reads the SIEM-toolkit coverage endpoints (/api/coverage/health, /api/coverage/mitre, /api/coverage/map) and prints, in order:

Tenant health row (health_score, firing_pct, active sources).
Active log sources ranked by event volume — only import Sigma rules whose logsource matches a source that actually produces events here.
MITRE tactic depth — tactics with rule_count < 100 and a high technique_count are the THIN ones. Typical findings: Reconnaissance, Discovery, Lateral Movement, Collection, Exfiltration.
Recommended SigmaHQ folders with GitHub-verified rule counts.
A curated 14-rule shortlist for the thinnest gaps.

Step 2 — Pick Sigma rules

The picker in convert_test_deploy_sigma.py matches filename-stem keywords against the SigmaHQ tree it lists via gh api. Edit the WANTED table to change the 10 rules. Each row is (tactic, technique_label, [keywords], allow_powershell_folder).

The default list covers:

Tactic	Technique	Sigma file
Lateral Movement	T1021.006 WinRM (evil-winrm)	`proc_creation_win_hktl_evil_winrm.yml`
Collection	T1113 Screen Capture (Psr.exe)	`proc_creation_win_psr_capture_screenshots.yml`
Collection	T1115 Clipboard (Get-Clipboard)	`proc_creation_win_powershell_get_clipboard.yml`
Exfiltration	T1560.001 RAR (.dmp files)	`proc_creation_win_winrar_exfil_dmp_files.yml`
Exfiltration	T1567.002 rclone	`proc_creation_win_pua_rclone_execution.yml`
Reconnaissance	T1016 netsh portproxy	`proc_creation_win_netsh_port_forwarding.yml`
Discovery	T1087/T1033 whoami /priv	`proc_creation_win_whoami_priv_discovery.yml`
Discovery	T1087/T1482 SharpHound	`proc_creation_win_hktl_bloodhound_sharphound.yml`
Credential Access	T1003.001 Mimikatz cmd-line	`proc_creation_win_hktl_mimikatz_command_line.yml`
Credential Access	T1003.001 ProcDump LSASS	`proc_creation_win_sysinternals_procdump_lsass.yml`

Step 3 — Convert + smoke-test + deploy

Optional preliminary: probe what fields the tenant's WEL parser actually emits so the WEL-mapped variant queries land on real columns:

python3 probe_wel_schema.py

Then run the master pipeline:

# Convert + smoke-test only:
python3 convert_test_deploy_sigma.py

# Convert + smoke-test + create SDL Scheduled rules in Draft:
python3 convert_test_deploy_sigma.py --deploy

For each of the 10 rules the script writes three PowerQuery variants:

File	Purpose
`<stem>.pq`	faithful — S1 DV schema (production form)
`<stem>.relaxed.pq`	strips `endpoint.os` and `event.type` clauses (useful on tenants where those fields are null)
`<stem>.wel.pq`	rewritten onto the `microsoft_windows_eventlog-latest` parser fields (`CommandLine`, `Image`, `ParentImage`, `EventID=4688\|1`, `dataSource.name='Windows Event Logs'`)

Each variant is smoke-tested against POST {SDL_XDR_URL}/api/powerQuery (last 24 h). HTTP 200 is what we want; rows=0 simply means no telemetry matched in the window.

With --deploy, the faithful variant is also POSTed to /web/api/v2.1/cloud-detection/rules as a Scheduled rule in Draft status, then deployed_rule_ids.json is written next to the script mapping each rule ID back to its source.

Edge cases the converter handles

Unsupported Sigma fields (e.g. OriginalFileName) cause the backend to print its known-field list as the error. fixup_rules_6_7.py strips those keys from the YAML and re-converts. The rule remains semantic because Image|endswith: is the primary selector.
Wrong folder — some rules live under rules/windows/powershell/ not process_creation/. The picker can expand its scope.
event.type='Process Creation' and endpoint.os='windows' are often empty on real tenants — that's why the relaxed and WEL variants exist.

Step 4 — Verify

The service-user role that can POST a rule often cannot GET it back (cloudDetectionRulesView missing). The collection endpoint silently filters the rule out, and GET /rules/{id} returns HTTP 405 on this API version. PUT is the definitive existence test:

python3 verify_rule_exists_via_put.py

Reads deployed_rule_ids.json and PUTs each rule ID. 200/204 = EXISTS, 404 = NOT FOUND. Optional deeper diagnostic:

python3 verify_deployed_sigma_rules.py

Probes the list endpoint with several scope-filter variants so you can see exactly which RBAC layer is hiding what.

Step 5 — Run on another tenant

The 30 .pq files in SIGMA_OUT_DIR are tenant-agnostic. Point the credentials at a different tenant and re-run only Step 3's deploy + Step 4:

# Option A: replace tenant_config.json
cp tenant_config.example.json tenant_config.json && $EDITOR tenant_config.json
python3 run_sigma_on_tenant.py

# Option B: keep separate config files
SIEM_TOOLKIT_CONFIG=./tenant_prod.json   python3 run_sigma_on_tenant.py
SIEM_TOOLKIT_CONFIG=./tenant_lab.json    python3 run_sigma_on_tenant.py

run_sigma_on_tenant.py is a single-shot probe → smoke-test → deploy → PUT-verify, useful when you already have the converted bodies and just want to land them on a new tenant.

Files

File	Role
`recommend_sigma_imports.py`	Reads coverage endpoints, recommends folders + curated rule list
`probe_wel_schema.py`	Discovers WEL parser field schema on the tenant
`convert_test_deploy_sigma.py`	Master pipeline: pick + convert (3 variants) + smoke + `--deploy`
`fixup_rules_6_7.py`	Handles Sigma rules with backend-unsupported keys (e.g. `OriginalFileName`)
`run_sigma_on_tenant.py`	Re-deploys already-converted bodies to another tenant
`verify_rule_exists_via_put.py`	PUT-existence test (definitive when GET is RBAC-blocked)
`verify_deployed_sigma_rules.py`	Probes scope/filter variants to diagnose RBAC
`tenant_config.example.json`	Template — copy to `tenant_config.json` (gitignored)

Where it fits in the SIEM-toolkit story

SIEM-toolkit Threat Coverage map
    │
    ▼
recommend_sigma_imports.py      ──┐
    │  (suggests SigmaHQ folders) │
    ▼                             │
convert_test_deploy_sigma.py      ├── single workflow
    │  (Sigma → PQ → SDL)         │
    ▼                             │
verify_rule_exists_via_put.py   ──┘
    │
    ▼
Activate rules in console UI
    │
    ▼
Re-run SIEM-toolkit Threat Coverage  → firing_pct grows

Pitfalls collected so far

event.type='Process Creation' has near-zero population unless a live S1 EDR agent is reporting; relax variant works around it.
endpoint.os='windows' is null on many tenants; always strip for the relaxed variant.
GitHub anonymous rate limit (60 req/h) kills the listing step — use gh auth login.
Service-user RBAC without cloudDetectionRulesView makes POSTed rules invisible to GET. PUT confirms they exist.
OriginalFileName in Sigma YAML breaks the S1-PQ backend; strip with the pre-processor.
PowerQuery parser quirks — bare * as a query is rejected; comments with /, -, or non-ASCII characters cause Load Failed at rule-validation time even when the body POSTs fine to /api/powerQuery. Keep comments out of any body that will be deployed as a Scheduled rule.

9.5 KiB Raw Permalink Blame History