mirror of https://github.com/marcredhat/SIEM-toolkit-patched synced 2026-06-08 12:33:51 +00:00

T

Mick 2c40bf81ee Cherry-pick improvements from PR #2 (marcredhat)

- s1_client: configurable PowerQuery timeout via SDL_PQ_TIMEOUT env var
  (default 600s, was hardcoded 120s) with separate connect/read timeouts
  via httpx.Timeout; retry on ReadTimeout via SDL_PQ_TIMEOUT_RETRIES;
  better error messages include query snippet and parse non-JSON responses
- ingest: fix simulate-filter SDL syntax (== → =, drop leading | on base
  expression, surface PowerQuery error field, cleaner empty-filter fallback)
- docker-compose: pass SDL_PQ_TIMEOUT and SDL_PQ_TIMEOUT_RETRIES through
  to backend container with sensible defaults

Not taken from PR #2:
- .gitignore parsers/* change — would untrack the 7 committed parser files
- s1_client/quality/coverage changes already present in main from prior work

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-22 10:11:42 -04:00

backend

Cherry-pick improvements from PR #2 (marcredhat)

2026-05-22 10:11:42 -04:00

Initial commit: SIEM Toolkit for SentinelOne

2026-05-19 11:39:26 -04:00

frontend

Add unlabelled event detection, stub parser quality, Sync All, and modern UI redesign

2026-05-22 10:00:21 -04:00

parsers

Add unlabelled event detection, stub parser quality, Sync All, and modern UI redesign

2026-05-22 10:00:21 -04:00

tools

Add helper scripts: SDL parser sync, PQ probes, test-parser smoke tests

2026-05-20 19:41:00 +02:00

.env.example

Add Settings page with .env manager

2026-05-19 11:43:41 -04:00

.gitignore

Auto-load detection library from S1 API, improve coverage map accuracy

2026-05-20 15:14:10 -04:00

build.sh

Initial commit: SIEM Toolkit for SentinelOne

2026-05-19 11:39:26 -04:00

docker-compose.yml

Cherry-pick improvements from PR #2 (marcredhat)

2026-05-22 10:11:42 -04:00

README.md

Auto-load detection library from S1 API, improve coverage map accuracy

2026-05-20 15:14:10 -04:00

README.md

SIEM Toolkit — SentinelOne AI-SIEM

Inspired by Pineapple Boy! 🍍

A self-hosted troubleshooting and visibility tool for SentinelOne AI-SIEM SecOps engineers. Runs as a Docker Compose stack against your SentinelOne demo or production tenant and provides real-time insight into parser coverage, ingest volume, and data quality — all without leaving a single interface.

What's Inside

Page	Purpose
Overview	Live health stats — coverage percentage, active sources, top uncovered sources by volume
Parser Coverage Map	Which active data sources have a parser? Which don't?
Ingest Dashboard	Event volume, top sources, cost projection, filter simulator
Parser Quality	Live event sampler, field population rate, parser test runner
Onboarding Accelerator	Prompt template for onboarding new log sources with Claude Code
Settings	Manage your `.env` credentials directly from the interface

Architecture

browser → nginx (port 3001) → single-page HTML/JS application
                ↓ API calls
          FastAPI backend (port 8001)
                ↓
    ┌───────────────────────────┐
    │  PostgreSQL (SQLAlchemy)  │  parser fields, active sources
    └───────────────────────────┘
                ↓
    ┌───────────────────────────┐
    │  SentinelOne APIs         │
    │  • Management API         │  demo.sentinelone.net
    │  • Scalyr XDR PowerQuery  │  xdr.us1.sentinelone.net
    └───────────────────────────┘

All services run via Docker Compose. The parsers/ directory is volume-mounted into the backend so SDL parser files may be loaded without rebuilding the image.

Setup

1. Clone and Configure

git clone https://github.com/mickbrowns1/SIEM-Toolkit.git
cd SIEM-Toolkit
cp .env.example .env

Edit .env with your credentials:

S1_BASE_URL=https://demo.sentinelone.net       # Your console URL
S1_API_TOKEN=eyJ...                             # Service user API token (account scope or higher)
SDL_XDR_URL=https://xdr.us1.sentinelone.net    # Scalyr XDR endpoint
SDL_LOG_READ_KEY=1j2IU0S...                     # Data Lake read key
ANTHROPIC_API_KEY=                              # Optional — not currently used

S1_API_TOKEN — generate at Settings → Users → Service Users in the console. The service user should be provisioned at account scope or higher.
SDL_LOG_READ_KEY — found at Settings → Integrations → Data Lake API Keys.

2. Add the Detection Library (strongly recommended)

The Detection Fields Missing column and per-source detection counts on the Coverage Map require a local detections export. This is generated from the detection-validator repository.

# Clone the detection-validator repo alongside this one
git clone https://github.com/mickbrowns1/detection-validator.git
cd detection-validator

# Follow its README to generate the export, then copy the output here:
mkdir -p ../SIEM-Toolkit/data
cp data/data/detections/extracted.json ../SIEM-Toolkit/data/detections.json

cd ../SIEM-Toolkit

The data/ directory is gitignored and never committed. Once the stack is running, click Load Detections on the Coverage Map to import the rules into the database.

3. Add Parser Files (optional but strongly recommended)

Place your SDL parser JSON files into the parsers/ directory. The backend reads them directly at query time — no rebuild is necessary.

cp ~/my-parsers/*.json parsers/

4. Start the Stack

docker-compose up -d --build

Open http://localhost:3001 in your browser and you're off.

Features

Overview Dashboard

The landing page gives you an at-a-glance health summary drawn live from the database:

Parser Coverage % — proportion of active sources with a confirmed parser
Active Sources — total number of dataSource.name values seen in the last 7 days
Covered / Need Parser — counts for each status

If any sources are uncovered, the Top Sources Needing a Parser table lists the highest-volume offenders. Click any source name to jump directly to the Parser Quality page with that source pre-selected.

Parser Coverage Map

Answers the question: does each active data source have a parser running?

How it works:

Sync Live Sources — executes a PowerQuery against your data lake to retrieve every dataSource.name seen in the last 7 days, along with event counts.
Load SDL Parsers — reads parser files from parsers/, extracts the dataSource.name attribute from each, and stores the field list in the database.

Matching logic (three-tier):

Exact dataSource.name match between the active source and the parser attribute
Normalised substring match (ignores spaces, dashes, and case) between the active source name and the parser's dataSource.name
Normalised substring match against the parser filename — catches files where the dataSource.name attribute is incorrect or missing

Parser detection from data: During sync, a parallel PowerQuery checks whether each source has events with event.type populated in the data lake. If so, a parser is confirmed as running — the source is marked Covered even without a local parser file. This handles built-in and cloud-managed parsers that are not present in your parsers/ folder.

Status values:

🟢 Covered — custom parser confirmed (local file or detected via parsed events in the data lake)
🔴 Parser Needed — no parser found, or only a grok/dottedJson format (which typically indicates an incomplete parser)

Filters: Use the filter pills to focus on Custom Parser only, Default Parser Only (data lake detected), or No Parser.

Deep link: Click any source name in the table to open it directly in Parser Quality with all dropdowns pre-populated.

Expected results: After syncing sources and loading parsers, sources with active SDL parsers will appear as Covered. Sources sending raw, unparsed data — where only message and timestamp appear in the data lake — will appear as Parser Needed.

Ingest Dashboard

Answers the question: where is my event volume coming from, and what would happen if I filtered some of it?

Time range: 1h (default), 3d, 5d, 7d

Daily Event Volume — bar chart of total events per day. In 1h mode, this switches to a by-source breakdown of the current hour's activity.

Top Sources — a table of the 25 highest-volume dataSource.name values with event count and estimated GB (calculated at 0.5 GB per million events).

Filter Simulator — enter a source name and an optional event type, then press Simulate. The backend runs a live PowerQuery counting matching events and projects:

Matched events in the selected period
Estimated GB that would be saved
Projected monthly events and GB if the filter were applied permanently

This is entirely read-only — no filter is created or applied. Use the results to inform an exclusion rule you apply manually in the console.

Expected results: Top sources should reflect what you see in the SentinelOne console PowerQuery tool. The filter simulator provides a reasonable GB estimate assuming uniform event size across the source.

Parser Quality

Three tools for diagnosing parser extraction failures.

Live Event Sampler

Pulls raw events from a selected source directly from the data lake and renders every field that came back. The message column is pinned to the right of the table, with a ⎘ copy button on each row for convenient extraction of raw log lines.

Empty fields are displayed as ∅ in grey — immediately highlighting fields the parser is failing to populate
Healthy source: many fields populated (src.ip, user.name, event.type, etc.), with message present as the raw log backup
Unhealthy source: only timestamp and message populated — the parser is not extracting anything of value

Field Population Rate

Samples up to 500 events from a source and measures what percentage of them have each field populated. Results are sorted worst-first so the most pressing gaps are immediately visible.

When you select a source, the tool automatically discovers which fields exist in that source's events and pre-fills the field list — merged with SDL schema defaults. The list is fully editable before running the analysis.

Colour coding:

🟢 ≥ 80% — healthy extraction
🟡 40–79% — partial extraction; check your regex patterns
🔴 < 40% — field is rarely populated; the parser is likely not matching this log format variant

Healthy parser: Key fields such as src.ip, event.type, and user.name should sit between 70–100%. Niche fields like src.process.cmdline or tgt.file.path will naturally be lower, as not every event type produces them.

Broken parser: All SDL fields at 0%, with only timestamp and message visible in the "fields seen in sample" chip list at the bottom of the results.

Parser Test Runner

Paste a raw log line, select a loaded parser, and press Test. The backend extracts SDL $field=pattern$ format strings from the parser file, converts them to Python named-group regular expressions, and tries each against your log line.

Matched: displays the format string that matched and every field extracted with its value
No match: none of the parser's format strings apply to this log line — the log may contain a format variant the parser does not yet cover

Note: Only parsers using SDL custom format strings are supported by the test runner. Grok and dottedJson parsers are not currently testable here.

Onboarding Accelerator

A prompt template for using Claude Code to onboard a new log source. Copy the template, paste a sample of raw log lines, and Claude Code will generate:

An SDL parser skeleton in augmented-JSON format
Field mappings to the SDL common schema
Parser test assertions

No Anthropic API key is required — this uses Claude Code directly from your terminal.

Settings

Read and write your .env credentials from the interface. Secret fields (API tokens, keys) are masked by default with a show/hide toggle. Changes are written to the mounted .env file and take effect after restarting the backend:

docker-compose up -d --build backend

Rebuilding

# Full rebuild
docker-compose up -d --build

# Backend only (after Python changes)
docker-compose up -d --build backend

# Frontend only (after HTML/JS changes)
docker-compose up -d --build frontend

# Reset the database
curl -X DELETE http://localhost:8001/api/coverage/reset

Project Layout

.
├── backend/
│   ├── main.py                  # FastAPI application, router registration
│   ├── db.py                    # SQLAlchemy models
│   ├── routers/
│   │   ├── coverage.py          # Parser coverage map endpoints
│   │   ├── ingest.py            # Ingest dashboard + filter simulator
│   │   ├── quality.py           # Parser quality tools
│   │   └── settings.py          # .env read/write
│   └── services/
│       ├── s1_client.py         # SentinelOne + Scalyr API client
│       └── rule_parser.py       # SDL format string field extraction
├── frontend/
│   └── index.html               # Single-page application (Tailwind, vanilla JS)
├── parsers/                     # SDL parser files (volume-mounted)
├── db/
│   └── init.sql                 # Postgres initialisation (tables created by SQLAlchemy)
├── docker-compose.yml
├── .env.example
└── README.md

Notes

The backend queries your demo tenant (demo.sentinelone.net) — not usea1-purple or any other tenant. Ensure your S1_BASE_URL and SDL_LOG_READ_KEY are pointed at the same tenant.
Parser files in parsers/ are read at query time, not on startup — add or update files at any point without rebuilding the image.
The filter simulator is entirely read-only and makes no changes whatsoever to your tenant configuration.
The service user API token must be at account scope or higher. Site-scoped tokens will have limited visibility into rules and may see reduced source counts.

Languages

Python 59.7%

HTML 31.4%

TypeScript 6.4%

Shell 2.3%

JavaScript 0.1%

README.md Unescape Escape

SIEM Toolkit — SentinelOne AI-SIEM

What's Inside

Architecture

Setup

1. Clone and Configure

2. Add the Detection Library (strongly recommended)

3. Add Parser Files (optional but strongly recommended)

4. Start the Stack

Features

Overview Dashboard

Parser Coverage Map

Ingest Dashboard

Parser Quality

Live Event Sampler

Field Population Rate

Parser Test Runner

Onboarding Accelerator

Settings

Rebuilding

Project Layout

Notes

README.md