SIEM Toolkit — SentinelOne AI-SIEM
Inspired by Pineapple Boy! 🍍
A self-hosted troubleshooting and visibility tool for SentinelOne AI-SIEM SecOps engineers. Runs as a Docker Compose stack against your SentinelOne demo or production tenant and provides real-time insight into parser coverage, ingest volume, and data quality — all without leaving a single interface.
What's Inside
| Page | Purpose |
|---|---|
| Parser Coverage Map | Which active data sources have a parser? Which don't? |
| Ingest Dashboard | Event volume, top sources, cost projection, filter simulator |
| Parser Quality | Live event sampler, field population rate, parser test runner |
| Onboarding Accelerator | Prompt template for onboarding new log sources with Claude Code |
| Settings | Manage your .env credentials directly from the interface |
Architecture
browser → nginx (port 3001) → single-page HTML/JS application
↓ API calls
FastAPI backend (port 8001)
↓
┌───────────────────────────┐
│ PostgreSQL (SQLAlchemy) │ parsed rules, parser fields, active sources
└───────────────────────────┘
↓
┌───────────────────────────┐
│ SentinelOne APIs │
│ • Management API (STAR) │ demo.sentinelone.net
│ • Scalyr XDR PowerQuery │ xdr.us1.sentinelone.net
└───────────────────────────┘
All services run via Docker Compose. The parsers/ directory is volume-mounted into the backend so SDL parser files may be loaded without rebuilding the image.
Setup
1. Clone and Configure
git clone https://github.com/mickbrowns1/SIEM-Toolkit.git
cd SIEM-Toolkit
cp .env.example .env
Edit .env with your credentials:
S1_BASE_URL=https://demo.sentinelone.net # Your console URL
S1_API_TOKEN=eyJ... # Service user API token
SDL_XDR_URL=https://xdr.us1.sentinelone.net # Scalyr XDR endpoint
SDL_LOG_READ_KEY=1j2IU0S... # Data Lake read key
ANTHROPIC_API_KEY= # Optional — Onboarding page only
S1_API_TOKEN — generate at Settings → Users → Service Users in the console.
SDL_LOG_READ_KEY — found at Settings → Integrations → Data Lake API Keys.
2. Add Parser Files (optional but strongly recommended)
Place your SDL parser JSON files into the parsers/ directory. The backend reads them directly at query time — no rebuild is necessary.
cp ~/my-parsers/*.json parsers/
3. Start the Stack
docker-compose up -d --build
Open http://localhost:3001 in your browser and you're off.
Features
Parser Coverage Map
Answers the question: does each active data source have a parser running?
How it works:
- Sync Live Sources: executes a PowerQuery against your data lake to retrieve every dataSource.name seen in the last 7 days, along with event counts.
- Load SDL Parsers: reads parser files from parsers/, extracts the dataSource.name attribute from each, and stores the field list in the database.
- Load STAR Rules: retrieves your STAR detection rules from the management API and indexes which data sources each rule references.
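The Load SDL Parsers step could look roughly like the sketch below. It is an illustration, not the toolkit's actual code: the function names are hypothetical, and it assumes each parser file declares its source as a quoted "dataSource.name": "..." pair. A regular expression is used rather than json.loads because parser files may not be strict JSON.

```python
import re
from pathlib import Path

# Assumed layout: "dataSource.name": "Some Source" appears somewhere in the file.
DS_NAME = re.compile(r'"?dataSource\.name"?\s*:\s*"([^"]+)"')


def extract_datasource_name(text):
    """Return the declared dataSource.name, or None if the attribute is absent."""
    match = DS_NAME.search(text)
    return match.group(1) if match else None


def index_parsers(parser_dir):
    """Map each parser filename to its declared dataSource.name (or None)."""
    return {
        path.name: extract_datasource_name(
            path.read_text(encoding="utf-8", errors="replace")
        )
        for path in sorted(Path(parser_dir).glob("*.json"))
    }
```

Files that return None here are the ones the filename-based matching tier is meant to catch.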
Matching logic (three-tier):
- Exact dataSource.name match between the active source and the parser attribute
- Normalised substring match (ignores spaces, dashes, and case) between the active source name and the parser's dataSource.name
- Normalised substring match against the parser filename, which catches files where the dataSource.name attribute is incorrect or missing
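A minimal sketch of the three tiers, for illustration only. The function names and the (filename, dataSource.name) input shape are hypothetical, and the sketch assumes the substring test may succeed in either direction, which the description above does not specify.

```python
import re


def normalise(name):
    """Lower-case the name and drop spaces and dashes."""
    return re.sub(r"[\s\-]+", "", name.lower())


def match_parser(source, parsers):
    """Return (filename, tier) for the first parser matching `source`, else None.

    `parsers` is a list of (filename, datasource_name_or_None) tuples.
    """
    # Tier 1: exact dataSource.name match.
    for filename, ds_name in parsers:
        if ds_name is not None and ds_name == source:
            return filename, 1

    src = normalise(source)
    if not src:
        return None

    # Tier 2: normalised substring match on the parser's dataSource.name.
    for filename, ds_name in parsers:
        if ds_name:
            candidate = normalise(ds_name)
            if candidate and (candidate in src or src in candidate):
                return filename, 2

    # Tier 3: normalised substring match on the filename (extension removed).
    for filename, _ in parsers:
        stem = normalise(filename.rsplit(".", 1)[0])
        if stem and (stem in src or src in stem):
            return filename, 3

    return None
```

Running the tiers in order means a weak filename match never overrides an exact attribute match from another file.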
Parser detection from data: During sync, a parallel PowerQuery checks whether each source has events with event.type populated in the data lake. If so, a parser is confirmed as running — the source is marked Covered even without a local parser file. This handles built-in and cloud-managed parsers that are not present in your parsers/ folder.
Status values:
- 🟢 Covered — custom parser confirmed (local file or detected via parsed events in the data lake)
- 🔴 Parser Needed — no parser found, or only a grok/dottedJson format (which typically indicates an incomplete parser)
Expected results: After syncing sources and loading parsers, sources with active SDL parsers will appear as Covered. Sources sending raw, unparsed data — where only message and timestamp appear in the data lake — will appear as Parser Needed.
Ingest Dashboard
Answers the question: where is my event volume coming from, and what would happen if I filtered some of it?
Time range: 1h (default), 3d, 5d, 7d
Daily Event Volume — bar chart of total events per day. In 1h mode, this switches to a by-source breakdown of the current hour's activity.
Top Sources — a table of the 25 highest-volume dataSource.name values with event count and estimated GB (calculated at 0.5 GB per million events).
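The GB estimate is simple arithmetic on the event count. The sketch below shows the calculation at the stated ratio of 0.5 GB per million events; the function names and the dict input are hypothetical, not the backend's actual interface.

```python
GB_PER_MILLION_EVENTS = 0.5  # ratio stated in the Top Sources description


def estimate_gb(event_count):
    """Estimate ingest volume in GB from an event count."""
    return event_count / 1_000_000 * GB_PER_MILLION_EVENTS


def top_sources(counts, limit=25):
    """Return (source, events, estimated_gb) rows, highest volume first.

    `counts` maps dataSource.name to an event count.
    """
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:limit]
    return [(name, n, round(estimate_gb(n), 3)) for name, n in ranked]
```

For example, a source with 4,000,000 events is estimated at 2 GB.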
Filter Simulator — enter a source name and an optional event type, then press Simulate. The backend runs a live PowerQuery counting matching events and projects:
- Matched events in the selected period
- Estimated GB that would be saved
- Projected monthly events and GB if the filter were applied permanently
This is entirely read-only — no filter is created or applied. Use the results to inform an exclusion rule you apply manually in the console.
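One way to picture the projection is as a linear scale-up from the selected period to a month. The sketch below is an assumption, not the backend's confirmed method: it assumes a 30-day month, constant event rate, and the 0.5 GB per million events ratio used elsewhere on this page. All names are hypothetical.

```python
PERIOD_HOURS = {"1h": 1, "3d": 72, "5d": 120, "7d": 168}
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month
GB_PER_MILLION_EVENTS = 0.5


def project_filter(matched_events, period):
    """Project savings for a simulated filter over the given time range."""
    hours = PERIOD_HOURS[period]
    period_gb = matched_events / 1_000_000 * GB_PER_MILLION_EVENTS
    return {
        "matched_events": matched_events,
        "estimated_gb_saved": period_gb,
        # Linear extrapolation: assumes the source's rate stays constant.
        "monthly_events": round(matched_events * HOURS_PER_MONTH / hours),
        "monthly_gb": period_gb * HOURS_PER_MONTH / hours,
    }
```

Projections from the 1h range are the least reliable under this assumption, since a single hour is scaled by a factor of 720.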
Expected results: Top sources should reflect what you see in the SentinelOne console PowerQuery tool. The filter simulator provides a reasonable GB estimate assuming uniform event size across the source.
Parser Quality
Three tools for diagnosing parser extraction failures.
Live Event Sampler
Pulls raw events from a selected source directly from the data lake and renders every field that came back. The message column is pinned to the right of the table, with a ⎘ copy button on each row for convenient extraction of raw log lines.
- Empty fields are displayed as ∅ in grey, immediately highlighting fields the parser is failing to populate
- Healthy source: many fields populated (src.ip, user.name, event.type, etc.), with message present as the raw log backup
- Unhealthy source: only timestamp and message populated; the parser is not extracting anything of value
Field Population Rate
Samples up to 500 events from a source and measures what percentage of them have each field populated. Results are sorted worst-first so the most pressing gaps are immediately visible.
When you select a source, the tool automatically discovers which fields exist in that source's events and pre-fills the field list — merged with SDL schema defaults. The list is fully editable before running the analysis.
Colour coding:
- 🟢 ≥ 80% — healthy extraction
- 🟡 40–79% — partial extraction; check your regex patterns
- 🔴 < 40% — field is rarely populated; the parser is likely not matching this log format variant
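The calculation behind the table can be sketched as follows. This is illustrative only: it assumes events arrive as flat dicts keyed by dotted field names, treats None and the empty string as unpopulated, and uses hypothetical function and band names.

```python
MAX_SAMPLE = 500  # sample size stated above


def band(pct):
    """Map a population percentage to the colour bands listed above."""
    if pct >= 80:
        return "green"
    if pct >= 40:
        return "amber"
    return "red"


def population_rates(events, fields):
    """Return (field, percent_populated, band) rows, worst-first."""
    sample = events[:MAX_SAMPLE]
    total = len(sample)
    rows = []
    for field in fields:
        hits = sum(1 for e in sample if e.get(field) not in (None, ""))
        pct = 100.0 * hits / total if total else 0.0
        rows.append((field, pct, band(pct)))
    rows.sort(key=lambda row: row[1])  # lowest population first
    return rows
```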
Healthy parser: Key fields such as src.ip, event.type, and user.name should sit between 70% and 100%. Niche fields like src.process.cmdline or tgt.file.path will naturally be lower, as not every event type produces them.
Broken parser: All SDL fields at 0%, with only timestamp and message visible in the "fields seen in sample" chip list at the bottom of the results.
Parser Test Runner
Paste a raw log line, select a loaded parser, and press Test. The backend extracts SDL $field=pattern$ format strings from the parser file, converts them to Python named-group regular expressions, and tries each against your log line.
- Matched: displays the format string that matched and every field extracted with its value
- No match: none of the parser's format strings apply to this log line — the log may contain a format variant the parser does not yet cover
Note: Only parsers using SDL custom format strings are supported by the test runner. Grok and dottedJson parsers are not currently testable here.
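The conversion from format string to regular expression can be sketched as below. This is a simplified illustration, not the test runner's actual code. It assumes the text after = is either a key in a hypothetical patterns dict or a raw regex, that a bare $field$ matches lazily up to the next literal, and that a format must match the whole line. Because dotted field names are not valid Python group names, each token gets a synthetic group name that is mapped back afterwards.

```python
import re

# Matches $field$ or $field=pattern$; the pattern part may not contain "$".
TOKEN = re.compile(r"\$([A-Za-z_][\w.]*)(?:=([^$]+))?\$")


def format_to_regex(fmt, patterns=None):
    """Convert one format string to (compiled_regex, {group_name: field_name})."""
    patterns = patterns or {}
    parts, groups, pos = [], {}, 0
    for i, tok in enumerate(TOKEN.finditer(fmt)):
        parts.append(re.escape(fmt[pos:tok.start()]))  # literal text between tokens
        field, pat = tok.group(1), tok.group(2)
        body = patterns.get(pat, pat) if pat else r".+?"
        group = f"f{i}"  # synthetic name; "src.ip" is not a valid group name
        groups[group] = field
        parts.append(f"(?P<{group}>{body})")
        pos = tok.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile("".join(parts)), groups


def run_formats(line, formats, patterns=None):
    """Try each format in order; return (format, extracted_fields) or None."""
    for fmt in formats:
        regex, groups = format_to_regex(fmt, patterns)
        match = regex.fullmatch(line)
        if match:
            return fmt, {groups[g]: v for g, v in match.groupdict().items()}
    return None
```

A None result corresponds to the No match outcome described above.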
Onboarding Accelerator
A prompt template for using Claude Code to onboard a new log source. Copy the template, paste a sample of raw log lines, and Claude Code will generate:
- An SDL parser skeleton in augmented-JSON format
- Field mappings to the SDL common schema
- 2–3 starter STAR detection rules
- 5 parser test assertions
No Anthropic API key is required — this uses Claude Code directly from your terminal.
Settings
Read and write your .env credentials from the interface. Secret fields (API tokens, keys) are masked by default with a show/hide toggle. Changes are written to the mounted .env file and take effect after restarting the backend:
docker-compose up -d --build backend
Rebuilding
# Full rebuild
docker-compose up -d --build
# Backend only (after Python changes)
docker-compose up -d --build backend
# Frontend only (after HTML/JS changes)
docker-compose up -d --build frontend
# Reset the database
curl -X DELETE http://localhost:8001/api/coverage/reset
Project Layout
.
├── backend/
│ ├── main.py # FastAPI application, router registration
│ ├── db.py # SQLAlchemy models
│ ├── routers/
│ │ ├── coverage.py # Parser coverage map endpoints
│ │ ├── ingest.py # Ingest dashboard + filter simulator
│ │ ├── quality.py # Parser quality tools
│ │ └── settings.py # .env read/write
│ └── services/
│ ├── s1_client.py # SentinelOne + Scalyr API client
│ └── rule_parser.py # SDL/Sigma/STAR field extraction
├── frontend/
│ └── index.html # Single-page application (Tailwind, vanilla JS)
├── parsers/ # SDL parser files (volume-mounted)
├── db/
│ └── init.sql # Postgres initialisation (tables created by SQLAlchemy)
├── docker-compose.yml
├── .env.example
└── README.md
Notes
- The backend queries your demo tenant (demo.sentinelone.net), not usea1-purple or any other tenant. Ensure your S1_BASE_URL and SDL_LOG_READ_KEY point at the same tenant.
- Parser files in parsers/ are read at query time, not on startup, so you can add or update files at any point without rebuilding the image.
- The filter simulator is entirely read-only and makes no changes whatsoever to your tenant configuration.