Mirror of https://github.com/marcredhat/SIEM-toolkit-patched, synced 2026-06-08 12:33:51 +00:00
Rewrite README in the Queen's English, inspired by Pineapple Boy
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -1,10 +1,12 @@
 # SIEM Toolkit — SentinelOne AI-SIEM

-A self-hosted troubleshooting and visibility tool for SentinelOne AI-SIEM SecOps engineers. Runs as a Docker Compose stack against your SentinelOne demo or production tenant and gives you real-time insight into parser coverage, ingest volume, and data quality without leaving a single UI.
+> *Inspired by Pineapple Boy!* 🍍
+
+A self-hosted troubleshooting and visibility tool for SentinelOne AI-SIEM SecOps engineers. Runs as a Docker Compose stack against your SentinelOne demo or production tenant and provides real-time insight into parser coverage, ingest volume, and data quality — all without leaving a single interface.

 ---

-## What's inside
+## What's Inside

 | Page | Purpose |
 |---|---|
@@ -12,14 +14,14 @@ A self-hosted troubleshooting and visibility tool for SentinelOne AI-SIEM SecOps
 | **Ingest Dashboard** | Event volume, top sources, cost projection, filter simulator |
 | **Parser Quality** | Live event sampler, field population rate, parser test runner |
 | **Onboarding Accelerator** | Prompt template for onboarding new log sources with Claude Code |
-| **Settings** | Manage your `.env` credentials from the UI |
+| **Settings** | Manage your `.env` credentials directly from the interface |

 ---

 ## Architecture

 ```
-browser → nginx (port 3001) → single-page HTML/JS app
+browser → nginx (port 3001) → single-page HTML/JS application
 ↓ API calls
 FastAPI backend (port 8001)
 ↓
@@ -34,13 +36,13 @@ browser → nginx (port 3001) → single-page HTML/JS app
 └───────────────────────────┘
 ```

-All services run via Docker Compose. The `parsers/` directory is volume-mounted into the backend so SDL parser files can be loaded without rebuilding the image.
+All services run via Docker Compose. The `parsers/` directory is volume-mounted into the backend so SDL parser files may be loaded without rebuilding the image.

 ---

 ## Setup

-### 1. Clone and configure
+### 1. Clone and Configure

 ```bash
 git clone https://github.com/mickbrowns1/SIEM-Toolkit.git
@@ -61,21 +63,21 @@ ANTHROPIC_API_KEY= # Optional — Onboarding page o
 **S1_API_TOKEN** — generate at *Settings → Users → Service Users* in the console.
 **SDL_LOG_READ_KEY** — found at *Settings → Integrations → Data Lake API Keys*.

-### 2. Add parser files (optional but recommended)
+### 2. Add Parser Files (optional but strongly recommended)

-Drop SDL parser JSON files into `parsers/`. The backend reads them directly — no rebuild needed.
+Place your SDL parser JSON files into the `parsers/` directory. The backend reads them directly at query time — no rebuild is necessary.

 ```bash
 cp ~/my-parsers/*.json parsers/
 ```

-### 3. Start the stack
+### 3. Start the Stack

 ```bash
 docker-compose up -d --build
 ```

-Open **http://localhost:3001** in your browser.
+Open **http://localhost:3001** in your browser and you're off.

 ---

@@ -83,47 +85,47 @@ Open **http://localhost:3001** in your browser.

 ### Parser Coverage Map

-Answers: *does each active data source have a parser running?*
+Answers the question: *does each active data source have a parser running?*

 **How it works:**

-1. **Sync Live Sources** — runs a PowerQuery against your data lake to pull every `dataSource.name` seen in the last 7 days, along with event counts.
-2. **Load SDL Parsers** — reads parser files from `parsers/`, extracts the `dataSource.name` attribute from each, and stores the field list.
-3. **Load STAR Rules** — pulls your STAR detection rules from the management API and indexes which data sources each rule references.
+1. **Sync Live Sources** — executes a PowerQuery against your data lake to retrieve every `dataSource.name` seen in the last 7 days, along with event counts.
+2. **Load SDL Parsers** — reads parser files from `parsers/`, extracts the `dataSource.name` attribute from each, and stores the field list in the database.
+3. **Load STAR Rules** — retrieves your STAR detection rules from the management API and indexes which data sources each rule references.

 **Matching logic (three-tier):**
-1. Exact `dataSource.name` match between active source and parser attribute
-2. Normalized substring match (ignores spaces, dashes, case) between active source name and parser's `dataSource.name`
-3. Normalized substring match against the parser filename — catches files where the `dataSource.name` attribute is wrong or missing
+1. Exact `dataSource.name` match between the active source and the parser attribute
+2. Normalised substring match (ignores spaces, dashes, and case) between the active source name and the parser's `dataSource.name`
+3. Normalised substring match against the parser filename — catches files where the `dataSource.name` attribute is incorrect or missing
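
The three matching tiers described in the README text above could be sketched roughly as follows. This is a minimal illustration, not the toolkit's actual code: the function names, the `data_source` and `filename` dictionary keys, and the choice to test the substring in both directions are all assumptions made for the example.

```python
from __future__ import annotations

import re
from typing import Optional


def normalise(name: str) -> str:
    """Lower-case the name and drop spaces and dashes (tiers 2 and 3)."""
    return re.sub(r"[\s\-]+", "", name).lower()


def match_parser(source: str, parsers: list[dict]) -> Optional[tuple[str, int]]:
    """Return (parser filename, tier) for the first tier that matches, else None.

    Each parser is a hypothetical dict with a 'filename' key and an optional
    'data_source' key holding the parser's dataSource.name attribute.
    """
    # Tier 1: exact dataSource.name match.
    for p in parsers:
        if p.get("data_source") == source:
            return p["filename"], 1

    src = normalise(source)
    if not src:
        return None

    # Tier 2: normalised substring match against the parser attribute.
    # Checking both directions is an assumption of this sketch.
    for p in parsers:
        attr = normalise(p.get("data_source") or "")
        if attr and (src in attr or attr in src):
            return p["filename"], 2

    # Tier 3: normalised substring match against the filename (extension removed).
    for p in parsers:
        stem = normalise(p["filename"].rsplit(".", 1)[0])
        if stem and (src in stem or stem in src):
            return p["filename"], 3

    return None
```

Returning the tier alongside the filename makes it easy to see which rule produced a match when a source is paired with an unexpected parser.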

-**Parser detection from data:** During sync, a parallel PowerQuery checks whether each source has events with `event.type` populated in the data lake. If yes, a parser is confirmed running — the source is marked **Covered** even without a local parser file. This handles built-in and cloud-managed parsers that aren't in your `parsers/` folder.
+**Parser detection from data:** During sync, a parallel PowerQuery checks whether each source has events with `event.type` populated in the data lake. If so, a parser is confirmed as running — the source is marked **Covered** even without a local parser file. This handles built-in and cloud-managed parsers that are not present in your `parsers/` folder.

 **Status values:**
-- 🟢 **Covered** — custom parser confirmed (local file or detected via parsed events in data)
-- 🔴 **Parser Needed** — no parser found, or only a grok/dottedJson format (which typically signals an incomplete parser)
+- 🟢 **Covered** — custom parser confirmed (local file or detected via parsed events in the data lake)
+- 🔴 **Parser Needed** — no parser found, or only a grok/dottedJson format (which typically indicates an incomplete parser)
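
As a rough illustration of how the two status values might be derived, the sketch below combines the local-file check with the `event.type` detection. The function name, its parameters, and in particular the precedence (detection from data wins over a grok/dottedJson local file) are assumptions; the README does not spell out how the two signals are ordered.

```python
from __future__ import annotations

from typing import Optional

# Formats the README treats as a sign of an incomplete parser.
INCOMPLETE_FORMATS = {"grok", "dottedJson"}


def coverage_status(
    has_local_parser: bool,
    parser_format: Optional[str],
    parsed_events_seen: bool,
) -> str:
    """Return 'Covered' or 'Parser Needed' for one active data source."""
    # Assumption: events with event.type populated confirm a running parser,
    # regardless of what is in the local parsers/ folder.
    if parsed_events_seen:
        return "Covered"
    # A local parser only counts if it is not a grok/dottedJson-only format.
    if has_local_parser and parser_format not in INCOMPLETE_FORMATS:
        return "Covered"
    return "Parser Needed"
```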

-**Expected results:** After syncing sources and loading parsers, sources with active SDL parsers show as Covered. Sources sending raw unparsed data (only `message` and `timestamp` in the data lake) show as Parser Needed.
+**Expected results:** After syncing sources and loading parsers, sources with active SDL parsers will appear as Covered. Sources sending raw, unparsed data — where only `message` and `timestamp` appear in the data lake — will appear as Parser Needed.

 ---

 ### Ingest Dashboard

-Answers: *where is my event volume coming from, and what would happen if I filtered some of it?*
+Answers the question: *where is my event volume coming from, and what would happen if I filtered some of it?*

 **Time range:** 1h (default), 3d, 5d, 7d

-**Daily Event Volume** — bar chart of total events per day. In 1h mode, switches to a by-source breakdown of the current hour.
+**Daily Event Volume** — bar chart of total events per day. In 1h mode, this switches to a by-source breakdown of the current hour's activity.

-**Top Sources** — table of the 25 highest-volume `dataSource.name` values with event count and estimated GB (based on 0.5 GB per million events).
+**Top Sources** — a table of the 25 highest-volume `dataSource.name` values with event count and estimated GB (calculated at 0.5 GB per million events).

-**Filter Simulator** — enter a source name and optional event type, hit Simulate. The backend runs a live PowerQuery counting matching events and projects:
-- Matched events in the period
-- Estimated GB saved in the period
-- Projected monthly events and GB if the filter were applied
+**Filter Simulator** — enter a source name and an optional event type, then press Simulate. The backend runs a live PowerQuery counting matching events and projects:
+- Matched events in the selected period
+- Estimated GB that would be saved
+- Projected monthly events and GB if the filter were applied permanently
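
The arithmetic behind the Top Sources estimate and the Filter Simulator projection can be sketched as below. The 0.5 GB per million events figure comes from the README; the function names, the 30-day month, and the linear scaling of the sampled period are assumptions made for illustration.

```python
GB_PER_MILLION_EVENTS = 0.5  # figure quoted in the README


def estimate_gb(event_count: int) -> float:
    """Estimate ingest size assuming a uniform average event size."""
    return event_count / 1_000_000 * GB_PER_MILLION_EVENTS


def project_filter(matched_events: int, period_hours: float, days_per_month: int = 30) -> dict:
    """Project the effect of excluding the matched events.

    period_hours is the length of the sampled window (1h, 3d, 5d or 7d in the UI).
    Linear extrapolation to a 30-day month is an assumption of this sketch.
    """
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    scale = days_per_month * 24 / period_hours
    monthly_events = round(matched_events * scale)
    return {
        "matched_events": matched_events,
        "gb_saved_in_period": estimate_gb(matched_events),
        "monthly_events": monthly_events,
        "monthly_gb": estimate_gb(monthly_events),
    }
```

For example, `project_filter(1_000_000, period_hours=24)` scales one day of matches to a month. Because the estimate assumes a uniform event size, sources with unusually large or small events will deviate from it.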

-This is read-only — no filter is created. Use the results to inform an exclusion rule you apply manually in the console.
+This is entirely read-only — no filter is created or applied. Use the results to inform an exclusion rule you apply manually in the console.

-**Expected results:** Top sources reflect what you see in the SentinelOne console PowerQuery. The filter simulator gives a reasonable GB estimate assuming uniform event size.
+**Expected results:** Top sources should reflect what you see in the SentinelOne console PowerQuery tool. The filter simulator provides a reasonable GB estimate assuming uniform event size across the source.

 ---

@@ -133,35 +135,35 @@ Three tools for diagnosing parser extraction failures.

 #### Live Event Sampler

-Pulls raw events from a selected source directly from the data lake and renders every field that came back. The `message` column is pinned to the right and has a **⎘ copy** button on each row for quick extraction.
+Pulls raw events from a selected source directly from the data lake and renders every field that came back. The `message` column is pinned to the right of the table, with a **⎘ copy** button on each row for convenient extraction of raw log lines.

-- **Empty fields** show as `∅` in gray — immediately highlights fields the parser isn't populating
-- **Expected result on a healthy source:** Many fields populated (`src.ip`, `user.name`, `event.type`, etc.), `message` present as raw log backup
-- **Expected result on an unhealthy source:** Only `timestamp` and `message` populated — the parser isn't extracting anything
+- **Empty fields** are displayed as `∅` in grey — immediately highlighting fields the parser is failing to populate
+- **Healthy source:** many fields populated (`src.ip`, `user.name`, `event.type`, etc.), with `message` present as the raw log backup
+- **Unhealthy source:** only `timestamp` and `message` populated — the parser is not extracting anything of value

 #### Field Population Rate

-Samples up to 500 events from a source and measures what percentage of them have each field populated. Sorted worst-first.
+Samples up to 500 events from a source and measures what percentage of them have each field populated. Results are sorted worst-first so the most pressing gaps are immediately visible.

-When you select a source, the tool auto-discovers what fields exist in that source's events and pre-fills the field list — merged with SDL schema defaults. You can edit the list before running.
+When you select a source, the tool automatically discovers which fields exist in that source's events and pre-fills the field list — merged with SDL schema defaults. The list is fully editable before running the analysis.

 **Colour coding:**
 - 🟢 ≥ 80% — healthy extraction
-- 🟡 40–79% — partial extraction, check regex patterns
-- 🔴 < 40% — field is rarely populated; parser likely not matching this log format
+- 🟡 40–79% — partial extraction; check your regex patterns
+- 🔴 < 40% — field is rarely populated; the parser is likely not matching this log format variant
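
A field population rate with the colour bands listed above might be computed along these lines. This is only a sketch: events are assumed to be flat dictionaries keyed by dotted field names, and a field is assumed to count as populated when it is neither missing, `None`, nor an empty string. The function names are hypothetical.

```python
from __future__ import annotations


def population_rates(events: list[dict], fields: list[str]) -> list[tuple[str, float]]:
    """Return (field, percent populated) pairs, sorted worst-first."""
    total = len(events)
    rates = []
    for field in fields:
        # Assumption: None and "" both mean the parser did not populate the field.
        populated = sum(1 for event in events if event.get(field) not in (None, ""))
        rates.append((field, 100.0 * populated / total if total else 0.0))
    return sorted(rates, key=lambda pair: pair[1])


def colour(rate: float) -> str:
    """Map a population rate to the README's colour bands."""
    if rate >= 80:
        return "green"
    if rate >= 40:
        return "yellow"
    return "red"
```

Sorting ascending by rate puts the least-populated fields first, which matches the worst-first ordering described in the hunk.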

-**Expected result on a working parser:** Key fields like `src.ip`, `event.type`, `user.name` should be 70–100%. Niche fields like `src.process.cmdline` or `tgt.file.path` will naturally be lower (not every event type produces them).
+**Healthy parser:** Key fields such as `src.ip`, `event.type`, and `user.name` should sit between 70–100%. Niche fields like `src.process.cmdline` or `tgt.file.path` will naturally be lower, as not every event type produces them.

-**Expected result on a broken parser:** All SDL fields at 0%, only `timestamp` and `message` visible in the "fields seen in sample" chip list at the bottom.
+**Broken parser:** All SDL fields at 0%, with only `timestamp` and `message` visible in the "fields seen in sample" chip list at the bottom of the results.

 #### Parser Test Runner

-Paste a raw log line, select a loaded parser, hit Test. The backend extracts SDL `$field=pattern$` format strings from the parser file, converts them to Python named-group regex, and tries each against your log line.
+Paste a raw log line, select a loaded parser, and press Test. The backend extracts SDL `$field=pattern$` format strings from the parser file, converts them to Python named-group regular expressions, and tries each against your log line.

-- **Matched:** shows the format string that matched and every field extracted with its value
-- **No match:** means none of the parser's format strings apply to this log line — the log may have a format variant the parser doesn't cover
+- **Matched:** displays the format string that matched and every field extracted with its value
+- **No match:** none of the parser's format strings apply to this log line — the log may contain a format variant the parser does not yet cover
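
The conversion from `$field=pattern$` format strings to Python named-group regular expressions could look something like the sketch below. It is not the toolkit's implementation. The token grammar, the `patterns` lookup table, the `\S+` default for tokens without a pattern, and the helper names are all assumptions. One detail worth noting is that Python group names cannot contain dots, so the sketch uses generated group names and maps them back to fields such as `src.ip`.

```python
from __future__ import annotations

import re
from typing import Optional

# Assumed token shape: $field$ or $field=pattern$.
TOKEN = re.compile(r"\$([A-Za-z_][\w.]*)(?:=([^$]+))?\$")


def format_to_regex(fmt: str, patterns: dict[str, str]) -> tuple[re.Pattern, dict[str, str]]:
    """Compile one format string; return the regex and a group-name to field map."""
    parts: list[str] = []
    groups: dict[str, str] = {}
    pos = 0
    for index, token in enumerate(TOKEN.finditer(fmt)):
        parts.append(re.escape(fmt[pos:token.start()]))  # literal text between tokens
        field, pattern = token.group(1), token.group(2)
        # A named pattern is looked up; anything else is treated as a raw regex.
        body = patterns.get(pattern, pattern) if pattern else r"\S+"
        group = f"f{index}"  # dots are not legal in Python group names
        groups[group] = field
        parts.append(f"(?P<{group}>{body})")
        pos = token.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile("".join(parts)), groups


def first_match(formats: list[str], patterns: dict[str, str], line: str) -> Optional[dict]:
    """Try each format in order; return the matching format and extracted fields."""
    for fmt in formats:
        regex, groups = format_to_regex(fmt, patterns)
        found = regex.search(line)
        if found:
            fields = {groups[name]: value for name, value in found.groupdict().items()}
            return {"format": fmt, "fields": fields}
    return None
```

A `None` result corresponds to the "No match" outcome: none of the supplied format strings applies to the log line.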

-> Note: only parsers using SDL custom format strings are testable here. Grok and dottedJson parsers are not currently supported by the test runner.
+> **Note:** Only parsers using SDL custom format strings are supported by the test runner. Grok and dottedJson parsers are not currently testable here.

 ---

@@ -174,13 +176,13 @@ A prompt template for using Claude Code to onboard a new log source. Copy the te
 - 2–3 starter STAR detection rules
 - 5 parser test assertions

-No Anthropic API key required — this uses Claude Code directly.
+No Anthropic API key is required — this uses Claude Code directly from your terminal.

 ---

 ### Settings

-Read and write your `.env` credentials from the UI. Secret fields (API tokens, keys) are masked by default with show/hide toggle. Changes are written to the mounted `.env` file and take effect after restarting the backend:
+Read and write your `.env` credentials from the interface. Secret fields (API tokens, keys) are masked by default with a show/hide toggle. Changes are written to the mounted `.env` file and take effect after restarting the backend:

 ```bash
 docker-compose up -d --build backend
@@ -206,12 +208,12 @@ curl -X DELETE http://localhost:8001/api/coverage/reset

 ---

-## Project layout
+## Project Layout

 ```
 .
 ├── backend/
-│ ├── main.py # FastAPI app, router registration
+│ ├── main.py # FastAPI application, router registration
 │ ├── db.py # SQLAlchemy models
 │ ├── routers/
 │ │ ├── coverage.py # Parser coverage map endpoints
@@ -222,10 +224,10 @@ curl -X DELETE http://localhost:8001/api/coverage/reset
 │ ├── s1_client.py # SentinelOne + Scalyr API client
 │ └── rule_parser.py # SDL/Sigma/STAR field extraction
 ├── frontend/
-│ └── index.html # Single-page app (Tailwind, vanilla JS)
+│ └── index.html # Single-page application (Tailwind, vanilla JS)
 ├── parsers/ # SDL parser files (volume-mounted)
 ├── db/
-│ └── init.sql # Postgres init (tables created by SQLAlchemy)
+│ └── init.sql # Postgres initialisation (tables created by SQLAlchemy)
 ├── docker-compose.yml
 ├── .env.example
 └── README.md
@@ -235,6 +237,6 @@ curl -X DELETE http://localhost:8001/api/coverage/reset

 ## Notes

-- The backend queries your **demo tenant** (`demo.sentinelone.net`) — not usea1-purple or any other tenant. Keep your `S1_BASE_URL` and `SDL_LOG_READ_KEY` pointed at the same tenant.
-- Parser files in `parsers/` are read at query time, not on startup — add or update files without rebuilding.
-- The filter simulator is read-only and makes no changes to your tenant configuration.
+- The backend queries your **demo tenant** (`demo.sentinelone.net`) — not usea1-purple or any other tenant. Ensure your `S1_BASE_URL` and `SDL_LOG_READ_KEY` are pointed at the same tenant.
+- Parser files in `parsers/` are read at query time, not on startup — add or update files at any point without rebuilding the image.
+- The filter simulator is entirely read-only and makes no changes whatsoever to your tenant configuration.