diff --git a/.gitignore b/.gitignore index 84e80c6..974e1ac 100644 --- a/.gitignore +++ b/.gitignore @@ -7,3 +7,4 @@ node_modules/ frontend/out/ pgdata/ parsers/*.json +data/ diff --git a/README.md b/README.md index 2a48591..c769b3e 100644 --- a/README.md +++ b/README.md @@ -10,6 +10,7 @@ A self-hosted troubleshooting and visibility tool for SentinelOne AI-SIEM SecOps | Page | Purpose | |---|---| +| **Overview** | Live health stats — coverage percentage, active sources, top uncovered sources by volume | | **Parser Coverage Map** | Which active data sources have a parser? Which don't? | | **Ingest Dashboard** | Event volume, top sources, cost projection, filter simulator | | **Parser Quality** | Live event sampler, field population rate, parser test runner | @@ -26,12 +27,12 @@ browser → nginx (port 3001) → single-page HTML/JS application FastAPI backend (port 8001) ↓ ┌───────────────────────────┐ - │ PostgreSQL (SQLAlchemy) │ parsed rules, parser fields, active sources + │ PostgreSQL (SQLAlchemy) │ parser fields, active sources └───────────────────────────┘ ↓ ┌───────────────────────────┐ │ SentinelOne APIs │ - │ • Management API (STAR) │ demo.sentinelone.net + │ • Management API │ demo.sentinelone.net │ • Scalyr XDR PowerQuery │ xdr.us1.sentinelone.net └───────────────────────────┘ ``` @@ -54,16 +55,34 @@ Edit `.env` with your credentials: ```env S1_BASE_URL=https://demo.sentinelone.net # Your console URL -S1_API_TOKEN=eyJ... # Service user API token +S1_API_TOKEN=eyJ... # Service user API token (account scope or higher) SDL_XDR_URL=https://xdr.us1.sentinelone.net # Scalyr XDR endpoint SDL_LOG_READ_KEY=1j2IU0S... # Data Lake read key -ANTHROPIC_API_KEY= # Optional — Onboarding page only +ANTHROPIC_API_KEY= # Optional — not currently used ``` -**S1_API_TOKEN** — generate at *Settings → Users → Service Users* in the console. +**S1_API_TOKEN** — generate at *Settings → Users → Service Users* in the console. 
The service user should be provisioned at **account scope** or higher. **SDL_LOG_READ_KEY** — found at *Settings → Integrations → Data Lake API Keys*. -### 2. Add Parser Files (optional but strongly recommended) +### 2. Add the Detection Library (strongly recommended) + +The Detection Fields Missing column and per-source detection counts on the Coverage Map require a local detections export. This is generated from the [detection-validator](https://github.com/mickbrowns1/detection-validator) repository. + +```bash +# Clone the detection-validator repo alongside this one +git clone https://github.com/mickbrowns1/detection-validator.git +cd detection-validator + +# Follow its README to generate the export, then copy the output here: +mkdir -p ../SIEM-Toolkit/data +cp data/data/detections/extracted.json ../SIEM-Toolkit/data/detections.json + +cd ../SIEM-Toolkit +``` + +The `data/` directory is gitignored and never committed. Once the stack is running, click **Load Detections** on the Coverage Map to import the rules into the database. + +### 3. Add Parser Files (optional but strongly recommended) Place your SDL parser JSON files into the `parsers/` directory. The backend reads them directly at query time — no rebuild is necessary. @@ -71,7 +90,7 @@ Place your SDL parser JSON files into the `parsers/` directory. The backend read cp ~/my-parsers/*.json parsers/ ``` -### 3. Start the Stack +### 4. Start the Stack ```bash docker-compose up -d --build @@ -83,6 +102,18 @@ Open **http://localhost:3001** in your browser and you're off. 
## Features +### Overview Dashboard + +The landing page gives you an at-a-glance health summary drawn live from the database: + +- **Parser Coverage %** — proportion of active sources with a confirmed parser +- **Active Sources** — total number of `dataSource.name` values seen in the last 7 days +- **Covered / Need Parser** — counts for each status + +If any sources are uncovered, the **Top Sources Needing a Parser** table lists the highest-volume offenders. Click any source name to jump directly to the Parser Quality page with that source pre-selected. + +--- + ### Parser Coverage Map Answers the question: *does each active data source have a parser running?* @@ -91,7 +122,6 @@ Answers the question: *does each active data source have a parser running?* 1. **Sync Live Sources** — executes a PowerQuery against your data lake to retrieve every `dataSource.name` seen in the last 7 days, along with event counts. 2. **Load SDL Parsers** — reads parser files from `parsers/`, extracts the `dataSource.name` attribute from each, and stores the field list in the database. -3. **Load STAR Rules** — retrieves your STAR detection rules from the management API and indexes which data sources each rule references. **Matching logic (three-tier):** 1. Exact `dataSource.name` match between the active source and the parser attribute @@ -104,6 +134,10 @@ Answers the question: *does each active data source have a parser running?* - 🟢 **Covered** — custom parser confirmed (local file or detected via parsed events in the data lake) - 🔴 **Parser Needed** — no parser found, or only a grok/dottedJson format (which typically indicates an incomplete parser) +**Filters:** Use the filter pills to focus on Custom Parser only, Default Parser Only (data lake detected), or No Parser. + +**Deep link:** Click any source name in the table to open it directly in Parser Quality with all dropdowns pre-populated. 
+ **Expected results:** After syncing sources and loading parsers, sources with active SDL parsers will appear as Covered. Sources sending raw, unparsed data — where only `message` and `timestamp` appear in the data lake — will appear as Parser Needed. --- @@ -173,8 +207,7 @@ A prompt template for using Claude Code to onboard a new log source. Copy the te - An SDL parser skeleton in augmented-JSON format - Field mappings to the SDL common schema -- 2–3 starter STAR detection rules -- 5 parser test assertions +- Parser test assertions No Anthropic API key is required — this uses Claude Code directly from your terminal. @@ -222,7 +255,7 @@ curl -X DELETE http://localhost:8001/api/coverage/reset │ │ └── settings.py # .env read/write │ └── services/ │ ├── s1_client.py # SentinelOne + Scalyr API client -│ └── rule_parser.py # SDL/Sigma/STAR field extraction +│ └── rule_parser.py # SDL format string field extraction ├── frontend/ │ └── index.html # Single-page application (Tailwind, vanilla JS) ├── parsers/ # SDL parser files (volume-mounted) @@ -240,3 +273,4 @@ curl -X DELETE http://localhost:8001/api/coverage/reset - The backend queries your **demo tenant** (`demo.sentinelone.net`) — not usea1-purple or any other tenant. Ensure your `S1_BASE_URL` and `SDL_LOG_READ_KEY` are pointed at the same tenant. - Parser files in `parsers/` are read at query time, not on startup — add or update files at any point without rebuilding the image. - The filter simulator is entirely read-only and makes no changes whatsoever to your tenant configuration. +- The service user API token must be at **account scope** or higher. Site-scoped tokens will have limited visibility into rules and may see reduced source counts. 
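Reviewer note on the README's detection-library step: before clicking **Load Detections**, it may help to check that the copied `detections.json` has the shape the loader reads. The sketch below assumes the layout implied by `_import_detections` in this change (`results` → `queries` → `pairs["dataSource.name"]`, with values that are strings or lists of strings). The function name `summarize_detections` and the sample data are hypothetical, and unlike the loader it does not drop rules under `/rules/silent/` or `/rules/dev/`.

```python
from collections import Counter


def summarize_detections(data: dict) -> Counter:
    """Count rules per dataSource.name in a detections export.

    Assumed layout: data["results"][i]["queries"][j]["pairs"]["dataSource.name"]
    is a list whose items are strings or lists of strings.
    """
    per_source = Counter()
    for rule in data.get("results", []):
        sources = set()
        for query in rule.get("queries", []):
            for value in query.get("pairs", {}).get("dataSource.name", []):
                if isinstance(value, str):
                    sources.add(value)
                elif isinstance(value, list):
                    sources.update(str(item) for item in value)
        # Count each rule once per source, however many of its queries mention it.
        per_source.update(sources)
    return per_source


# Hypothetical two-rule export, used only to illustrate the shape.
sample = {
    "results": [
        {"id": "1", "name": "rule-a", "queries": [
            {"keys": ["src.ip"], "pairs": {"dataSource.name": ["Okta"]}},
            {"keys": ["user.name"], "pairs": {"dataSource.name": [["Okta", "Azure AD"]]}},
        ]},
        {"id": "2", "name": "rule-b", "queries": [
            {"keys": ["event.type"], "pairs": {"dataSource.name": ["Azure AD"]}},
        ]},
    ]
}

counts = summarize_detections(sample)
for source, rule_count in counts.most_common():
    print(f"{source}: {rule_count} rule(s)")
```

To run it against a real export, load the file with `json.load` and pass the resulting dict in. An empty result would suggest the export does not carry `dataSource.name` pairs in the form the loader expects.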
diff --git a/backend/main.py b/backend/main.py index 9a7fec9..b0b67de 100644 --- a/backend/main.py +++ b/backend/main.py @@ -1,6 +1,6 @@ from fastapi import FastAPI from fastapi.middleware.cors import CORSMiddleware -from db import engine, Base +from db import engine, Base, get_db, ParsedRule from routers import coverage, ingest, settings, quality Base.metadata.create_all(bind=engine) @@ -15,6 +15,40 @@ with engine.connect() as _conn: app = FastAPI(title="SIEM Toolkit", version="1.0.0") + +@app.on_event("startup") +async def auto_load_detections(): + """ + Auto-load detection library rules on startup. + Tries the live S1 API first (accurate 'sources' field); falls back to extracted.json. + Skips if rules are already loaded — use the 'Sync Library' button to force a refresh. + """ + import os + from sqlalchemy.orm import Session + from services import s1_client + + db: Session = next(get_db()) + try: + existing = db.query(ParsedRule).filter_by(rule_type="library").count() + if existing > 0: + return # Already loaded — skip until user manually refreshes + + # Try live API first + try: + rules = await s1_client.get_platform_rules() + if rules: + coverage._import_from_api_rules(db, rules) + return + except Exception: + pass + + # Fall back to local file + detections_file = os.environ.get("DETECTIONS_FILE", "/app/data/detections.json") + if os.path.exists(detections_file): + coverage._import_detections(db, detections_file) + finally: + db.close() + app.add_middleware( CORSMiddleware, allow_origins=["http://localhost:3001"], diff --git a/backend/routers/coverage.py b/backend/routers/coverage.py index de4a93d..1e5516d 100644 --- a/backend/routers/coverage.py +++ b/backend/routers/coverage.py @@ -1,4 +1,5 @@ import json +import os from fastapi import APIRouter, UploadFile, File, Depends, HTTPException from pydantic import BaseModel from sqlalchemy.orm import Session @@ -6,6 +7,8 @@ from datetime import datetime from db import get_db, ParsedRule, ParserField, ActiveSource 
from services import s1_client, rule_parser +DETECTIONS_FILE = os.environ.get("DETECTIONS_FILE", "/app/data/detections.json") + router = APIRouter() @@ -40,22 +43,12 @@ def _star_query_texts(rule: dict) -> list[str]: @router.post("/load-star-rules") -async def load_star_rules(library_only: bool = None, db: Session = Depends(get_db)): - """Fetch STAR rules from SentinelOne and index their fields. - library_only defaults to the STAR_LIBRARY_ONLY env var (default true). - Pass ?library_only=false to include custom tenant rules as well. - """ - import os - if library_only is None: - library_only = os.environ.get("STAR_LIBRARY_ONLY", "true").lower() != "false" - +async def load_star_rules(db: Session = Depends(get_db)): + """Fetch all STAR rules from the Management Console API and index their fields.""" try: rules = await s1_client.get_star_rules() except Exception as e: - raise HTTPException(502, f"S1 API error: {e}") - - if library_only: - rules = [r for r in rules if str(r.get("creator", "")).lower().endswith("@sentinelone.com")] + raise HTTPException(502, f"S1 API error: {type(e).__name__}: {e}") # Replace all existing STAR rules cleanly to avoid duplicate key errors db.query(ParsedRule).filter_by(rule_type="star").delete() @@ -81,6 +74,118 @@ async def load_star_rules(library_only: bool = None, db: Session = Depends(get_d return {"loaded": len(loaded), "rules": loaded} +_EXCLUDED_PATHS = ("/rules/silent/", "/rules/dev/") + + +def _import_from_api_rules(db, rules: list) -> int: + """ + Import platform rules fetched directly from the S1 API into the database. + Each rule has a 'sources' list — the authoritative dataSource.name values. 
+ """ + db.query(ParsedRule).filter_by(rule_type="library").delete() + db.commit() + + loaded = 0 + seen_ids: set = set() + for rule in rules: + rule_id = str(rule.get("id", f"lib_{loaded}")) + if rule_id in seen_ids: + continue + seen_ids.add(rule_id) + + sources = rule.get("sources") or [] + db.add(ParsedRule( + rule_id=rule_id, + name=rule.get("name", "unnamed"), + rule_type="library", + fields_used=[], # API rules don't expose field-level info + raw=json.dumps({"data_sources": sources}), + )) + loaded += 1 + if loaded % 500 == 0: + db.flush() + + db.commit() + return loaded + + +def _import_detections(db, detections_file: str) -> int: + """ + Import library detection rules from extracted.json into the database. + Replaces any existing library rules. Returns the count of rules loaded. + """ + with open(detections_file, "r", encoding="utf-8") as fh: + data = json.load(fh) + + results = data.get("results", []) + results = [r for r in results if not any(r.get("file", "").startswith(p) for p in _EXCLUDED_PATHS)] + + db.query(ParsedRule).filter_by(rule_type="library").delete() + db.commit() + + loaded = 0 + seen_ids: set = set() + for rule in results: + all_fields: set = set() + data_sources: list[str] = [] + for q in rule.get("queries", []): + all_fields.update(q.get("keys", [])) + ds_vals = q.get("pairs", {}).get("dataSource.name", []) + for v in ds_vals: + if isinstance(v, str): + data_sources.append(v) + elif isinstance(v, list): + data_sources.extend(str(x) for x in v) + + rule_id = str(rule.get("id", f"lib_{loaded}")) + if rule_id in seen_ids: + continue + seen_ids.add(rule_id) + + db.add(ParsedRule( + rule_id=rule_id, + name=rule.get("name", "unnamed"), + rule_type="library", + fields_used=list(all_fields), + raw=json.dumps({"data_sources": list(set(data_sources))}), + )) + loaded += 1 + if loaded % 500 == 0: + db.flush() + + db.commit() + return loaded + + +@router.post("/load-detections") +async def load_detections(db: Session = Depends(get_db)): + """ + 
Reload detection library rules. + Tries the live S1 API first (platform-rules endpoint); falls back to extracted.json. + """ + # Prefer the live API — gives accurate 'sources' and is always up to date + try: + rules = await s1_client.get_platform_rules() + if rules: + loaded = _import_from_api_rules(db, rules) + return {"loaded": loaded, "source": "api"} + except Exception: + pass + + # Fall back to local extracted.json + if not os.path.exists(DETECTIONS_FILE): + raise HTTPException( + 404, + "S1 API unavailable and no detections file found — " + "ensure the data/ volume is mounted with detections.json" + ) + try: + loaded = _import_detections(db, DETECTIONS_FILE) + except Exception as e: + raise HTTPException(500, f"Failed to import detections: {e}") + return {"loaded": loaded, "source": "file"} + + @router.post("/upload-sigma") async def upload_sigma(files: list[UploadFile] = File(...), db: Session = Depends(get_db)): """Upload one or more Sigma YAML files and index their fields.""" @@ -216,11 +321,21 @@ async def load_parser_content(payload: ParserContentPayload, db: Session = Depen return {"parser": payload.parser_name, "fields": list(fields), "field_count": len(fields)} +# Native SentinelOne platform sources — parsed by the system, not by SDL parsers. +# Excluded from the coverage map as they do not require custom parser coverage. +_S1_NATIVE_SOURCES = { + "SentinelOne", "asset", "alert", "vulnerability", + "ActivityFeed", "indicator", "misconfiguration", + "SentinelOne Ranger AD", +} + + @router.post("/sync-sources") async def sync_sources(days: int = 7, db: Session = Depends(get_db)): """Pull active dataSource.names from the SDL and store them. Also detects whether a parser is already producing structured fields for each source by checking if event.type is populated in the data lake. + Native S1 platform sources are excluded as they do not require SDL parsers. 
""" import asyncio from datetime import datetime, timedelta @@ -255,7 +370,7 @@ async def sync_sources(days: int = 7, db: Session = Depends(get_db)): seen = 0 for row in rows: name = row.get("dataSource.name") - if name: + if name and name not in _S1_NATIVE_SOURCES: db.add(ActiveSource( source_name=name, event_count=row.get("events", 0), @@ -264,7 +379,7 @@ async def sync_sources(days: int = 7, db: Session = Depends(get_db)): )) seen += 1 db.commit() - return {"synced": seen, "sources": [r["dataSource.name"] for r in rows if r.get("dataSource.name")]} + return {"synced": seen, "sources": [r["dataSource.name"] for r in rows if r.get("dataSource.name") and r["dataSource.name"] not in _S1_NATIVE_SOURCES]} def _build_parser_ds_index() -> dict[str, dict]: @@ -367,19 +482,28 @@ def get_coverage_map(db: Session = Depends(get_db)): # Build rule index: source_name → rules that reference it rule_by_source: dict[str, list] = {} for rule in rules: - query_texts = _star_query_texts(json.loads(rule.raw)) if rule.rule_type == "star" else [] - data_sources = rule_parser.extract_data_sources(query_texts) + try: + raw_data = json.loads(rule.raw) if rule.raw else {} + except Exception: + raw_data = {} + + if rule.rule_type == "library": + # Library rules store pre-extracted data_sources list in raw + data_sources = raw_data.get("data_sources", []) + else: + query_texts = _star_query_texts(raw_data) + data_sources = rule_parser.extract_data_sources(query_texts) + for ds in data_sources: rule_by_source.setdefault(ds, []).append({"rule": rule.name, "type": rule.rule_type}) - if not data_sources: - # Rule with no explicit source filter — applies to all - rule_by_source.setdefault("__any__", []).append({"rule": rule.name, "type": rule.rule_type}) # Fields to ignore when computing "missing" — these are metadata/schema fields # always present in events regardless of the parser _SCHEMA_FIELDS = { "dataSource.name", "dataSource.vendor", "dataSource.category", "event.type", "timestamp", 
"src.endpoint.ip", "src.endpoint.name", + # Endpoint agent fields — populated by the SentinelOne agent, not by SDL parsers + "cmdScript.content", "endpoint.os", "endpoint.name", "endpoint.uid", } sources_out = [] @@ -414,22 +538,75 @@ def get_coverage_map(db: Session = Depends(get_db)): else: needed_count += 1 - rules_for_src = rule_by_source.get(src.source_name, []) + rule_by_source.get("__any__", []) + rules_for_src: list = [r for r in rule_by_source.get(src.source_name, []) if r["type"] == "library"] - # Fields all associated rules need, minus schema fields always present - rule_fields_needed: set = set() + # Close-match suggestions — shown when there are no library rules for this source. + close_matches: list = [] + if not rules_for_src: + import re as _re + + def _word_tokens(s: str) -> set: + """Split on non-alphanumeric boundaries, lowercase, drop single chars.""" + return {t for t in _re.split(r"[^a-z0-9]+", s.lower()) if len(t) >= 2} + + def _is_close(a: str, b: str) -> bool: + na, nb = _normalize(a), _normalize(b) + # 1. Simple substring match + if na in nb or nb in na: + return True + # 2. Token-level: handles "Microsoft 365 Collaboration" vs "Microsoft O365" + # — "365" is inside "o365", and they share "microsoft" + ta, tb = _word_tokens(a), _word_tokens(b) + shared_exact = ta & tb + if not shared_exact: + return False # Must share at least one word exactly + # Check that a DISTINCTIVE (non-shared) token from one name + # appears as a substring inside a token from the other. + # This avoids matching "Azure AD" to "Azure Platform" on "azure" alone. 
+ unique_a = ta - shared_exact + unique_b = tb - shared_exact + return any( + ua in ub or ub in ua + for ua in unique_a for ub in unique_b + if len(ua) >= 2 and len(ub) >= 2 + ) + + sn = _normalize(src.source_name) + for lib_ds, lib_rules in rule_by_source.items(): + lib_only = [r for r in lib_rules if r["type"] == "library"] + if not lib_only: + continue + if _is_close(src.source_name, lib_ds): + close_matches.append({ + "library_name": lib_ds, + "rule_count": len(lib_only), + }) + close_matches.sort(key=lambda x: x["rule_count"], reverse=True) + close_matches = close_matches[:3] + + # Count how many rules reference each field (frequency) + field_freq: dict[str, int] = {} for r in rules_for_src: - rule_fields_needed |= rule_fields_index.get(r["rule"], set()) - rule_fields_needed -= _SCHEMA_FIELDS + for f in rule_fields_index.get(r["rule"], set()): + field_freq[f] = field_freq.get(f, 0) + 1 # Fields the parser provides parser_provides = parser_index.get(matched_parser, set()) if matched_parser and matched_parser != "detected in data" else set() - # Missing = fields rules need that the parser doesn't provide. - # Only consider dotted-path fields (e.g. src.ip, winEventLog.channel) — - # single-word tokens are typically correlation variables or rule metadata. - rule_fields_dotted = {f for f in rule_fields_needed if "." in f} - missing_fields = sorted(rule_fields_dotted - parser_provides) + # Minimum number of rules that must reference a field before we flag it. + # Scales with rule count so single-rule oddities don't dominate. + rule_count = len(rules_for_src) + min_rules = max(2, round(rule_count * 0.05)) if rule_count >= 10 else 2 + + # Missing = dotted-path fields needed by >= min_rules rules, + # not in schema constants, not provided by the parser. + missing_fields = sorted( + f for f, count in field_freq.items() + if count >= min_rules + and "." 
in f + and f not in _SCHEMA_FIELDS + and f not in parser_provides + ) sources_out.append({ "source_name": src.source_name, @@ -441,6 +618,7 @@ def get_coverage_map(db: Session = Depends(get_db)): "parser_detected": src.parser_detected or 0, "rules": rules_for_src, "rule_count": len(rules_for_src), + "close_matches": close_matches, "missing_fields": missing_fields, "missing_fields_count": len(missing_fields), "synced_at": src.synced_at.isoformat() if src.synced_at else None, diff --git a/backend/routers/settings.py b/backend/routers/settings.py index 4304398..9eddca0 100644 --- a/backend/routers/settings.py +++ b/backend/routers/settings.py @@ -15,9 +15,6 @@ FIELDS = [ {"key": "SDL_XDR_URL", "label": "SDL XDR URL", "secret": False, "placeholder": "https://xdr.us1.sentinelone.net"}, {"key": "SDL_LOG_READ_KEY", "label": "SDL Log Read Key", "secret": True, "placeholder": "1DnK0Y4e..."}, {"key": "ANTHROPIC_API_KEY", "label": "Anthropic API Key", "secret": True, "placeholder": "sk-ant-..."}, - {"key": "STAR_LIBRARY_ONLY", "label": "STAR Rules — Library Only", "secret": False, "placeholder": "true", - "type": "select", "options": ["true", "false"], - "hint": "true = load only SentinelOne Library rules (@sentinelone.com creators). 
false = include custom tenant rules as well."}, ] FIELD_KEYS = {f["key"] for f in FIELDS} diff --git a/backend/services/s1_client.py b/backend/services/s1_client.py index 66f39f5..9c23e2b 100644 --- a/backend/services/s1_client.py +++ b/backend/services/s1_client.py @@ -24,16 +24,72 @@ def _iso_to_epoch_ms(iso_str: str) -> int: return int(dt.timestamp() * 1000) -async def get_star_rules(limit: int = 200) -> list: - """Fetch active STAR rules from the Management Console API.""" +async def get_star_rules(page_size: int = 100) -> list: + """Fetch custom STAR rules from /cloud-detection/rules, paginating via cursor.""" + all_rules = [] + cursor = None async with httpx.AsyncClient(timeout=30) as client: - resp = await client.get( - f"{BASE_URL}/web/api/v2.1/cloud-detection/rules", - headers=HEADERS, - params={"limit": limit}, - ) - resp.raise_for_status() - return resp.json().get("data", []) + while True: + params = {"limit": page_size} + if cursor: + params["cursor"] = cursor + resp = await client.get( + f"{BASE_URL}/web/api/v2.1/cloud-detection/rules", + headers=HEADERS, + params=params, + ) + resp.raise_for_status() + body = resp.json() + all_rules.extend(body.get("data", [])) + cursor = body.get("pagination", {}).get("nextCursor") + if not cursor: + break + return all_rules + + +async def get_library_rules(page_size: int = 100) -> list: + """ + Fetch Detection Library (OOTB/Platform) rules from /web/api/v2.1/detection-library/rules. + Requires an account-level or higher API token — site-scoped tokens will receive a 400. + Returns an empty list gracefully if the token lacks sufficient scope. 
+ """ + all_rules = [] + cursor = None + async with httpx.AsyncClient(timeout=60) as client: + while True: + params: dict = {"limit": page_size} + if cursor: + params["cursor"] = cursor + resp = await client.get( + f"{BASE_URL}/web/api/v2.1/detection-library/rules", + headers=HEADERS, + params=params, + ) + # 400 typically means site-scoped token — return empty rather than crash + if resp.status_code == 400: + return [] + resp.raise_for_status() + body = resp.json() + batch = body.get("data", []) + all_rules.extend(batch) + cursor = body.get("pagination", {}).get("nextCursor") + if not cursor: + break + + results = [] + for rule in all_rules: + results.append({ + "id": str(rule.get("id", "")), + "name": rule.get("name", "unnamed"), + "s1ql": rule.get("s1ql") or rule.get("query", ""), + "queryType": rule.get("queryType", "events"), + "severity": rule.get("severity", ""), + "description": rule.get("description", ""), + "gdlRuleId": rule.get("id", ""), + "creator": "SentinelOne", + "expirationMode": rule.get("expirationMode", "Permanent"), + }) + return results async def run_powerquery(query: str, from_date: str, to_date: str) -> dict: @@ -124,6 +180,55 @@ async def get_sdl_parser(filename: str) -> dict: return resp.json() +async def get_account_id() -> str | None: + """Return the first account ID visible to the current token.""" + async with httpx.AsyncClient(timeout=15) as client: + resp = await client.get( + f"{BASE_URL}/web/api/v2.1/accounts", + headers=HEADERS, + params={"limit": 1}, + ) + resp.raise_for_status() + accounts = resp.json().get("data", []) + return str(accounts[0]["id"]) if accounts else None + + +async def get_platform_rules(page_size: int = 1000) -> list: + """ + Fetch all Detection Library platform rules from /detection-library/platform-rules. + Requires scopeLevel + scopeId — uses account scope with the first visible account. + Returns list of rules, each with a 'sources' list (authoritative data source names). 
+ """ + account_id = await get_account_id() + if not account_id: + return [] + + all_rules: list = [] + cursor: str = "" + async with httpx.AsyncClient(timeout=60) as client: + while True: + params: dict = { + "scopeLevel": "account", + "scopeId": account_id, + "limit": page_size, + "cursor": cursor, + } + resp = await client.get( + f"{BASE_URL}/web/api/v2.1/detection-library/platform-rules", + headers=HEADERS, + params=params, + ) + if resp.status_code == 400: + return [] + resp.raise_for_status() + body = resp.json() + all_rules.extend(body.get("data", [])) + cursor = body.get("pagination", {}).get("nextCursor") or "" + if not cursor: + break + return all_rules + + async def get_sites() -> list: async with httpx.AsyncClient(timeout=30) as client: resp = await client.get( diff --git a/docker-compose.yml b/docker-compose.yml index 7b13dcf..ebcbfc2 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -17,12 +17,14 @@ services: - SDL_LOG_READ_KEY=${SDL_LOG_READ_KEY} - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY} - DATABASE_URL=postgresql://siem:siem@db:5432/siem + - DETECTIONS_FILE=/app/data/detections.json depends_on: db: condition: service_healthy volumes: - ./parsers:/app/parsers - ./.env:/app/.env + - ./data:/app/data:ro db: image: postgres:16-alpine diff --git a/frontend/index.html b/frontend/index.html index 6a4723e..95e1f86 100644 --- a/frontend/index.html +++ b/frontend/index.html @@ -116,17 +116,98 @@ function barChart(rows, labelKey, valueKey) { function renderHome() { set(`
SentinelOne AI-SIEM · demo.sentinelone.net
Highest-volume sources with no parser running — click to inspect in Parser Quality.
+| Source | Volume |
|---|---|
No active sources synced yet.
-Click Sync Live Sources to pull current dataSource.names from the data lake, then Load STAR Rules and Load SDL Parsers to see coverage.
+Click Sync Live Sources to pull current dataSource.names from the data lake, then Load SDL Parsers to see coverage.
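Reviewer note on the new missing-fields threshold in `get_coverage_map` (`backend/routers/coverage.py`): the expression `max(2, round(rule_count * 0.05)) if rule_count >= 10 else 2` is easier to reason about in isolation. The sketch below copies that expression and the four filter conditions from the diff. The function names and the sample frequencies are hypothetical and are not part of the change. One thing to keep in mind is that Python's built-in `round` rounds exact halves to the nearest even integer, so rule counts whose 5% lands on a `.5` boundary may round down; the sample values avoid those boundaries.

```python
def min_rules_threshold(rule_count: int) -> int:
    """Minimum number of rules that must reference a field before it is flagged.

    Same expression as in the diff: roughly 5% of the rules, never below 2.
    """
    return max(2, round(rule_count * 0.05)) if rule_count >= 10 else 2


def missing_fields(field_freq, rule_count, schema_fields, parser_provides):
    """Return dotted-path fields used by enough rules that the parser lacks."""
    threshold = min_rules_threshold(rule_count)
    return sorted(
        field for field, count in field_freq.items()
        if count >= threshold
        and "." in field                  # skip single-word tokens
        and field not in schema_fields    # skip always-present schema fields
        and field not in parser_provides  # skip fields the parser already emits
    )


# Hypothetical per-field rule counts for one data source.
freq = {
    "src.ip": 5,
    "user.name": 2,
    "winEventLog.channel": 3,
    "severity": 9,
    "event.type": 60,
}
schema = {"event.type"}
provided = {"src.ip"}

few_rules = missing_fields(freq, 5, schema, provided)    # threshold is 2
many_rules = missing_fields(freq, 60, schema, provided)  # threshold is 3
print(few_rules, many_rules)
```

With 60 rules the threshold rises to 3, so a field referenced by only two rules (`user.name` here) is no longer reported, which matches the stated intent that single-rule oddities should not dominate the column.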