Auto-load detection library from S1 API, improve coverage map accuracy

- Fetch detection library rules from platform-rules API at startup (falls back to extracted.json); adds Sync Detection Library button for refresh
- Parser column simplified to ✓ Parsed / ✗ Not Parsed
- Detection counts now use library rules only (exclude custom STAR rules)
- Add close-match suggestions for dataSource.name mismatches (e.g. CloudTrail → AWS CloudTrail, Microsoft 365 Collaboration → Microsoft O365)
- Exclude SentinelOne Ranger AD from coverage map (native S1 source)
- Add success feedback banners to Load SDL Parsers and Sync Library buttons
- Remove rule_counts.json manual override; extracted.json is source of truth
- Remove Load Detections button; rules auto-import on backend startup
- Add get_account_id() and get_platform_rules() to s1_client

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 20:37:12 +00:00 · 2026-05-20 15:14:10 -04:00
parent 6e137438b1
commit 6cd9da82da
8 changed files with 580 additions and 90 deletions
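The close-match suggestions listed in the commit message can be tried in isolation. The sketch below is a standalone approximation of the token-overlap check this commit adds to the coverage router. `normalize` is a hypothetical stand-in for the repository's `_normalize` helper, whose definition is not part of this diff; here it is assumed to lowercase the name and strip non-alphanumeric characters.

```python
import re


def normalize(s: str) -> str:
    # Assumption: stand-in for the repo's _normalize helper (not shown in the diff).
    return re.sub(r"[^a-z0-9]+", "", s.lower())


def word_tokens(s: str) -> set[str]:
    # Split on non-alphanumeric boundaries, lowercase, drop single characters.
    return {t for t in re.split(r"[^a-z0-9]+", s.lower()) if len(t) >= 2}


def is_close(a: str, b: str) -> bool:
    na, nb = normalize(a), normalize(b)
    # 1. One normalized name contained in the other.
    if na in nb or nb in na:
        return True
    # 2. The names must share at least one whole word...
    ta, tb = word_tokens(a), word_tokens(b)
    shared = ta & tb
    if not shared:
        return False
    # ...and a non-shared token of one must sit inside a non-shared token of the other.
    return any(ua in ub or ub in ua for ua in ta - shared for ub in tb - shared)


print(is_close("CloudTrail", "AWS CloudTrail"))
print(is_close("Microsoft 365 Collaboration", "Microsoft O365"))
print(is_close("Azure AD", "Azure Platform"))
```

Under these assumptions the first two pairs match (substring, then `365` inside `o365`), while `Azure AD` and `Azure Platform` do not, because `azure` is their only overlap.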
@@ -7,3 +7,4 @@ node_modules/
 frontend/out/
 pgdata/
 parsers/*.json
 data/
@@ -10,6 +10,7 @@ A self-hosted troubleshooting and visibility tool for SentinelOne AI-SIEM SecOps
 | Page | Purpose |
 |---|---|
 | **Overview** | Live health stats — coverage percentage, active sources, top uncovered sources by volume |
 | **Parser Coverage Map** | Which active data sources have a parser? Which don't? |
 | **Ingest Dashboard** | Event volume, top sources, cost projection, filter simulator |
 | **Parser Quality** | Live event sampler, field population rate, parser test runner |
@@ -26,12 +27,12 @@ browser → nginx (port 3001) → single-page HTML/JS application
          FastAPI backend (port 8001)
                ↓
    ┌───────────────────────────┐
-    │  PostgreSQL (SQLAlchemy)  │  parsed rules, parser fields, active sources
+    │  PostgreSQL (SQLAlchemy)  │  parser fields, active sources
    └───────────────────────────┘
                ↓
    ┌───────────────────────────┐
    │  SentinelOne APIs         │
-    │  • Management API (STAR)  │  demo.sentinelone.net
+    │  • Management API         │  demo.sentinelone.net
    │  • Scalyr XDR PowerQuery  │  xdr.us1.sentinelone.net
    └───────────────────────────┘
 ```
@@ -54,16 +55,34 @@ Edit `.env` with your credentials:
 ```env
 S1_BASE_URL=https://demo.sentinelone.net       # Your console URL
-S1_API_TOKEN=eyJ...                             # Service user API token
+S1_API_TOKEN=eyJ...                             # Service user API token (account scope or higher)
 SDL_XDR_URL=https://xdr.us1.sentinelone.net    # Scalyr XDR endpoint
 SDL_LOG_READ_KEY=1j2IU0S...                     # Data Lake read key
-ANTHROPIC_API_KEY=                              # Optional — Onboarding page only
+ANTHROPIC_API_KEY=                              # Optional — not currently used
 ```
-**S1_API_TOKEN** — generate at *Settings → Users → Service Users* in the console.  
+**S1_API_TOKEN** — generate at *Settings → Users → Service Users* in the console. The service user should be provisioned at **account scope** or higher.  
 **SDL_LOG_READ_KEY** — found at *Settings → Integrations → Data Lake API Keys*.
-### 2. Add Parser Files (optional but strongly recommended)
+### 2. Add the Detection Library (strongly recommended)
 The Detection Fields Missing column and per-source detection counts on the Coverage Map require a local detections export. This is generated from the [detection-validator](https://github.com/mickbrowns1/detection-validator) repository.
 ```bash
 # Clone the detection-validator repo alongside this one
 git clone https://github.com/mickbrowns1/detection-validator.git
 cd detection-validator
 # Follow its README to generate the export, then copy the output here:
 mkdir -p ../SIEM-Toolkit/data
 cp data/data/detections/extracted.json ../SIEM-Toolkit/data/detections.json
 cd ../SIEM-Toolkit
 ```
 The `data/` directory is gitignored and never committed. Once the stack is running, click **Load Detections** on the Coverage Map to import the rules into the database.
 ### 3. Add Parser Files (optional but strongly recommended)
 Place your SDL parser JSON files into the `parsers/` directory. The backend reads them directly at query time — no rebuild is necessary.
@@ -71,7 +90,7 @@ Place your SDL parser JSON files into the `parsers/` directory. The backend read
 cp ~/my-parsers/*.json parsers/
 ```
-### 3. Start the Stack
+### 4. Start the Stack
 ```bash
 docker-compose up -d --build
@@ -83,6 +102,18 @@ Open **http://localhost:3001** in your browser and you're off.
 ## Features
 ### Overview Dashboard
 The landing page gives you an at-a-glance health summary drawn live from the database:
 - **Parser Coverage %** — proportion of active sources with a confirmed parser
 - **Active Sources** — total number of `dataSource.name` values seen in the last 7 days
 - **Covered / Need Parser** — counts for each status
 If any sources are uncovered, the **Top Sources Needing a Parser** table lists the highest-volume offenders. Click any source name to jump directly to the Parser Quality page with that source pre-selected.
 ---
 ### Parser Coverage Map
 Answers the question: *does each active data source have a parser running?*
@@ -91,7 +122,6 @@ Answers the question: *does each active data source have a parser running?*
 1. **Sync Live Sources** — executes a PowerQuery against your data lake to retrieve every `dataSource.name` seen in the last 7 days, along with event counts.
 2. **Load SDL Parsers** — reads parser files from `parsers/`, extracts the `dataSource.name` attribute from each, and stores the field list in the database.
 3. **Load STAR Rules** — retrieves your STAR detection rules from the management API and indexes which data sources each rule references.
 **Matching logic (three-tier):**
 1. Exact `dataSource.name` match between the active source and the parser attribute
@@ -104,6 +134,10 @@ Answers the question: *does each active data source have a parser running?*
 - 🟢 **Covered** — custom parser confirmed (local file or detected via parsed events in the data lake)
 - 🔴 **Parser Needed** — no parser found, or only a grok/dottedJson format (which typically indicates an incomplete parser)
 **Filters:** Use the filter pills to focus on Custom Parser only, Default Parser Only (data lake detected), or No Parser.
 **Deep link:** Click any source name in the table to open it directly in Parser Quality with all dropdowns pre-populated.
 **Expected results:** After syncing sources and loading parsers, sources with active SDL parsers will appear as Covered. Sources sending raw, unparsed data — where only `message` and `timestamp` appear in the data lake — will appear as Parser Needed.
 ---
@@ -173,8 +207,7 @@ A prompt template for using Claude Code to onboard a new log source. Copy the te
 - An SDL parser skeleton in augmented-JSON format
 - Field mappings to the SDL common schema
- 2–3 starter STAR detection rules
+- Parser test assertions
 - 5 parser test assertions
 No Anthropic API key is required — this uses Claude Code directly from your terminal.
@@ -222,7 +255,7 @@ curl -X DELETE http://localhost:8001/api/coverage/reset
 │   │   └── settings.py          # .env read/write
 │   └── services/
 │       ├── s1_client.py         # SentinelOne + Scalyr API client
-│       └── rule_parser.py       # SDL/Sigma/STAR field extraction
+│       └── rule_parser.py       # SDL format string field extraction
 ├── frontend/
 │   └── index.html               # Single-page application (Tailwind, vanilla JS)
 ├── parsers/                     # SDL parser files (volume-mounted)
@@ -240,3 +273,4 @@ curl -X DELETE http://localhost:8001/api/coverage/reset
 - The backend queries your **demo tenant** (`demo.sentinelone.net`) — not usea1-purple or any other tenant. Ensure your `S1_BASE_URL` and `SDL_LOG_READ_KEY` are pointed at the same tenant.
 - Parser files in `parsers/` are read at query time, not on startup — add or update files at any point without rebuilding the image.
 - The filter simulator is entirely read-only and makes no changes whatsoever to your tenant configuration.
 - The service user API token must be at **account scope** or higher. Site-scoped tokens will have limited visibility into rules and may see reduced source counts.
@@ -1,6 +1,6 @@
 from fastapi import FastAPI
 from fastapi.middleware.cors import CORSMiddleware
-from db import engine, Base
+from db import engine, Base, get_db, ParsedRule
 from routers import coverage, ingest, settings, quality
 Base.metadata.create_all(bind=engine)
@@ -15,6 +15,40 @@ with engine.connect() as _conn:
 app = FastAPI(title="SIEM Toolkit", version="1.0.0")
@app.on_event("startup")
 async def auto_load_detections():
    """
    Auto-load detection library rules on startup.
    Tries the live S1 API first (accurate 'sources' field); falls back to extracted.json.
    Skips if rules are already loaded — use the 'Sync Library' button to force a refresh.
    """
    import os
    from sqlalchemy.orm import Session
    from services import s1_client
    db: Session = next(get_db())
    try:
        existing = db.query(ParsedRule).filter_by(rule_type="library").count()
        if existing > 0:
            return  # Already loaded — skip until user manually refreshes
        # Try live API first
        try:
            rules = await s1_client.get_platform_rules()
            if rules:
                coverage._import_from_api_rules(db, rules)
                return
        except Exception:
            pass
        # Fall back to local file
        detections_file = os.environ.get("DETECTIONS_FILE", "/app/data/detections.json")
        if os.path.exists(detections_file):
            coverage._import_detections(db, detections_file)
    finally:
        db.close()
 app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3001"],
@@ -1,4 +1,5 @@
 import json
 import os
 from fastapi import APIRouter, UploadFile, File, Depends, HTTPException
 from pydantic import BaseModel
 from sqlalchemy.orm import Session
@@ -6,6 +7,8 @@ from datetime import datetime
 from db import get_db, ParsedRule, ParserField, ActiveSource
 from services import s1_client, rule_parser
 DETECTIONS_FILE = os.environ.get("DETECTIONS_FILE", "/app/data/detections.json")
 router = APIRouter()
@@ -40,22 +43,12 @@ def _star_query_texts(rule: dict) -> list[str]:
@router.post("/load-star-rules")
-async def load_star_rules(library_only: bool = None, db: Session = Depends(get_db)):
+async def load_star_rules(db: Session = Depends(get_db)):
-    """Fetch STAR rules from SentinelOne and index their fields.
+    """Fetch all STAR rules from the Management Console API and index their fields."""
    library_only defaults to the STAR_LIBRARY_ONLY env var (default true).
    Pass ?library_only=false to include custom tenant rules as well.
    """
    import os
    if library_only is None:
        library_only = os.environ.get("STAR_LIBRARY_ONLY", "true").lower() != "false"
    try:
        rules = await s1_client.get_star_rules()
    except Exception as e:
-        raise HTTPException(502, f"S1 API error: {e}")
+        raise HTTPException(502, f"S1 API error: {type(e).__name__}: {e}")
    if library_only:
        rules = [r for r in rules if str(r.get("creator", "")).lower().endswith("@sentinelone.com")]
    # Replace all existing STAR rules cleanly to avoid duplicate key errors
    db.query(ParsedRule).filter_by(rule_type="star").delete()
@@ -81,6 +74,118 @@ async def load_star_rules(library_only: bool = None, db: Session = Depends(get_d
    return {"loaded": len(loaded), "rules": loaded}
 _EXCLUDED_PATHS = ("/rules/silent/", "/rules/dev/")
 def _import_from_api_rules(db, rules: list) -> int:
    """
    Import platform rules fetched directly from the S1 API into the database.
    Each rule has a 'sources' list — the authoritative dataSource.name values.
    """
    db.query(ParsedRule).filter_by(rule_type="library").delete()
    db.commit()
    loaded = 0
    seen_ids: set = set()
    for rule in rules:
        rule_id = str(rule.get("id", f"lib_{loaded}"))
        if rule_id in seen_ids:
            continue
        seen_ids.add(rule_id)
        sources = rule.get("sources") or []
        db.add(ParsedRule(
            rule_id=rule_id,
            name=rule.get("name", "unnamed"),
            rule_type="library",
            fields_used=[],          # API rules don't expose field-level info
            raw=json.dumps({"data_sources": sources}),
        ))
        loaded += 1
        if loaded % 500 == 0:
            db.flush()
    db.commit()
    return loaded
 def _import_detections(db, detections_file: str) -> int:
    """
    Import library detection rules from extracted.json into the database.
    Replaces any existing library rules. Returns the count of rules loaded.
    """
    with open(detections_file, "r", encoding="utf-8") as fh:
        data = json.load(fh)
    results = data.get("results", [])
    results = [r for r in results if not any(r.get("file", "").startswith(p) for p in _EXCLUDED_PATHS)]
    db.query(ParsedRule).filter_by(rule_type="library").delete()
    db.commit()
    loaded = 0
    seen_ids: set = set()
    for rule in results:
        all_fields: set = set()
        data_sources: list[str] = []
        for q in rule.get("queries", []):
            all_fields.update(q.get("keys", []))
            ds_vals = q.get("pairs", {}).get("dataSource.name", [])
            for v in ds_vals:
                if isinstance(v, str):
                    data_sources.append(v)
                elif isinstance(v, list):
                    data_sources.extend(str(x) for x in v)
        rule_id = str(rule.get("id", f"lib_{loaded}"))
        if rule_id in seen_ids:
            continue
        seen_ids.add(rule_id)
        db.add(ParsedRule(
            rule_id=rule_id,
            name=rule.get("name", "unnamed"),
            rule_type="library",
            fields_used=list(all_fields),
            raw=json.dumps({"data_sources": list(set(data_sources))}),
        ))
        loaded += 1
        if loaded % 500 == 0:
            db.flush()
    db.commit()
    return loaded
@router.post("/load-detections")
 async def load_detections(db: Session = Depends(get_db)):
    """
    Reload detection library rules.
    Tries the live S1 API first (platform-rules endpoint); falls back to extracted.json.
    """
    # Prefer the live API — gives accurate 'sources' and is always up to date
    try:
        rules = await s1_client.get_platform_rules()
        if rules:
            loaded = _import_from_api_rules(db, rules)
            return {"loaded": loaded, "source": "api"}
    except Exception:
        pass
    # Fall back to local extracted.json
    if not os.path.exists(DETECTIONS_FILE):
        raise HTTPException(
            404,
            "S1 API unavailable and no detections file found — "
            "ensure the data/ volume is mounted with detections.json"
        )
    try:
        loaded = _import_detections(db, DETECTIONS_FILE)
    except Exception as e:
        raise HTTPException(500, f"Failed to import detections: {e}")
    return {"loaded": loaded, "source": "file"}
@router.post("/upload-sigma")
 async def upload_sigma(files: list[UploadFile] = File(...), db: Session = Depends(get_db)):
    """Upload one or more Sigma YAML files and index their fields."""
@@ -216,11 +321,21 @@ async def load_parser_content(payload: ParserContentPayload, db: Session = Depen
    return {"parser": payload.parser_name, "fields": list(fields), "field_count": len(fields)}
 # Native SentinelOne platform sources — parsed by the system, not by SDL parsers.
 # Excluded from the coverage map as they do not require custom parser coverage.
 _S1_NATIVE_SOURCES = {
    "SentinelOne", "asset", "alert", "vulnerability",
    "ActivityFeed", "indicator", "misconfiguration",
    "SentinelOne Ranger AD",
 }
@router.post("/sync-sources")
 async def sync_sources(days: int = 7, db: Session = Depends(get_db)):
    """Pull active dataSource.names from the SDL and store them.
    Also detects whether a parser is already producing structured fields
    for each source by checking if event.type is populated in the data lake.
    Native S1 platform sources are excluded as they do not require SDL parsers.
    """
    import asyncio
    from datetime import datetime, timedelta
@@ -255,7 +370,7 @@ async def sync_sources(days: int = 7, db: Session = Depends(get_db)):
    seen = 0
    for row in rows:
        name = row.get("dataSource.name")
-        if name:
+        if name and name not in _S1_NATIVE_SOURCES:
            db.add(ActiveSource(
                source_name=name,
                event_count=row.get("events", 0),
@@ -264,7 +379,7 @@ async def sync_sources(days: int = 7, db: Session = Depends(get_db)):
            ))
            seen += 1
    db.commit()
-    return {"synced": seen, "sources": [r["dataSource.name"] for r in rows if r.get("dataSource.name")]}
+    return {"synced": seen, "sources": [r["dataSource.name"] for r in rows if r.get("dataSource.name") and r["dataSource.name"] not in _S1_NATIVE_SOURCES]}
 def _build_parser_ds_index() -> dict[str, dict]:
@@ -367,19 +482,28 @@ def get_coverage_map(db: Session = Depends(get_db)):
    # Build rule index: source_name → rules that reference it
    rule_by_source: dict[str, list] = {}
    for rule in rules:
-        query_texts = _star_query_texts(json.loads(rule.raw)) if rule.rule_type == "star" else []
+        try:
-        data_sources = rule_parser.extract_data_sources(query_texts)
+            raw_data = json.loads(rule.raw) if rule.raw else {}
        except Exception:
            raw_data = {}
        if rule.rule_type == "library":
            # Library rules store pre-extracted data_sources list in raw
            data_sources = raw_data.get("data_sources", [])
        else:
            query_texts = _star_query_texts(raw_data)
            data_sources = rule_parser.extract_data_sources(query_texts)
        for ds in data_sources:
            rule_by_source.setdefault(ds, []).append({"rule": rule.name, "type": rule.rule_type})
        if not data_sources:
            # Rule with no explicit source filter — applies to all
            rule_by_source.setdefault("__any__", []).append({"rule": rule.name, "type": rule.rule_type})
    # Fields to ignore when computing "missing" — these are metadata/schema fields
    # always present in events regardless of the parser
    _SCHEMA_FIELDS = {
        "dataSource.name", "dataSource.vendor", "dataSource.category",
        "event.type", "timestamp", "src.endpoint.ip", "src.endpoint.name",
        # Endpoint agent fields — populated by the SentinelOne agent, not by SDL parsers
        "cmdScript.content", "endpoint.os", "endpoint.name", "endpoint.uid",
    }
    sources_out = []
@@ -414,22 +538,75 @@ def get_coverage_map(db: Session = Depends(get_db)):
        else:
            needed_count += 1
-        rules_for_src = rule_by_source.get(src.source_name, []) + rule_by_source.get("__any__", [])
+        rules_for_src: list = [r for r in rule_by_source.get(src.source_name, []) if r["type"] == "library"]
-        # Fields all associated rules need, minus schema fields always present
+        # Close-match suggestions — shown when there are no library rules for this source.
-        rule_fields_needed: set = set()
+        close_matches: list = []
        if not rules_for_src:
            import re as _re
            def _word_tokens(s: str) -> set:
                """Split on non-alphanumeric boundaries, lowercase, drop single chars."""
                return {t for t in _re.split(r"[^a-z0-9]+", s.lower()) if len(t) >= 2}
            def _is_close(a: str, b: str) -> bool:
                na, nb = _normalize(a), _normalize(b)
                # 1. Simple substring match
                if na in nb or nb in na:
                    return True
                # 2. Token-level: handles "Microsoft 365 Collaboration" vs "Microsoft O365"
                #    — "365" is inside "o365", and they share "microsoft"
                ta, tb = _word_tokens(a), _word_tokens(b)
                shared_exact = ta & tb
                if not shared_exact:
                    return False  # Must share at least one word exactly
                # Check that a DISTINCTIVE (non-shared) token from one name
                # appears as a substring inside a token from the other.
                # This avoids matching "Azure AD" to "Azure Platform" on "azure" alone.
                unique_a = ta - shared_exact
                unique_b = tb - shared_exact
                return any(
                    ua in ub or ub in ua
                    for ua in unique_a for ub in unique_b
                    if len(ua) >= 2 and len(ub) >= 2
                )
            sn = _normalize(src.source_name)
            for lib_ds, lib_rules in rule_by_source.items():
                lib_only = [r for r in lib_rules if r["type"] == "library"]
                if not lib_only:
                    continue
                if _is_close(src.source_name, lib_ds):
                    close_matches.append({
                        "library_name": lib_ds,
                        "rule_count": len(lib_only),
                    })
            close_matches.sort(key=lambda x: x["rule_count"], reverse=True)
            close_matches = close_matches[:3]
        # Count how many rules reference each field (frequency)
        field_freq: dict[str, int] = {}
        for r in rules_for_src:
-            rule_fields_needed |= rule_fields_index.get(r["rule"], set())
+            for f in rule_fields_index.get(r["rule"], set()):
-        rule_fields_needed -= _SCHEMA_FIELDS
+                field_freq[f] = field_freq.get(f, 0) + 1
        # Fields the parser provides
        parser_provides = parser_index.get(matched_parser, set()) if matched_parser and matched_parser != "detected in data" else set()
-        # Missing = fields rules need that the parser doesn't provide.
+        # Minimum number of rules that must reference a field before we flag it.
-        # Only consider dotted-path fields (e.g. src.ip, winEventLog.channel) —
+        # Scales with rule count so single-rule oddities don't dominate.
-        # single-word tokens are typically correlation variables or rule metadata.
+        rule_count = len(rules_for_src)
-        rule_fields_dotted = {f for f in rule_fields_needed if "." in f}
+        min_rules = max(2, round(rule_count * 0.05)) if rule_count >= 10 else 2
-        missing_fields = sorted(rule_fields_dotted - parser_provides)
+
        # Missing = dotted-path fields needed by >= min_rules rules,
        # not in schema constants, not provided by the parser.
        missing_fields = sorted(
            f for f, count in field_freq.items()
            if count >= min_rules
            and "." in f
            and f not in _SCHEMA_FIELDS
            and f not in parser_provides
        )
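The frequency threshold above may be easier to follow with concrete numbers. The sketch below copies the `min_rules` expression from the diff into a standalone function and applies the same filter to a made-up `field_freq` map. The function names and sample data are illustrative only and do not exist in the repository.

```python
def min_rules_threshold(rule_count: int) -> int:
    # Same expression as the diff: 5% of the rule count once there are
    # at least 10 rules, and never below 2.
    return max(2, round(rule_count * 0.05)) if rule_count >= 10 else 2


def missing_fields(field_freq: dict, rule_count: int,
                   schema_fields: set, parser_provides: set) -> list:
    threshold = min_rules_threshold(rule_count)
    return sorted(
        f for f, count in field_freq.items()
        if count >= threshold          # referenced by enough rules
        and "." in f                   # dotted-path fields only
        and f not in schema_fields     # not a schema constant
        and f not in parser_provides   # not already produced by the parser
    )


# Hypothetical sample data for a source with 100 library rules.
freq = {"src.ip": 6, "user.name": 1, "severity": 9, "event.type": 9, "dst.port": 5}
print(min_rules_threshold(100))
print(missing_fields(freq, 100, {"event.type"}, {"dst.port"}))
```

With 100 rules the threshold is 5, so only `src.ip` is reported in this sample. Because of the `max(2, ...)` floor, the threshold stays at 2 for sources with up to about 50 library rules.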
        sources_out.append({
            "source_name": src.source_name,
@@ -441,6 +618,7 @@ def get_coverage_map(db: Session = Depends(get_db)):
            "parser_detected": src.parser_detected or 0,
            "rules": rules_for_src,
            "rule_count": len(rules_for_src),
            "close_matches": close_matches,
            "missing_fields": missing_fields,
            "missing_fields_count": len(missing_fields),
            "synced_at": src.synced_at.isoformat() if src.synced_at else None,
@@ -15,9 +15,6 @@ FIELDS = [
    {"key": "SDL_XDR_URL",          "label": "SDL XDR URL",                   "secret": False, "placeholder": "https://xdr.us1.sentinelone.net"},
    {"key": "SDL_LOG_READ_KEY",     "label": "SDL Log Read Key",              "secret": True,  "placeholder": "1DnK0Y4e..."},
    {"key": "ANTHROPIC_API_KEY",    "label": "Anthropic API Key",             "secret": True,  "placeholder": "sk-ant-..."},
    {"key": "STAR_LIBRARY_ONLY",    "label": "STAR Rules — Library Only",     "secret": False, "placeholder": "true",
     "type": "select", "options": ["true", "false"],
     "hint": "true = load only SentinelOne Library rules (@sentinelone.com creators). false = include custom tenant rules as well."},
 ]
 FIELD_KEYS = {f["key"] for f in FIELDS}
@@ -24,16 +24,72 @@ def _iso_to_epoch_ms(iso_str: str) -> int:
    return int(dt.timestamp() * 1000)
-async def get_star_rules(limit: int = 200) -> list:
+async def get_star_rules(page_size: int = 100) -> list:
-    """Fetch active STAR rules from the Management Console API."""
+    """Fetch custom STAR rules from /cloud-detection/rules, paginating via cursor."""
    all_rules = []
    cursor = None
    async with httpx.AsyncClient(timeout=30) as client:
-        resp = await client.get(
+        while True:
-            f"{BASE_URL}/web/api/v2.1/cloud-detection/rules",
+            params = {"limit": page_size}
-            headers=HEADERS,
+            if cursor:
-            params={"limit": limit},
+                params["cursor"] = cursor
-        )
+            resp = await client.get(
-        resp.raise_for_status()
+                f"{BASE_URL}/web/api/v2.1/cloud-detection/rules",
-        return resp.json().get("data", [])
+                headers=HEADERS,
                params=params,
            )
            resp.raise_for_status()
            body = resp.json()
            all_rules.extend(body.get("data", []))
            cursor = body.get("pagination", {}).get("nextCursor")
            if not cursor:
                break
    return all_rules
 async def get_library_rules(page_size: int = 100) -> list:
    """
    Fetch Detection Library (OOTB/Platform) rules from /web/api/v2.1/detection-library/rules.
    Requires an account-level or higher API token — site-scoped tokens will receive a 400.
    Returns an empty list gracefully if the token lacks sufficient scope.
    """
    all_rules = []
    cursor = None
    async with httpx.AsyncClient(timeout=60) as client:
        while True:
            params: dict = {"limit": page_size}
            if cursor:
                params["cursor"] = cursor
            resp = await client.get(
                f"{BASE_URL}/web/api/v2.1/detection-library/rules",
                headers=HEADERS,
                params=params,
            )
            # 400 typically means site-scoped token — return empty rather than crash
            if resp.status_code == 400:
                return []
            resp.raise_for_status()
            body = resp.json()
            batch = body.get("data", [])
            all_rules.extend(batch)
            cursor = body.get("pagination", {}).get("nextCursor")
            if not cursor:
                break
    results = []
    for rule in all_rules:
        results.append({
            "id": str(rule.get("id", "")),
            "name": rule.get("name", "unnamed"),
            "s1ql": rule.get("s1ql") or rule.get("query", ""),
            "queryType": rule.get("queryType", "events"),
            "severity": rule.get("severity", ""),
            "description": rule.get("description", ""),
            "gdlRuleId": rule.get("id", ""),
            "creator": "SentinelOne",
            "expirationMode": rule.get("expirationMode", "Permanent"),
        })
    return results
 async def run_powerquery(query: str, from_date: str, to_date: str) -> dict:
@@ -124,6 +180,55 @@ async def get_sdl_parser(filename: str) -> dict:
        return resp.json()
 async def get_account_id() -> str | None:
    """Return the first account ID visible to the current token."""
    async with httpx.AsyncClient(timeout=15) as client:
        resp = await client.get(
            f"{BASE_URL}/web/api/v2.1/accounts",
            headers=HEADERS,
            params={"limit": 1},
        )
        resp.raise_for_status()
        accounts = resp.json().get("data", [])
        return str(accounts[0]["id"]) if accounts else None
 async def get_platform_rules(page_size: int = 1000) -> list:
    """
    Fetch all Detection Library platform rules from /detection-library/platform-rules.
    Requires scopeLevel + scopeId — uses account scope with the first visible account.
    Returns list of rules, each with a 'sources' list (authoritative data source names).
    """
    account_id = await get_account_id()
    if not account_id:
        return []
    all_rules: list = []
    cursor: str = ""
    async with httpx.AsyncClient(timeout=60) as client:
        while True:
            params: dict = {
                "scopeLevel": "account",
                "scopeId": account_id,
                "limit": page_size,
                "cursor": cursor,
            }
            resp = await client.get(
                f"{BASE_URL}/web/api/v2.1/detection-library/platform-rules",
                headers=HEADERS,
                params=params,
            )
            if resp.status_code == 400:
                return []
            resp.raise_for_status()
            body = resp.json()
            all_rules.extend(body.get("data", []))
            cursor = body.get("pagination", {}).get("nextCursor") or ""
            if not cursor:
                break
    return all_rules
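The cursor loop shared by `get_star_rules` and `get_platform_rules` can be exercised without a network call. This is a minimal sketch that assumes the response shape used in the diff (`data` plus `pagination.nextCursor`). `collect_all`, `fake_fetch`, and `_PAGES` are hypothetical names introduced for illustration, not part of `s1_client`.

```python
from typing import Callable, Optional


def collect_all(fetch_page: Callable[[Optional[str]], dict]) -> list:
    """Follow pagination.nextCursor until the server stops returning one."""
    items: list = []
    cursor: Optional[str] = None
    while True:
        body = fetch_page(cursor)
        items.extend(body.get("data", []))
        cursor = body.get("pagination", {}).get("nextCursor")
        if not cursor:
            return items


# Hypothetical in-memory stand-in for the HTTP request (not a real API client).
_PAGES = {
    None: {"data": [1, 2], "pagination": {"nextCursor": "p2"}},
    "p2": {"data": [3], "pagination": {"nextCursor": None}},
}


def fake_fetch(cursor: Optional[str]) -> dict:
    return _PAGES[cursor]


print(collect_all(fake_fetch))
```

One caveat with this pattern: if a response ever carried `"pagination": null`, the chained `.get` would raise `AttributeError`; `(body.get("pagination") or {})` is a more defensive variant.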
 async def get_sites() -> list:
    async with httpx.AsyncClient(timeout=30) as client:
        resp = await client.get(
@@ -17,12 +17,14 @@ services:
      - SDL_LOG_READ_KEY=${SDL_LOG_READ_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - DATABASE_URL=postgresql://siem:siem@db:5432/siem
      - DETECTIONS_FILE=/app/data/detections.json
    depends_on:
      db:
        condition: service_healthy
    volumes:
      - ./parsers:/app/parsers
      - ./.env:/app/.env
      - ./data:/app/data:ro
  db:
    image: postgres:16-alpine
@@ -116,17 +116,98 @@ function barChart(rows, labelKey, valueKey) {
 function renderHome() {
  set(`<div class="p-8 max-w-5xl">
-    <div class="mb-8">
+    <div class="mb-6">
      <h1 class="text-2xl font-bold text-white">SIEM Engineering Toolkit</h1>
      <p class="text-gray-400 mt-1">SentinelOne AI-SIEM · demo.sentinelone.net</p>
    </div>
-    <div class="grid grid-cols-1 md:grid-cols-3 gap-5">
+    <div id="home-stats" class="grid grid-cols-2 md:grid-cols-4 gap-4 mb-8">
-      ${homeCard('#/coverage','Parser Coverage Map','Cross-reference SDL parser fields against STAR and Sigma rule fields. Surface parsed-but-unused fields as reduction candidates.','Open Coverage Map','from-purple-700 to-purple-900')}
+      <div class="bg-gray-900 border border-gray-800 rounded-xl p-4 text-center animate-pulse">
-      ${homeCard('#/ingest','Ingest Dashboard','Visualize event volume by source and type. Project monthly GB costs and simulate exclusion filters before applying them.','Open Dashboard','from-blue-700 to-blue-900')}
+        <div class="h-7 w-16 bg-gray-800 rounded mx-auto mb-1"></div>
-      ${homeCard('#/quality','Parser Quality','Sample live events to see which fields landed. Measure field population rates and test parser patterns against raw log lines.','Open Quality Tools','from-amber-700 to-amber-900')}
+        <div class="h-3 w-20 bg-gray-800 rounded mx-auto"></div>
-      ${homeCard('#/onboarding','Onboarding Accelerator','Step-by-step guide for onboarding a new log source using Claude Code directly — no API key required.','View Guide','from-emerald-700 to-emerald-900')}
+      </div>
      <div class="bg-gray-900 border border-gray-800 rounded-xl p-4 text-center animate-pulse">
        <div class="h-7 w-16 bg-gray-800 rounded mx-auto mb-1"></div>
        <div class="h-3 w-20 bg-gray-800 rounded mx-auto"></div>
      </div>
      <div class="bg-gray-900 border border-gray-800 rounded-xl p-4 text-center animate-pulse">
        <div class="h-7 w-16 bg-gray-800 rounded mx-auto mb-1"></div>
        <div class="h-3 w-20 bg-gray-800 rounded mx-auto"></div>
      </div>
      <div class="bg-gray-900 border border-gray-800 rounded-xl p-4 text-center animate-pulse">
        <div class="h-7 w-16 bg-gray-800 rounded mx-auto mb-1"></div>
        <div class="h-3 w-20 bg-gray-800 rounded mx-auto"></div>
      </div>
    </div>
    <div id="home-uncovered" class="hidden mb-8"></div>
    <div class="grid grid-cols-1 md:grid-cols-2 gap-5">
      ${homeCard('#/coverage','Parser Coverage Map','See which active data sources have a parser running and which need one.','Open Coverage Map','from-purple-700 to-purple-900')}
      ${homeCard('#/ingest','Ingest Dashboard','Visualize event volume by source and type. Simulate exclusion filters before applying them.','Open Dashboard','from-blue-700 to-blue-900')}
      ${homeCard('#/quality','Parser Quality','Sample live events, measure field population rates, and test parser patterns against raw log lines.','Open Quality Tools','from-amber-700 to-amber-900')}
      ${homeCard('#/onboarding','Onboarding Accelerator','Step-by-step guide for onboarding a new log source using Claude Code directly.','View Guide','from-emerald-700 to-emerald-900')}
    </div>
  </div>`)
  homeLoadStats()
 }
 async function homeLoadStats() {
  try {
    const r = await apiGet('/api/coverage/map')
    const sources = r.sources || []
    const total = sources.length
    const covered = sources.filter(s => s.status === 'covered').length
    const needed = sources.filter(s => s.status === 'parser_needed').length
    const pct = total ? Math.round(covered / total * 100) : 0
    const pctColor = pct >= 80 ? 'text-emerald-400' : pct >= 50 ? 'text-amber-400' : 'text-red-400'
    document.getElementById('home-stats').innerHTML = `
      ${homeStat(pct + '%', 'Parser Coverage', pctColor)}
      ${homeStat(total.toLocaleString(), 'Active Sources', 'text-blue-400')}
      ${homeStat(covered.toLocaleString(), 'Covered', 'text-emerald-400')}
      ${homeStat(needed.toLocaleString(), 'Need Parser', needed > 0 ? 'text-red-400' : 'text-gray-500')}`
    // Top uncovered sources by volume
    const uncovered = sources
      .filter(s => s.status === 'parser_needed')
      .sort((a, b) => (b.event_count || 0) - (a.event_count || 0))
      .slice(0, 5)
    if (uncovered.length) {
      const rows = uncovered.map(s => `
        <tr class="border-b border-gray-800/50">
          <td class="py-2 pr-4 font-mono text-xs text-gray-200">
            <a href="#/quality" data-source="${esc(s.source_name)}" onclick="queueQualitySource(this.dataset.source)" class="hover:text-purple-400 cursor-pointer">${esc(s.source_name)}</a>
          </td>
          <td class="py-2 text-xs text-gray-400">${(s.event_count || 0).toLocaleString()} events</td>
        </tr>`).join('')
      document.getElementById('home-uncovered').classList.remove('hidden')
      document.getElementById('home-uncovered').innerHTML = `
        <div class="bg-gray-900 border border-red-900/40 rounded-xl p-5">
          <h2 class="text-sm font-semibold text-white mb-1">Top Sources Needing a Parser</h2>
          <p class="text-xs text-gray-500 mb-3">Highest-volume sources with no parser running — click to inspect in Parser Quality.</p>
          <table class="w-full">
            <thead><tr class="text-left text-gray-500 border-b border-gray-800">
              <th class="pb-2 pr-4 text-xs font-medium">Source</th>
              <th class="pb-2 text-xs font-medium">Volume</th>
            </tr></thead>
            <tbody>${rows}</tbody>
          </table>
        </div>`
    }
  } catch(e) {
    document.getElementById('home-stats').innerHTML = `
      ${homeStat('—', 'Parser Coverage', 'text-gray-600')}
      ${homeStat('—', 'Active Sources', 'text-gray-600')}
      ${homeStat('—', 'Covered', 'text-gray-600')}
      ${homeStat('—', 'Need Parser', 'text-gray-600')}`
  }
 }
 function homeStat(value, label, valueClass) {
  return `<div class="bg-gray-900 border border-gray-800 rounded-xl p-4 text-center">
    <div class="text-2xl font-bold ${valueClass} mb-1">${value}</div>
    <div class="text-xs text-gray-500">${label}</div>
  </div>`
 }
 function homeCard(href, title, desc, cta, grad) {
@@ -138,6 +219,12 @@ function homeCard(href, title, desc, cta, grad) {
  </div>`
 }
 // Queue a source to be pre-selected when Quality page loads
 let _pendingQualitySource = null
 function queueQualitySource(source) {
  _pendingQualitySource = source
 }
 // ── Coverage ──────────────────────────────────────────────────────────────
 let cvFilter = 'all', cvData = null
@@ -151,7 +238,7 @@ function renderCoverage() {
      </div>
      <div class="flex gap-2 flex-wrap justify-end">
        <button id="btn-sync" onclick="cvSyncSources()" class="px-3 py-1.5 text-sm bg-blue-700 hover:bg-blue-600 rounded-lg text-white">Sync Live Sources</button>
-        <button id="btn-star" onclick="loadStar()" class="px-3 py-1.5 text-sm bg-purple-700 hover:bg-purple-600 rounded-lg text-white">Load Library STAR Rules</button>
+        <button id="btn-sync-library" onclick="syncLibrary()" class="px-3 py-1.5 text-sm bg-blue-700 hover:bg-blue-600 rounded-lg text-white">Sync Detection Library</button>
        <button id="btn-sdl-parsers" onclick="loadSDLParsers()" class="px-3 py-1.5 text-sm bg-purple-700 hover:bg-purple-600 rounded-lg text-white">Load SDL Parsers</button>
        <button onclick="document.getElementById('f-parser').click()" class="px-3 py-1.5 text-sm bg-gray-700 hover:bg-gray-600 rounded-lg text-white">Upload Parser</button>
        <button onclick="cvReset()" class="px-3 py-1.5 text-sm bg-red-900/60 hover:bg-red-800 rounded-lg text-red-300">Reset</button>
@@ -166,28 +253,51 @@ function renderCoverage() {
  cvLoad()
 }
-async function loadSDLParsers() {
-  setBtn('btn-sdl-parsers', true)
-  document.getElementById('cv-err').innerHTML = ''
+async function syncLibrary() {
+  setBtn('btn-sync-library', true)
+  const errEl = document.getElementById('cv-err')
  if (errEl) errEl.innerHTML = ''
  try {
-    const res = await apiPost('/api/coverage/load-parsers-from-sdl', {})
-    if (res.errors?.length) {
-      document.getElementById('cv-err').innerHTML = errBox(`${res.errors.length} parser(s) failed to load: ${res.errors.map(e=>e.parser).join(', ')}`)
+    const r = await apiPost('/api/coverage/load-detections', {})
+    if (errEl) {
+      errEl.innerHTML = `<div class="p-3 bg-emerald-900/40 border border-emerald-700 rounded-lg text-sm text-emerald-300 mb-4">✓ ${r.loaded} detection rules synced from ${r.source === 'api' ? 'S1 API' : 'local file'}</div>`
      setTimeout(() => { errEl.innerHTML = '' }, 4000)
    }
    cvLoad()
  } catch(e) {
-    document.getElementById('cv-err').innerHTML = errBox(e.message)
+    if (errEl) errEl.innerHTML = errBox(e.message)
  } finally { setBtn('btn-sync-library', false, 'Sync Detection Library') }
 }
 async function loadSDLParsers() {
  setBtn('btn-sdl-parsers', true)
  const errEl = document.getElementById('cv-err')
  if (errEl) errEl.innerHTML = ''
  try {
    const res = await apiPost('/api/coverage/load-parsers-from-sdl', {})
    let msg = `✓ ${res.loaded} parser${res.loaded !== 1 ? 's' : ''} loaded`
    if (res.errors?.length) {
      msg += ` — ${res.errors.length} failed: ${res.errors.map(e=>e.parser).join(', ')}`
      if (errEl) errEl.innerHTML = errBox(msg)
    } else {
      if (errEl) errEl.innerHTML = `<div class="p-3 bg-emerald-900/40 border border-emerald-700 rounded-lg text-sm text-emerald-300 mb-4">${msg}</div>`
      setTimeout(() => { if (errEl) errEl.innerHTML = '' }, 4000)
    }
    cvLoad()
  } catch(e) {
    if (errEl) errEl.innerHTML = errBox(e.message)
  } finally {
    setBtn('btn-sdl-parsers', false, 'Load SDL Parsers')
  }
 }
-async function loadStar() {
-  setBtn('btn-star', true)
-  document.getElementById('cv-err').innerHTML = ''
-  try { await apiPost('/api/coverage/load-star-rules', {}); cvLoad() }
-  catch(e) { document.getElementById('cv-err').innerHTML = errBox(e.message) }
-  finally { setBtn('btn-star', false, 'Load Library STAR Rules') }
+
+function cvToggleMissing(id) {
+  const el = document.getElementById(id)
+  const chevron = document.getElementById(id + '-chevron')
+  if (!el) return
+  // classList.toggle returns true when the class is now present, i.e. the list is hidden
+  const nowHidden = el.classList.toggle('hidden')
  if (chevron) chevron.textContent = nowHidden ? '▶' : '▼'
 }
 async function cvUploadSigma(files) {
@@ -236,7 +346,7 @@ async function cvLoad() {
      document.getElementById('cv-table').innerHTML = `
        <div class="bg-gray-900/50 border border-gray-800 rounded-lg p-6 text-center text-sm text-gray-500">
          <p class="mb-2">No active sources synced yet.</p>
-          <p>Click <strong class="text-gray-300">Sync Live Sources</strong> to pull current dataSource.names from the data lake, then <strong class="text-gray-300">Load STAR Rules</strong> and <strong class="text-gray-300">Load SDL Parsers</strong> to see coverage.</p>
+          <p>Click <strong class="text-gray-300">Sync Live Sources</strong> to pull current dataSource.names from the data lake, then <strong class="text-gray-300">Load SDL Parsers</strong> to see coverage.</p>
        </div>`
      return
    }
@@ -286,24 +396,39 @@ function cvSetFilter(f) {
        ? `<span class="text-emerald-600 text-xs">✓ All fields covered</span>`
        : `<span class="text-gray-700 text-xs">—</span>`
    }
    const id = 'mf-' + s.source_name.replace(/[^a-z0-9]/gi, '_')
    const chips = s.missing_fields.map(f =>
      `<span class="px-1.5 py-0.5 bg-red-900/40 border border-red-800/60 rounded text-xs font-mono text-red-300">${esc(f)}</span>`
    ).join(' ')
-    return `<div class="flex flex-wrap gap-1">${chips}</div>`
+    return `<div>
      <button onclick="cvToggleMissing('${id}')"
        class="flex items-center gap-1.5 text-xs text-red-400 hover:text-red-300 transition-colors">
        <span class="px-1.5 py-0.5 bg-red-900/40 border border-red-800/60 rounded font-semibold">${s.missing_fields.length}</span>
        <span>field${s.missing_fields.length !== 1 ? 's' : ''} missing</span>
        <span id="${id}-chevron" class="text-gray-600">▶</span>
      </button>
      <div id="${id}" class="hidden mt-1.5 flex flex-wrap gap-1">${chips}</div>
    </div>`
  }
  function detectionsCell(s) {
    if (s.rule_count) {
      return `<span class="text-purple-400 font-medium">${s.rule_count}</span> rule${s.rule_count !== 1 ? 's' : ''}`
    }
    if (s.close_matches && s.close_matches.length) {
      const hints = s.close_matches.map(m =>
        `<span class="text-amber-400">${esc(m.library_name)}</span> <span class="text-gray-600">(${m.rule_count} rules)</span>`
      ).join(', ')
      return `<span class="text-gray-700">—</span> <span class="text-amber-600 text-xs" title="dataSource.name mismatch?">⚠ similar: ${hints}</span>`
    }
    return `<span class="text-gray-700">—</span>`
  }
  function parserCell(s) {
    if (s.status === 'covered') {
-      if (s.parser === 'detected in data') {
-        return `<span class="text-emerald-400">✓ Parsed <span class="text-emerald-700">(${(s.parser_detected||0).toLocaleString()} typed events detected)</span></span>`
-      }
-      const detail = s.parser_fields ? ` (${s.parser_fields} fields)` : ''
-      return `<span class="text-gray-400">${esc(s.parser)}${detail}</span>`
+      return `<span class="text-emerald-400 font-medium">✓ Parsed</span>`
    }
-    if (s.parser && s.format_type && s.format_type !== 'custom') {
-      return `<span class="text-amber-400 italic">⚠ ${esc(s.parser)} <span class="text-amber-600">(${esc(s.format_type)} — needs custom parser)</span></span>`
-    }
-    return `<span class="text-red-400 italic">⚠ No parser loaded</span>`
+    return `<span class="text-red-400 font-medium">✗ Not Parsed</span>`
  }
  document.getElementById('cv-table').innerHTML = sources.length === 0
@@ -314,16 +439,20 @@ function cvSetFilter(f) {
          <th class="pb-2 pr-4 font-medium">Events (7d)</th>
          <th class="pb-2 pr-4 font-medium">Status</th>
          <th class="pb-2 pr-4 font-medium">Parser</th>
-          <th class="pb-2 pr-4 font-medium">STAR Rules</th>
+          <th class="pb-2 pr-4 font-medium">Detections</th>
-          <th class="pb-2 font-medium">Detection Fields Missing</th>
+          <th class="pb-2 font-medium">Fields Missing</th>
        </tr></thead>
        <tbody>${sources.map(s => `
          <tr class="border-b border-gray-800/50 hover:bg-gray-900/30">
-            <td class="py-2 pr-4 font-mono text-xs text-gray-200">${esc(s.source_name)}</td>
+            <td class="py-2 pr-4 font-mono text-xs">
              <a href="#/quality" data-source="${esc(s.source_name)}" onclick="queueQualitySource(this.dataset.source)"
                class="text-gray-200 hover:text-purple-400 cursor-pointer transition-colors"
                title="Open in Parser Quality">${esc(s.source_name)}</a>
            </td>
            <td class="py-2 pr-4 text-xs text-gray-400">${(s.event_count||0).toLocaleString()}</td>
            <td class="py-2 pr-4"><span class="px-2 py-0.5 rounded text-xs border ${STYLES[s.status]||''}">${LABELS[s.status]||s.status}</span></td>
            <td class="py-2 pr-4 text-xs">${parserCell(s)}</td>
-            <td class="py-2 pr-4 text-xs text-gray-400">${s.rules?.length ? s.rules.map(r=>esc(r.rule)).join(', ') : '—'}</td>
+            <td class="py-2 pr-4 text-xs text-gray-400">${detectionsCell(s)}</td>
            <td class="py-2 text-xs">${missingFieldsCell(s)}</td>
          </tr>`).join('')}
        </tbody></table></div>`
@@ -749,7 +878,17 @@ function renderQuality() {
      <div id="qt-result"></div>
    </div>
  </div>`)
-  qtLoadParsers()
+  qtLoadParsers().then(() => {
    // Pre-select source if navigated from Coverage Map or Overview
    if (_pendingQualitySource) {
      const src = _pendingQualitySource
      _pendingQualitySource = null
      const qsSel = document.getElementById('qs-source')
      const qpSel = document.getElementById('qp-source')
      if (qsSel) qsSel.value = src
      if (qpSel) { qpSel.value = src; qpDiscoverFields() }
    }
  })
 }
 // ── Live Event Sampler ─────────────────────────────────────────────────────