Two Kusto performance cliffs explained
Companion deep-dive to MPP_vs_KQL.md §6. Two phrases in
the annotated KQL block point at real, well-known performance cliffs that
deserve their own explanation rather than a footnote:
- has_any on ProcessCommandLine bypasses the term index
- hint.shufflekey is required to avoid OOM on the cross-table join
1. has_any on ProcessCommandLine bypasses the term index
What the term index actually does
Kusto builds a per-shard inverted term index on string columns. At
ingest time each string value is tokenized into "terms" using a fixed
tokenizer that splits on non-alphanumeric ASCII (whitespace,
punctuation, \, /, ., -, etc.) and lowercases. The resulting
tokens are written to the shard's inverted index alongside the columnar
data.
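To make the tokenization rule concrete, here is a minimal Python sketch of a tokenizer that follows the description above (split on non-alphanumeric ASCII, then lowercase). It is a toy model, not Kusto's actual indexer; the `tokenize` name and the shortened sample command line are assumptions for illustration.

```python
import re

def tokenize(value: str) -> list[str]:
    """Toy tokenizer: split on runs of non-alphanumeric ASCII, then lowercase."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", value) if t]

# Shortened, hypothetical command line in the same shape as a ProcessCommandLine value.
cmdline = r'"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" -nop -w hidden'
terms = tokenize(cmdline)
print(terms)
```

Under this model, powershell and exe come out as two separate terms, and the string powershell.exe never appears as a term of its own.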
When you write where Col has "x", Kusto:
- Tokenizes "x" the same way the indexer did at ingest.
- Looks up the resulting term in the shard's inverted index.
- Reads only the rows in shards whose index says "this term might be present here"; entire shards get skipped.
This is the difference between a 50 ms hunt and a 5-minute one.
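The three steps above can be sketched with a toy per-shard inverted index. This is an illustration of the idea only, assuming a hypothetical three-shard layout and a simplified `has` helper; Kusto's real index structures are more involved.

```python
import re
from collections import defaultdict

def tokenize(value: str) -> set[str]:
    """Toy tokenizer: split on non-alphanumeric ASCII, then lowercase."""
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", value) if t}

# Hypothetical shards: shard id -> stored string values.
shards = {
    "shard-0": ["notepad.exe report.txt", "chrome.exe --new-window"],
    "shard-1": ["powershell.exe -nop -w hidden", "cmd.exe /c dir"],
    "shard-2": ["svchost.exe -k netsvcs"],
}

# Inverted index: term -> set of shard ids that contain the term.
index = defaultdict(set)
for shard_id, values in shards.items():
    for value in values:
        for term in tokenize(value):
            index[term].add(shard_id)

def has(term: str) -> tuple[list[str], list[str]]:
    """Return (matching rows, shards actually read) for a single-term lookup."""
    needle = term.lower()
    candidates = sorted(index.get(needle, set()))
    rows = [v for s in candidates for v in shards[s] if needle in tokenize(v)]
    return rows, candidates

rows, shards_read = has("powershell")
print(rows, shards_read)
```

Only shard-1 is read for powershell; the other two shards are skipped without touching their rows. A lookup for exe, by contrast, has to read all three shards.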
Why has_any on ProcessCommandLine falls out of that fast path
Three independent reasons compound:
a) The needle contains characters that the tokenizer treats as separators.
ProcessCommandLine values look like:
"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" -nop -w hidden -enc JABzAD0A...
If you write has_any ("powershell.exe", "rundll32.exe") you're not
searching for one token — powershell.exe is the two tokens
powershell and exe joined by a . (a separator). The index never
stored powershell.exe as a single term, so the lookup misses and Kusto
falls back to a row scan.
Quick fix: search for the bare token (has_any ("powershell", "rundll32"))
— Kusto's planner will index-prune on each individual token. But
analysts almost never write it that way because they're thinking "the
binary is powershell.exe."
b) has_any blows past the term-index cardinality threshold.
For each candidate term in the has_any list, Kusto has to consult the
inverted index, accumulate row-id sets, then union them. The query
optimizer has an internal threshold: above some number of needles (or
above some estimated selectivity), it gives up on index lookups and
just scans, because the OR-merge of many indexed lookups costs more
than the scan would.
The exact threshold is undocumented and changes between versions;
empirically it kicks in fast on ProcessCommandLine because that column
has the highest term cardinality in the schema — most of those terms
are unique GUIDs, paths, base64 blobs, hashes, etc. — so the inverted
index is huge and per-term lookup is expensive.
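The index-versus-scan trade-off in (b) can be pictured with a deliberately crude cost model. Everything here is an assumption for illustration: the `choose_plan` function, the cost units, and the constants are made up, since the real threshold is undocumented.

```python
def choose_plan(num_needles: int, lookup_cost: int, rows: int, scan_cost_per_row: int) -> str:
    """Toy planner: use the index only if all lookups together are cheaper than a scan."""
    index_cost = num_needles * lookup_cost   # one index lookup (plus merge) per needle
    scan_cost = rows * scan_cost_per_row     # touch every row once
    return "index" if index_cost < scan_cost else "scan"

# Made-up units: a lookup on a huge, high-cardinality index costs as much as 100,000 row reads.
few = choose_plan(num_needles=3, lookup_cost=100_000, rows=1_000_000, scan_cost_per_row=1)
many = choose_plan(num_needles=50, lookup_cost=100_000, rows=1_000_000, scan_cost_per_row=1)
print(few, many)
```

With these numbers the crossover sits at ten needles. The more expensive each lookup gets, the fewer needles it takes before the scan wins.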
c) ProcessCommandLine itself undermines the index's effectiveness.
Even when you do hit the index, the selectivity is terrible. A term
like powershell matches a large fraction of all process-creation rows
on a typical workstation fleet. The index tells Kusto "this shard might
contain it" — but every shard does contain it, so no shards get
pruned. You still scan everything.
This is the deepest reason of the three: even a perfectly written,
single-token, indexed has query on ProcessCommandLine gives you the
index path's CPU cost on top of the scan you were going to do anyway.
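A small sketch makes the selectivity point measurable: count how many shards a term lets you skip. The shard contents and the `pruned_fraction` helper are hypothetical.

```python
def pruned_fraction(shard_terms: list[set[str]], term: str) -> float:
    """Fraction of shards that can be skipped because the term is absent from them."""
    skipped = sum(1 for terms in shard_terms if term not in terms)
    return skipped / len(shard_terms)

# Hypothetical term sets for four shards of process-creation data.
shard_terms = [
    {"powershell", "exe", "nop", "a1b2c3"},
    {"powershell", "exe", "svchost"},
    {"powershell", "cmd", "exe"},
    {"powershell", "exe", "mshta"},
]

common = pruned_fraction(shard_terms, "powershell")
rare = pruned_fraction(shard_terms, "mshta")
print(common, rare)
```

The common term prunes nothing (0.0), while the rare one skips three of four shards (0.75). The index only pays off for terms that are absent from most shards.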
The escape hatch most people don't know about
If you must do this in KQL, push the substring match into a where
clause that the planner can convert into a true scan-with-early-exit,
and pre-narrow with something that is selective:
SecurityEvent
| where TimeGenerated > ago(1h) // narrow time first — selective
| where EventID == 4688 // narrow event-id — selective
| where ProcessCommandLine matches regex @"(?i)powershell|rundll32|mshta"
One hour of data and an EventID filter is usually enough that the
scan-after-prune is cheap. At 90 days with no narrowing predicate,
you've lost. That's the cliff the doc refers to.
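The same narrow-first shape, modelled in Python so the effect on scan volume is visible. The event records and the fixed `now` timestamp are hypothetical; the regex is the one from the query above.

```python
import re
from datetime import datetime, timedelta

PATTERN = re.compile(r"(?i)powershell|rundll32|mshta")
now = datetime(2024, 1, 1, 12, 0, 0)  # fixed clock so the example is reproducible

# Hypothetical events: timestamp, event id, command line.
events = [
    {"time": now - timedelta(minutes=5), "event_id": 4688, "cmd": "PowerShell.exe -nop"},
    {"time": now - timedelta(minutes=10), "event_id": 4624, "cmd": ""},
    {"time": now - timedelta(days=3), "event_id": 4688, "cmd": "rundll32.exe x.dll"},
    {"time": now - timedelta(minutes=30), "event_id": 4688, "cmd": "notepad.exe"},
]

recent = [e for e in events if e["time"] > now - timedelta(hours=1)]   # time filter first
procs = [e for e in recent if e["event_id"] == 4688]                    # then event id
hits = [e for e in procs if PATTERN.search(e["cmd"])]                   # regex runs last

print(len(events), len(procs), [e["cmd"] for e in hits])
```

The regex is evaluated on two rows instead of four. At fleet scale, that ratio is what separates a cheap scan from the cliff.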
What SDL does instead
No index, so no "did I hit the index?" cliff. Every query is a columnar
scan over the epochs that overlap the time window, with column-level
prefix pruning and run-length compression doing the heavy lifting.
matches "(powershell|rundll32|mshta)" on src.process.cmdline at 90
days is the same code path as at 1 hour — just more epochs in parallel.
2. hint.shufflekey is required to avoid OOM on the cross-table join
How Kusto's distributed join works by default
Kusto is distributed. Tables are split into extents (shards), and extents live on different data nodes. When you write:
A | join kind=inner B on Key
Kusto picks one of two physical strategies:
- Broadcast (the default for "small × large"): take the smaller side, replicate it to every node holding the larger side, then do local hash joins. Fast when small really is small.
- Shuffle: hash both sides on Key, send all rows with hash bucket i to node i, then do local hash joins. Needed when both sides are big.
The planner chooses based on a statistics estimate of how big each
side is after the upstream where filters apply.
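Both strategies compute the same join; they differ only in what each node has to hold. The following Python sketch models that, assuming a hypothetical three-node cluster and toy rows. The function names are illustrative and this is not how Kusto implements either strategy internally.

```python
import zlib
from collections import defaultdict

def bucket(key: str, nodes: int) -> int:
    """Deterministic hash bucket for a join key."""
    return zlib.crc32(key.encode()) % nodes

def hash_join(left, right):
    """Local inner hash join on the first tuple element."""
    table = defaultdict(list)
    for key, value in right:
        table[key].append(value)
    return [(k, lv, rv) for k, lv in left for rv in table.get(k, [])]

def broadcast_join(left, right, nodes):
    """Left stays spread across nodes; every node gets a full copy of right."""
    left_parts = [left[i::nodes] for i in range(nodes)]
    return [row for part in left_parts for row in hash_join(part, right)]

def shuffle_join(left, right, nodes):
    """Both sides are re-partitioned by key hash; each node joins only its bucket."""
    lparts = [[] for _ in range(nodes)]
    rparts = [[] for _ in range(nodes)]
    for row in left:
        lparts[bucket(row[0], nodes)].append(row)
    for row in right:
        rparts[bucket(row[0], nodes)].append(row)
    return [row for lp, rp in zip(lparts, rparts) for row in hash_join(lp, rp)]

left = [("hostA", "proc1"), ("hostB", "proc2"), ("hostA", "proc3"), ("hostC", "proc4")]
right = [("hostA", "dns1"), ("hostC", "dns2"), ("hostD", "dns3")]

b = sorted(broadcast_join(left, right, nodes=3))
s = sorted(shuffle_join(left, right, nodes=3))
print(b == s, b)
```

The results match. The difference is that `broadcast_join` hands the whole of `right` to every node, while `shuffle_join` hands each node only the rows whose key hashes to its bucket.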
Where the OOM comes from
For a 90-day cross-source hunt the planner's estimate is almost always wrong:
- bad_dns after the has_any (suspect_domains) filter is probably small, but if the IOC list has 200 entries or a wildcard sneaks in, it can be millions of rows.
- The planner picks broadcast because it estimates bad_dns is small.
- At runtime, bad_dns turns out to be huge.
- Kusto tries to ship the entire bad_dns payload to every node holding _Im_ProcessEvent extents (which at 90 d is every node).
- Each node tries to hold the broadcasted copy in memory while streaming _Im_ProcessEvent past it.
- MaxMemoryPerQueryPerNode (a tenant-level resource governor knob; on Sentinel it's a shared, opaque value) gets hit.
- You get "Request was aborted due to exceeding query memory limits", or worse, partial results with no warning.
The same shape, with a left side just under the broadcast limit, OOMs intermittently as data volume grows day to day. That's the "silent regression" that makes 90-day Sentinel hunts unreliable in production.
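The failure mode is a decision made on an estimate and paid for with the actual size. A rough Python sketch follows; the broadcast limit, row width, and per-node budget are all made-up numbers, and `pick_strategy` and `broadcast_fits` are hypothetical helpers rather than anything Kusto exposes.

```python
def pick_strategy(estimated_rows: int, broadcast_limit_rows: int) -> str:
    """Toy planner: broadcast whenever the estimated small side is under the limit."""
    return "broadcast" if estimated_rows <= broadcast_limit_rows else "shuffle"

def broadcast_fits(actual_rows: int, bytes_per_row: int, node_budget_bytes: int) -> bool:
    """Under broadcast, every node must hold the full small side in memory."""
    return actual_rows * bytes_per_row <= node_budget_bytes

# The planner sees an optimistic estimate...
strategy = pick_strategy(estimated_rows=50_000, broadcast_limit_rows=100_000)
# ...but the runtime sees the real row count (assumed 200 bytes per row, 4 GiB per node).
fits = broadcast_fits(actual_rows=30_000_000, bytes_per_row=200, node_budget_bytes=4 * 1024**3)
print(strategy, fits)
```

The planner commits to broadcast on the 50,000-row estimate. The 30-million-row reality needs about 6 GB per node and does not fit the 4 GiB budget.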
What hint.shufflekey does
A | join kind=inner hint.strategy=shuffle hint.shufflekey=Key (B) on Key
You override the planner's estimate and force the shuffle strategy on
Key. Both sides get re-hashed by Key, sent across the network into
hash buckets, joined locally. No node has to hold all of either side:
each node holds only its bucket's slice, so per-node memory shrinks as
partitions are added instead of growing with the full size of the broadcast side.
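A back-of-the-envelope comparison of what each node has to hold under the two strategies, assuming keys hash evenly across nodes. The row count and node count are hypothetical.

```python
def broadcast_rows_per_node(side_rows: int) -> int:
    """Broadcast: every node holds a full copy of the broadcast side."""
    return side_rows

def shuffle_rows_per_node(side_rows: int, nodes: int) -> int:
    """Shuffle: each node holds roughly 1/nodes of the side (ceiling division)."""
    return -(-side_rows // nodes)

side_rows, nodes = 20_000_000, 16
per_node_broadcast = broadcast_rows_per_node(side_rows)
per_node_shuffle = shuffle_rows_per_node(side_rows, nodes)
print(per_node_broadcast, per_node_shuffle)
```

With 16 nodes the per-node footprint drops from 20,000,000 rows to 1,250,000. The even-distribution assumption is exactly what key skew breaks.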
Why this is a footgun
- You have to know to use it. The default broadcast looks fine on a 1-day window and silently breaks at 90 days.
- You have to pick the right key. If hint.shufflekey is something with high skew (e.g. one user has 95% of the events), one node still OOMs while the others sit idle. You'd then add hint.num_partitions=N and tune it. Production hunts often have 3+ hints stacked just to keep them stable.
- You can't compose well. Two joins in the same query each need their own carefully chosen shuffle key. Get one wrong and the second join breaks.
- The hints are advisory, not contractual. A future Kusto version may ignore your hint if its own cost model thinks broadcast is better. Sentinel's update cadence means a query stable today can regress on a Tuesday with no warning.
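The skew problem from the second bullet is easy to reproduce in a toy model: hash-partition a key column where one value dominates and look at the bucket sizes. The key names and the four-node cluster are hypothetical.

```python
import zlib
from collections import Counter

def bucket(key: str, nodes: int) -> int:
    """Deterministic hash bucket for a shuffle key."""
    return zlib.crc32(key.encode()) % nodes

# Hypothetical shuffle key column: one account produces 95% of the events.
keys = ["svc-account"] * 95 + [f"user{i}" for i in range(5)]
nodes = 4

load = Counter(bucket(k, nodes) for k in keys)
hot_node = bucket("svc-account", nodes)
print(dict(load), "hot node:", hot_node)
```

Whichever node the hot key hashes to receives at least 95 of the 100 rows, because all rows with the same key must land in the same bucket. Adding nodes does not spread a single hot key; only a different key choice does.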
What SDL does instead
The reduction is distributed by construction — there is no
broadcast vs shuffle planner because the engine never moves un-reduced
rows across the network. Each worker filters → projects →
partial-aggregates → partial-joins on its local epochs and sends only
the reduced state up to the coordinator. The 90-day join in the SDL
example in MPP_vs_KQL.md §6 needs zero hints:
the joiner doesn't need to estimate sizes because it never decides to
broadcast.
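The reduce-locally-then-merge shape described above can be sketched as follows. This is a generic illustration of partial aggregation, not SDL's actual implementation; the epoch layout, the `worker_reduce` and `coordinator_merge` names, and the sample rows are assumptions, and the partial-join step is left out for brevity.

```python
from collections import Counter

NEEDLES = ("powershell", "rundll32", "mshta")

def worker_reduce(rows: list[tuple[str, str]]) -> Counter:
    """Filter, project, and partially aggregate one worker's local rows."""
    counts = Counter()
    for host, cmdline in rows:
        if any(n in cmdline.lower() for n in NEEDLES):   # filter
            counts[host] += 1                            # project host, aggregate
    return counts

def coordinator_merge(partials: list[Counter]) -> Counter:
    """Merge the reduced states; raw rows never reach the coordinator."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Hypothetical epochs, one list of (host, cmdline) rows per worker.
epochs = [
    [("host1", "powershell -nop"), ("host2", "notepad report.txt")],
    [("host1", "rundll32 x.dll"), ("host3", "mshta http")],
]

total = coordinator_merge([worker_reduce(rows) for rows in epochs])
print(dict(total))
```

Each worker ships a small per-host counter rather than its rows, so what crosses the network scales with the number of distinct hosts, not with the size of the time window.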
TL;DR sentences (drop-in for sidebars)
has_any cliff. Kusto's term index tokenizes string columns at ingest. has_any on ProcessCommandLine defeats it three ways at once: the needles often contain separator characters (so the indexed terms don't match), the OR-merge of many needles exceeds the planner's index-vs-scan threshold, and ProcessCommandLine has such a long-tail term distribution that index lookups rarely prune shards anyway. At 90 d the scan that results is the largest single column scan in the workspace.
hint.shufflekey cliff. Kusto's join planner picks broadcast vs shuffle from an estimated cardinality. On a 90-day cross-source hunt the estimate is almost always wrong, the planner picks broadcast, and the smaller side turns out to be tens of millions of rows. Without hint.strategy=shuffle hint.shufflekey=... the query OOMs against MaxMemoryPerQueryPerNode. The hint is required for stability and has to be re-tuned per query and per data-volume change, a maintenance tax SDL's distributed-reduction engine doesn't impose.