// kql benchmark
Which model writes the best KQL?
We score frontier models on 188 real natural-language threat-hunting prompts — measuring detection accuracy against cost and latency. The leaderboard below is the result.
benchmark.kql · 14 models · 188 questions
Benchmarks
| where task == "natural-language → KQL"
| summarize accuracy, cost, latency by model
| order by accuracy desc
▸ results, ordered by accuracy
// leaderboard
// accuracy vs. cost
Up and to the left is the sweet spot — high detection accuracy for less spend. Cost uses a log scale.
// accuracy over time
How model accuracy on KQL generation has changed with release date.
// go deeper
See how the benchmark is built
Read the methodology behind the scores, or browse the full set of natural-language threat-hunting scenarios that models are tested on.