// kql benchmark

Which model writes the best KQL?

We score frontier models on 188 real natural-language threat-hunting prompts, measuring detection accuracy against cost and latency. The leaderboard below shows the results.

benchmark.kql · 14 models · 188 questions
Benchmarks
| where task == "natural-language → KQL"
| summarize accuracy, cost, latency by model
| order by accuracy desc
▸ results · ordered by accuracy