gpt-5-mini-high vs o1-low KQL Benchmark

o1-low wins by 14.9%

Compared on 188 shared test questions

Overall Accuracy

gpt-5-mini-high

48.4%

91 / 188 correct

o1-low

63.3%

119 / 188 correct

Average Cost per Query

gpt-5-mini-high: $0.0150
o1-low: $0.4994
o1-low costs 3233.3% more

Average Execution Time

gpt-5-mini-high: 44.83s
o1-low: 50.90s
o1-low takes 13.5% longer
Question-by-Question Analysis

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
Page 1 of 8
T1021.006
On Windows hosts, look through recent PowerShell execution records to find any elevated session where remote management was turned on. What exact command was run to enable PSRemoting?
gpt-5-mini-high Wins
T1057
On a Windows host, investigate process events to find when Task Manager was launched via cmd with an unusual flag. What was the full command executed?
gpt-5-mini-high Wins
T1070.003
On a Windows endpoint, review process execution logs to see if any PowerShell sessions were wiped clean. Which command was executed to clear the PowerShell history?
gpt-5-mini-high Wins
T1082
On Windows systems, identify when the built-in Shadow Copy utility is used to enumerate existing snapshots. What was the full command executed?
gpt-5-mini-high Wins
T1082
A Windows system shows a cmd.exe process spawn that appears to have been used for environment discovery. Review the process creation records to identify the exact command the adversary ran to enumerate environment variables.
gpt-5-mini-high Wins
T1112
A Windows user’s registry was altered via a command-line tool to disable the lock workstation feature by adding a DWORD entry under the current user Policies\System key. Which registry value name was modified in this operation?
gpt-5-mini-high Wins
T1120
Review Windows process and PowerShell activity for commands that enumerate PnP entities through WMI. Which PowerShell cmdlet was invoked to perform this hardware inventory?
gpt-5-mini-high Wins
T1217
On a Windows system, you notice a process that recursively enumerates files named 'Bookmarks' under every user profile directory. Which Windows command-line utility was used to perform that search?
gpt-5-mini-high Wins
T1547.002
A Windows host shows a suspicious registry change under the LSA hive. Review recent registry events to locate any new entries under Authentication Packages and determine the name of the DLL the attacker added.
gpt-5-mini-high Wins
T1547
A Windows host shows a process launching with install-driver switches, likely signaling malicious driver deployment. What is the name of the tool that was executed?
gpt-5-mini-high Wins
T1548.001
A suspicious elevated shell on Linux changed a file’s permissions for a user to include the SetUID bit. What was the exact command used to set that flag?
gpt-5-mini-high Wins
T1555
A security investigator suspects that someone attempted to dump stored web credentials on a Windows system using an in-built command-line tool. Review process creation logs to determine which executable was called to list the Web Credentials vault.
gpt-5-mini-high Wins
T1555
On a Windows endpoint, you find PowerShell reaching out to a remote URL and then running a module command. What was the LaZagne module name that was executed?
gpt-5-mini-high Wins
T1557.001
On Windows devices, hunt for PowerShell activity where a remote script is fetched and executed to perform LLMNR/NBNS spoofing. Which cmdlet kicked off the listener?
gpt-5-mini-high Wins
T1562
Review Linux process execution logs to find where the system journal service was stopped. Which utility was invoked to disable journal logging?
gpt-5-mini-high Wins
T1003.007
On a Linux system, review process creation logs to spot any dd commands reading from /proc/*/mem. What was the name of the file where the dumped memory was written?
o1-low Wins
T1016.001
On a Linux host, a ping command was executed to test internet connectivity. Determine which IP address was used as the ping target.
o1-low Wins
T1027
A Windows host shows a process launch with an extremely obfuscated command line that dynamically builds and invokes code at runtime. Which process name was used to execute this payload?
o1-low Wins
T1003.001
Using Windows process event logs, investigate PowerShell activity around lsass.exe memory capture. What was the name of the script file invoked to perform the dump?
o1-low Wins
T1027
On a Windows endpoint, look for evidence of a base64-encoded PowerShell payload execution. Which executable launched the encoded command?
o1-low Wins
T1036.004
A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked?
o1-low Wins
T1036.003
A process is running under a familiar Windows host name but originates from a user's AppData folder rather than the System32 directory. Identify the filename used to masquerade the PowerShell binary on this Windows device.
o1-low Wins
T1048.003
A Linux host briefly hosted an HTTP service under /tmp. Examine process creation logs to determine the exact python3 command that was used to start the server on port 9090.
o1-low Wins
T1049
A user launched a Windows command prompt and executed a built-in utility to enumerate all active network connections. Using process creation logs, identify the exact tool that produced the list of current connections.
o1-low Wins
T1048.003
Windows process creation logs show a PowerShell-driven file transfer to an FTP service. Which executable was leveraged to perform this exfiltration?
o1-low Wins
Page 1 of 8

Explore individual model performance and detailed analysis