gpt-5-mini-medium vs o1-low KQL Benchmark

o1-low wins by 17.6%

Compared on 187 shared test questions

Overall Accuracy

gpt-5-mini-medium

45.5%

85 / 187 correct

o1-low

63.1%

118 / 187 correct

Average Cost per Query

gpt-5-mini-medium: $0.0150
o1-low: $0.4994
o1-low costs 3223.5% more

Average Execution Time

gpt-5-mini-medium: 47.16s
o1-low: 50.90s
o1-low takes 7.9% longer
Question-by-Question Analysis

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 187 questions
Page 1 of 8
T1057
On a Windows host, investigate process events to find when Task Manager was launched via cmd with an unusual flag. What was the full command executed?
gpt-5-mini-medium Wins
T1053.005
Investigate Windows process events for PowerShell activity that leverages WMI to register a scheduled task via XML import. What was the name of the XML file supplied to the RegisterByXml method?
gpt-5-mini-medium Wins
T1069.001
Investigate Windows process execution logs for a PowerShell cmdlet used to list group members. Look for entries where a group name is provided after a '-Name' flag and identify which group was queried.
gpt-5-mini-medium Wins
T1070.003
On a Linux endpoint, you suspect malicious clearing of the bash history by redirecting from the null device. Explore process or file events to uncover the exact shell command that performed this action.
gpt-5-mini-medium Wins
T1082
On Windows systems, identify when the built-in Shadow Copy utility is used to enumerate existing snapshots. What was the full command executed?
gpt-5-mini-medium Wins
T1057
While reviewing Windows process events, you spot a PowerShell process executing a WMI enumeration cmdlet. What WMI class name did the attacker query?
gpt-5-mini-medium Wins
T1120
Review Windows process and PowerShell activity for commands that enumerate PnP entities through WMI. Which PowerShell cmdlet was invoked to perform this hardware inventory?
gpt-5-mini-medium Wins
T1547.014
A Windows endpoint shows an Active Setup entry under Internet Explorer Core Fonts being altered with a StubPath value. Investigate the registry events and identify the payload that was set.
gpt-5-mini-medium Wins
T1547
A Windows host shows a process launching with install-driver switches, likely signaling malicious driver deployment. What is the name of the tool that was executed?
gpt-5-mini-medium Wins
T1217
On a Windows system, you notice a process that recursively enumerates files named 'Bookmarks' under every user profile directory. Which Windows command-line utility was used to perform that search?
gpt-5-mini-medium Wins
T1546.004
A suspicious file modification on a Linux device targeted the ~/.bash_profile file, apparently adding a new line. What was the full command string that was appended?
gpt-5-mini-medium Wins
T1547.002
A Windows host shows a suspicious registry change under the LSA hive. Review recent registry events to locate any new entries under Authentication Packages and determine the name of the DLL the attacker added.
gpt-5-mini-medium Wins
T1548.001
A Linux system shows a shell invocation that appears to be searching for files with elevated group permissions. Using the available process execution logs, determine exactly what command was run.
gpt-5-mini-medium Wins
T1562.006
A .NET tracing environment variable was turned off in a user’s registry on a Windows system. Which built-in command-line tool was used to make this registry change?
gpt-5-mini-medium Wins
T1562
Review Linux process execution logs to find where the system journal service was stopped. Which utility was invoked to disable journal logging?
gpt-5-mini-medium Wins
T1557.001
On Windows devices, hunt for PowerShell activity where a remote script is fetched and executed to perform LLMNR/NBNS spoofing. Which cmdlet kicked off the listener?
gpt-5-mini-medium Wins
T1622
On the Windows device, a security check was run to detect debugger processes via PowerShell. Which tool (process) carried out this check?
gpt-5-mini-medium Wins
T1006
Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path.
o1-low Wins
T1049
A user launched a Windows command prompt and executed a built-in utility to enumerate all active network connections. Using process creation logs, identify the exact tool that produced the list of current connections.
o1-low Wins
T1036.003
A process is running under a familiar Windows host name but originates from a user's AppData folder rather than the System32 directory. Identify the filename used to masquerade the PowerShell binary on this Windows device.
o1-low Wins
T1007
An analyst suspects a user or script ran a service enumeration command on a Linux system. Review process events to find the service-listing invocation and specify the full command that was executed.
o1-low Wins
T1053.003
Linux hosts may log events when new files are added to /var/spool/cron/crontabs. Query those logs for a creation or write action in that directory and determine the file name that was added.
o1-low Wins
T1018
On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache?
o1-low Wins
T1053.006
Examine the logs from the Linux system for events related to the systemd timer activation. Identify any records indicating that a new timer unit was started and enabled, and determine which timer name was used.
o1-low Wins
T1057
A Windows endpoint recorded a command-line activity through cmd.exe that lists all running processes. Determine which built-in tool was executed to perform this action.
o1-low Wins
Page 1 of 8

Explore individual model performance and detailed analysis