gpt-5-nano-high vs o4-mini-low KQL Benchmark
o4-mini-low wins by 12.8%
Compared on 188 shared test questions
Overall Accuracy
gpt-5-nano-high
30.3%
57 / 188 correct
o4-mini-low
43.1%
81 / 188 correct
Average Cost per Query
gpt-5-nano-high: $0.0069
o4-mini-low: $0.0311
o4-mini-low costs 351.1% more
Average Execution Time
gpt-5-nano-high: 61.10s
o4-mini-low: 73.44s
o4-mini-low takes 20.2% longer
Question-by-Question Analysis
Question-by-Question Comparison
Detailed comparison showing where each model succeeded or failed
Showing 1 to 25 of 188 questions
Page 1 of 8
T1003 | On a Windows host, an attacker leveraged COMSVCS.DLL via rundll32.exe to dump the memory of svchost.exe into a file under the Temp directory. Review the file creation logs to determine the exact name of the dump file that was generated. | ✓ | ✗ | gpt-5-nano-high Wins |
T1027 | On a Windows endpoint, look for evidence of a base64-encoded PowerShell payload execution. Which executable launched the encoded command? | ✓ | ✗ | gpt-5-nano-high Wins |
T1016.001 | On a Linux host, a ping command was executed to test internet connectivity. Determine which IP address was used as the ping target. | ✓ | ✗ | gpt-5-nano-high Wins |
T1036.003 | In a Linux environment, you observe a process labeled like the cron daemon but running from an unexpected path. Investigate creation events to uncover the actual filename used by this fake cron process. | ✓ | ✗ | gpt-5-nano-high Wins |
T1057 | On a Windows host, investigate process events to find when Task Manager was launched via cmd with an unusual flag. What was the full command executed? | ✓ | ✗ | gpt-5-nano-high Wins |
T1059.004 | During a Linux investigation, you notice processes spawning curl and wget commands that pull a script from a remote GitHub raw URL and pipe it into bash. Identify the name of the script that was retrieved and executed. | ✓ | ✗ | gpt-5-nano-high Wins |
T1069.001 | Review recent Windows process event logs for PowerShell activity that suggests local group enumeration through WMI. What exact command was executed? | ✓ | ✗ | gpt-5-nano-high Wins |
T1059.004 | An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell. | ✓ | ✗ | gpt-5-nano-high Wins |
T1070.004 | A Linux host executed a native utility to overwrite and then remove a temporary file in one step. Identify the name of the file that was securely deleted by this action. | ✓ | ✗ | gpt-5-nano-high Wins |
T1082 | A Windows system shows a cmd.exe process spawn that appears to have been used for environment discovery. Review the process creation records to identify the exact command the adversary ran to enumerate environment variables. | ✓ | ✗ | gpt-5-nano-high Wins |
T1082 | A user‐space process on a Linux device invoked a shell to capture and display the system’s environment variables and path. Which exact command was used to perform this discovery? | ✓ | ✗ | gpt-5-nano-high Wins |
T1124 | A Windows host recorded a process that simply executes the system’s native time utility. Without spelling out the query, determine which command was run based on process creation events. | ✓ | ✗ | gpt-5-nano-high Wins |
T1134.001 | A Windows host logs show PowerShell fetching and executing a remote script to gain SeDebugPrivilege token duplication. Which Empire module was invoked? | ✓ | ✗ | gpt-5-nano-high Wins |
T1547.014 | A Windows endpoint shows an Active Setup entry under Internet Explorer Core Fonts being altered with a StubPath value. Investigate the registry events and identify the payload that was set. | ✓ | ✗ | gpt-5-nano-high Wins |
T1552.003 | A Linux user’s bash history was searched for patterns like ‘pass’ and ‘ssh’, and the matching lines were redirected into a new file. Determine the name of that file. | ✓ | ✗ | gpt-5-nano-high Wins |
T1562.003 | Review Windows registry event logs for the ProcessCreationIncludeCmdLine_Enabled value being set to 0. Which PowerShell cmdlet performed this change? | ✓ | ✗ | gpt-5-nano-high Wins |
T1555 | A security investigator suspects that someone attempted to dump stored web credentials on a Windows system using an in-built command-line tool. Review process creation logs to determine which executable was called to list the Web Credentials vault. | ✓ | ✗ | gpt-5-nano-high Wins |
T1559 | Investigating a Windows device, you suspect a non-standard executable was launched to set up a named pipe for client-server messaging. Determine the name of the executable that was run. | ✓ | ✗ | gpt-5-nano-high Wins |
T1562.004 | On a Windows device, a new inbound firewall rule was created unexpectedly. Review process execution records to identify the command-line utility responsible for adding the rule. | ✓ | ✗ | gpt-5-nano-high Wins |
T1562.012 | A Linux system’s audit framework appears to have been reset unexpectedly. Search your process execution records to identify which exact invocation removed all auditd rules. What full command was executed? | ✓ | ✗ | gpt-5-nano-high Wins |
T1562 | Review Linux process execution logs to find where the system journal service was stopped. Which utility was invoked to disable journal logging? | ✓ | ✗ | gpt-5-nano-high Wins |
T1564.002 | On Windows systems, identify any user account that was hidden by setting its value to 0 under the SpecialAccounts\\UserList registry key. What was the name of the hidden account? | ✓ | ✗ | gpt-5-nano-high Wins |
T1036.003 | A process is running under a familiar Windows host name but originates from a user's AppData folder rather than the System32 directory. Identify the filename used to masquerade the PowerShell binary on this Windows device. | ✗ | ✓ | o4-mini-low Wins |
T1003.008 | In a Linux environment, an elevated process was used to execute a command that read /etc/shadow and redirected its output to a file. Identify what file name was employed to store these results. | ✗ | ✓ | o4-mini-low Wins |
T1006 | Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path. | ✗ | ✓ | o4-mini-low Wins |
Page 1 of 8
Explore individual model performance and detailed analysis