gpt-5-high vs o3-high KQL Benchmark
gpt-5-high wins by 8.5%
Compared on 188 shared test questions
Overall Accuracy
gpt-5-high
63.3%
119 / 188 correct
o3-high
54.8%
103 / 188 correct
Average Cost per Query
gpt-5-high: $0.1529
o3-high: $0.0632
gpt-5-high costs 142.0% more
Average Execution Time
gpt-5-high: 192.47s
o3-high: 78.68s
gpt-5-high takes 144.6% longer
Question-by-Question Analysis
Question-by-Question Comparison
Detailed comparison showing where each model succeeded or failed
Showing 1 to 25 of 188 questions
Page 1 of 8
T1016.001 | On a Linux host, a ping command was executed to test internet connectivity. Determine which IP address was used as the ping target. | ✓ | ✗ | gpt-5-high Wins |
T1016.001 | An analyst notices a PowerShell process on a Windows host that appears to be checking SMB connectivity. Which PowerShell cmdlet was executed to perform this outbound port 445 test? | ✓ | ✗ | gpt-5-high Wins |
T1053.006 | Examine the logs from the Linux system for events related to the systemd timer activation. Identify any records indicating that a new timer unit was started and enabled, and determine which timer name was used. | ✓ | ✗ | gpt-5-high Wins |
T1036.003 | In a Linux environment, you observe a process labeled like the cron daemon but running from an unexpected path. Investigate creation events to uncover the actual filename used by this fake cron process. | ✓ | ✗ | gpt-5-high Wins |
T1069.001 | Review recent Windows process event logs for PowerShell activity that suggests local group enumeration through WMI. What exact command was executed? | ✓ | ✗ | gpt-5-high Wins |
T1059.004 | An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell. | ✓ | ✗ | gpt-5-high Wins |
T1069.001 | Investigate Windows process execution logs for a PowerShell cmdlet used to list group members. Look for entries where a group name is provided after a '-Name' flag and identify which group was queried. | ✓ | ✗ | gpt-5-high Wins |
T1059.004 | On a Linux system, analyze the process logs for suspicious command line activity that includes a sequence of commands indicating a pipe-to-shell operation. Identify the tool that was used to execute this piped command, paying special attention to its use in downloading and running script content. | ✓ | ✗ | gpt-5-high Wins |
T1070.003 | On a Windows device, there’s evidence that PowerShell history was wiped by deleting the history file. What was the exact command used to perform this action? | ✓ | ✗ | gpt-5-high Wins |
T1082 | On Windows systems, identify when the built-in Shadow Copy utility is used to enumerate existing snapshots. What was the full command executed? | ✓ | ✗ | gpt-5-high Wins |
T1082 | Review Windows process logs to find which built-in command was executed to reveal the system’s hostname. | ✓ | ✗ | gpt-5-high Wins |
T1070.008 | An attacker on Linux used bash to copy all files from /var/spool/mail into a newly created subdirectory before modifying them. What is the name of that subdirectory? | ✓ | ✗ | gpt-5-high Wins |
T1112 | Evidence shows that the Windows Defender startup entry was tampered with via an elevated command prompt. Investigate registry events related to the Run key to discover which executable replaced the default SecurityHealth value. What is the name of the new program? | ✓ | ✗ | gpt-5-high Wins |
T1082 | Windows: Investigate PowerShell process events for instances where a web client fetched and executed an external host-survey tool. What was the name of the script file that was downloaded and run? | ✓ | ✗ | gpt-5-high Wins |
T1112 | On a Windows device, examine registry event logs for modifications under the System policies path. Determine which registry value name was altered to disable the shutdown button at login. | ✓ | ✗ | gpt-5-high Wins |
T1112 | On a Windows endpoint, review the registry write events to spot when the WDigest key is altered to permit plaintext credential storage. What registry value name was changed? | ✓ | ✗ | gpt-5-high Wins |
T1112 | A Windows host logs a change to the Terminal Server registry key disabling single-session per user. Which command-line utility executed this registry modification? | ✓ | ✗ | gpt-5-high Wins |
T1112 | Investigate Windows registry events to identify any newly set ProxyServer entry under the user Internet Settings hive. What proxy server address was configured? | ✓ | ✗ | gpt-5-high Wins |
T1134.001 | A Windows host logs show PowerShell fetching and executing a remote script to gain SeDebugPrivilege token duplication. Which Empire module was invoked? | ✓ | ✗ | gpt-5-high Wins |
T1546.004 | On Linux, review file events for changes in the system-wide shell profile directory. Determine the name of the script file in /etc/profile.d that shows evidence of an unauthorized append. | ✓ | ✗ | gpt-5-high Wins |
T1547.002 | A Windows host shows a suspicious registry change under the LSA hive. Review recent registry events to locate any new entries under Authentication Packages and determine the name of the DLL the attacker added. | ✓ | ✗ | gpt-5-high Wins |
T1546.003 | On a Windows endpoint, an attacker ran a PowerShell sequence to establish a WMI event subscription using CommandLineEventConsumer. Inspect the process or script execution logs to uncover which executable was set to run by this subscription. | ✓ | ✗ | gpt-5-high Wins |
T1547.014 | A Windows endpoint shows an Active Setup entry under Internet Explorer Core Fonts being altered with a StubPath value. Investigate the registry events and identify the payload that was set. | ✓ | ✗ | gpt-5-high Wins |
T1552.001 | A Linux system shows a 'find' command used to search within .aws directories. Which specific AWS credential filename was the attacker attempting to locate? | ✓ | ✗ | gpt-5-high Wins |
T1548.001 | A Linux host’s Syslog contains records of an elevated shell executing a command that granted group execute rights and enabled the SetGID bit on a file. Investigate the logs and report the name of the file whose group ID bit was modified. | ✓ | ✗ | gpt-5-high Wins |
Page 1 of 8
Explore individual model performance and detailed analysis