gpt-5-high vs o3-high KQL Benchmark

gpt-5-high wins by 8.5%

Compared on 188 shared test questions

Overall Accuracy

gpt-5-high

63.3%

119 / 188 correct

o3-high

54.8%

103 / 188 correct

Average Cost per Query

gpt-5-high: $0.1529
o3-high: $0.0632
gpt-5-high costs 142.0% more

Average Execution Time

gpt-5-high: 192.47s
o3-high: 78.68s
gpt-5-high takes 144.6% longer
Question-by-Question Analysis

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
Page 1 of 8
T1016.001
On a Linux host, a ping command was executed to test internet connectivity. Determine which IP address was used as the ping target.
gpt-5-high Wins
T1016.001
An analyst notices a PowerShell process on a Windows host that appears to be checking SMB connectivity. Which PowerShell cmdlet was executed to perform this outbound port 445 test?
gpt-5-high Wins
T1053.006
Examine the logs from the Linux system for events related to the systemd timer activation. Identify any records indicating that a new timer unit was started and enabled, and determine which timer name was used.
gpt-5-high Wins
T1036.003
In a Linux environment, you observe a process labeled like the cron daemon but running from an unexpected path. Investigate creation events to uncover the actual filename used by this fake cron process.
gpt-5-high Wins
T1069.001
Review recent Windows process event logs for PowerShell activity that suggests local group enumeration through WMI. What exact command was executed?
gpt-5-high Wins
T1059.004
An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell.
gpt-5-high Wins
T1069.001
Investigate Windows process execution logs for a PowerShell cmdlet used to list group members. Look for entries where a group name is provided after a '-Name' flag and identify which group was queried.
gpt-5-high Wins
T1059.004
On a Linux system, analyze the process logs for suspicious command line activity that includes a sequence of commands indicating a pipe-to-shell operation. Identify the tool that was used to execute this piped command, paying special attention to its use in downloading and running script content.
gpt-5-high Wins
T1070.003
On a Windows device, there’s evidence that PowerShell history was wiped by deleting the history file. What was the exact command used to perform this action?
gpt-5-high Wins
T1082
On Windows systems, identify when the built-in Shadow Copy utility is used to enumerate existing snapshots. What was the full command executed?
gpt-5-high Wins
T1082
Review Windows process logs to find which built-in command was executed to reveal the system’s hostname.
gpt-5-high Wins
T1070.008
An attacker on Linux used bash to copy all files from /var/spool/mail into a newly created subdirectory before modifying them. What is the name of that subdirectory?
gpt-5-high Wins
T1112
Evidence shows that the Windows Defender startup entry was tampered with via an elevated command prompt. Investigate registry events related to the Run key to discover which executable replaced the default SecurityHealth value. What is the name of the new program?
gpt-5-high Wins
T1082
Windows: Investigate PowerShell process events for instances where a web client fetched and executed an external host-survey tool. What was the name of the script file that was downloaded and run?
gpt-5-high Wins
T1112
On a Windows device, examine registry event logs for modifications under the System policies path. Determine which registry value name was altered to disable the shutdown button at login.
gpt-5-high Wins
T1112
On a Windows endpoint, review the registry write events to spot when the WDigest key is altered to permit plaintext credential storage. What registry value name was changed?
gpt-5-high Wins
T1112
A Windows host logs a change to the Terminal Server registry key disabling single-session per user. Which command-line utility executed this registry modification?
gpt-5-high Wins
T1112
Investigate Windows registry events to identify any newly set ProxyServer entry under the user Internet Settings hive. What proxy server address was configured?
gpt-5-high Wins
T1134.001
A Windows host logs show PowerShell fetching and executing a remote script to gain SeDebugPrivilege token duplication. Which Empire module was invoked?
gpt-5-high Wins
T1546.004
On Linux, review file events for changes in the system-wide shell profile directory. Determine the name of the script file in /etc/profile.d that shows evidence of an unauthorized append.
gpt-5-high Wins
T1547.002
A Windows host shows a suspicious registry change under the LSA hive. Review recent registry events to locate any new entries under Authentication Packages and determine the name of the DLL the attacker added.
gpt-5-high Wins
T1546.003
On a Windows endpoint, an attacker ran a PowerShell sequence to establish a WMI event subscription using CommandLineEventConsumer. Inspect the process or script execution logs to uncover which executable was set to run by this subscription.
gpt-5-high Wins
T1547.014
A Windows endpoint shows an Active Setup entry under Internet Explorer Core Fonts being altered with a StubPath value. Investigate the registry events and identify the payload that was set.
gpt-5-high Wins
T1552.001
A Linux system shows a 'find' command used to search within .aws directories. Which specific AWS credential filename was the attacker attempting to locate?
gpt-5-high Wins
T1548.001
A Linux host’s Syslog contains records of an elevated shell executing a command that granted group execute rights and enabled the SetGID bit on a file. Investigate the logs and report the name of the file whose group ID bit was modified.
gpt-5-high Wins
Page 1 of 8

Explore individual model performance and detailed analysis