o1-high vs o3-high KQL Benchmark

o1-high wins by 8.5%

Compared on 188 shared test questions

Overall Accuracy

o1-high

63.3%

119 / 188 correct

o3-high

54.8%

103 / 188 correct

Average Cost per Query

o1-high: $0.5239
o3-high: $0.0632
o1-high costs 729.4% more

Average Execution Time

o1-high: 57.03s
o3-high: 78.68s
o3-high takes 38.0% longer
Question-by-Question Analysis

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
Page 1 of 8
T1016.001
On a Linux host, a ping command was executed to test internet connectivity. Determine which IP address was used as the ping target.
o1-high Wins
T1016.001
An analyst notices a PowerShell process on a Windows host that appears to be checking SMB connectivity. Which PowerShell cmdlet was executed to perform this outbound port 445 test?
o1-high Wins
T1018
Review Linux process execution records for any commands that list TCP metric cache entries and filter out loopback interfaces. Which utility was used?
o1-high Wins
T1036.004
A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked?
o1-high Wins
T1036.003
In a Linux environment, you observe a process labeled like the cron daemon but running from an unexpected path. Investigate creation events to uncover the actual filename used by this fake cron process.
o1-high Wins
T1053.005
On a Windows host, find any scheduled task that was registered using PowerShell native cmdlets instead of schtasks.exe. What was the name given to the new task?
o1-high Wins
T1059.004
An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell.
o1-high Wins
T1057
While reviewing Windows process events, you spot a PowerShell process executing a WMI enumeration cmdlet. What WMI class name did the attacker query?
o1-high Wins
T1057
A malicious actor may attempt to list running processes on a Windows machine using a WMI-based command. Review the process creation events to find out which utility was invoked to perform this enumeration.
o1-high Wins
T1059.004
An attacker on a Linux host may try to enumerate installed shells by reading the system file that lists valid shells. Using process or syslog data, determine which command was executed to perform this enumeration.
o1-high Wins
T1069.001
Review recent Windows process event logs for PowerShell activity that suggests local group enumeration through WMI. What exact command was executed?
o1-high Wins
T1070.004
On a Linux device, a file was silently removed from the /tmp/victim-files directory. Search through file event or syslog records to identify the exact file name that was deleted.
o1-high Wins
T1070.006
On a Linux system, attackers may use timestamp manipulation to hide malicious changes. Investigate relevant logs to identify which file’s modification timestamp was altered by such a command.
o1-high Wins
T1082
Windows: Investigate PowerShell process events for instances where a web client fetched and executed an external host-survey tool. What was the name of the script file that was downloaded and run?
o1-high Wins
T1082
Review Windows process logs to find which built-in command was executed to reveal the system’s hostname.
o1-high Wins
T1112
A Windows host logs a change to the Terminal Server registry key disabling single-session per user. Which command-line utility executed this registry modification?
o1-high Wins
T1112
Evidence shows that the Windows Defender startup entry was tampered with via an elevated command prompt. Investigate registry events related to the Run key to discover which executable replaced the default SecurityHealth value. What is the name of the new program?
o1-high Wins
T1112
On a Windows endpoint, review the registry write events to spot when the WDigest key is altered to permit plaintext credential storage. What registry value name was changed?
o1-high Wins
T1112
Investigate Windows registry events to identify any newly set ProxyServer entry under the user Internet Settings hive. What proxy server address was configured?
o1-high Wins
T1201
You are reviewing Linux syslog records on a CentOS/RHEL 7.x server. You notice entries for shell commands that access system configuration files under /etc/security. Determine exactly which configuration file was being inspected by the command.
o1-high Wins
T1201
On a Linux system, logs show that the password expiration settings file was accessed. Identify which command was executed to list its contents.
o1-high Wins
T1218.010
An attacker has attempted to sideload code by invoking regsvr32.exe in a Windows host against a file that does not use the standard .dll extension. Investigate the process event logs to determine the name of the file that was registered.
o1-high Wins
T1505.005
A suspicious registry change was made on a Windows system modifying the Terminal Services DLL path. Investigate registry events to find out which DLL file name was set as the ServiceDll value under TermService. What was the file name?
o1-high Wins
T1546.003
On a Windows endpoint, an attacker ran a PowerShell sequence to establish a WMI event subscription using CommandLineEventConsumer. Inspect the process or script execution logs to uncover which executable was set to run by this subscription.
o1-high Wins
T1531
Within Windows process event logs, identify instances where the built-in net.exe utility is used to change a user account password. What was the new password argument passed in?
o1-high Wins
Page 1 of 8

Explore individual model performance and detailed analysis