o1-low vs o3-high KQL Benchmark
o1-low wins by 8.5%
Compared on 188 shared test questions
Overall Accuracy
o1-low
63.3%
119 / 188 correct
o3-high
54.8%
103 / 188 correct
Average Cost per Query
o1-low: $0.4994
o3-high: $0.0632
o1-low costs 690.5% more
Average Execution Time
o1-low: 50.90s
o3-high: 78.68s
o3-high takes 54.6% longer
Question-by-Question Analysis
Question-by-Question Comparison
Detailed comparison showing where each model succeeded or failed
Showing 1 to 25 of 188 questions
Page 1 of 8
T1016.001 | An analyst notices a PowerShell process on a Windows host that appears to be checking SMB connectivity. Which PowerShell cmdlet was executed to perform this outbound port 445 test? | ✓ | ✗ | o1-low Wins |
T1018 | Review Linux process execution records for any commands that list TCP metric cache entries and filter out loopback interfaces. Which utility was used? | ✓ | ✗ | o1-low Wins |
T1016.001 | On a Linux host, a ping command was executed to test internet connectivity. Determine which IP address was used as the ping target. | ✓ | ✗ | o1-low Wins |
T1036.004 | A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked? | ✓ | ✗ | o1-low Wins |
T1057 | A malicious actor may attempt to list running processes on a Windows machine using a WMI-based command. Review the process creation events to find out which utility was invoked to perform this enumeration. | ✓ | ✗ | o1-low Wins |
T1053.006 | Examine the logs from the Linux system for events related to the systemd timer activation. Identify any records indicating that a new timer unit was started and enabled, and determine which timer name was used. | ✓ | ✗ | o1-low Wins |
T1059.004 | An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell. | ✓ | ✗ | o1-low Wins |
T1059.004 | An attacker on a Linux host may try to enumerate installed shells by reading the system file that lists valid shells. Using process or syslog data, determine which command was executed to perform this enumeration. | ✓ | ✗ | o1-low Wins |
T1059.004 | During a Linux investigation, you notice processes spawning curl and wget commands that pull a script from a remote GitHub raw URL and pipe it into bash. Identify the name of the script that was retrieved and executed. | ✓ | ✗ | o1-low Wins |
T1069.001 | Review recent Windows process event logs for PowerShell activity that suggests local group enumeration through WMI. What exact command was executed? | ✓ | ✗ | o1-low Wins |
T1070.004 | On a Linux device, a file was silently removed from the /tmp/victim-files directory. Search through file event or syslog records to identify the exact file name that was deleted. | ✓ | ✗ | o1-low Wins |
T1070.006 | On a Linux system, attackers may use timestamp manipulation to hide malicious changes. Investigate relevant logs to identify which file’s modification timestamp was altered by such a command. | ✓ | ✗ | o1-low Wins |
T1070.003 | On a Windows device, there’s evidence that PowerShell history was wiped by deleting the history file. What was the exact command used to perform this action? | ✓ | ✗ | o1-low Wins |
T1112 | Evidence shows that the Windows Defender startup entry was tampered with via an elevated command prompt. Investigate registry events related to the Run key to discover which executable replaced the default SecurityHealth value. What is the name of the new program? | ✓ | ✗ | o1-low Wins |
T1082 | Review Windows process logs to find which built-in command was executed to reveal the system’s hostname. | ✓ | ✗ | o1-low Wins |
T1112 | A Windows host logs a change to the Terminal Server registry key disabling single-session per user. Which command-line utility executed this registry modification? | ✓ | ✗ | o1-low Wins |
T1112 | Review registry event logs on the Windows host for PowerShell-driven writes to system policy and file system keys. Which registry value names were created during this BlackByte preparation simulation? | ✓ | ✗ | o1-low Wins |
T1112 | On a Windows endpoint, review the registry write events to spot when the WDigest key is altered to permit plaintext credential storage. What registry value name was changed? | ✓ | ✗ | o1-low Wins |
T1112 | On a Windows device, examine registry event logs for modifications under the System policies path. Determine which registry value name was altered to disable the shutdown button at login. | ✓ | ✗ | o1-low Wins |
T1112 | Investigate Windows registry events to identify any newly set ProxyServer entry under the user Internet Settings hive. What proxy server address was configured? | ✓ | ✗ | o1-low Wins |
T1134.001 | A Windows host logs show PowerShell fetching and executing a remote script to gain SeDebugPrivilege token duplication. Which Empire module was invoked? | ✓ | ✗ | o1-low Wins |
T1201 | You are reviewing Linux syslog records on a CentOS/RHEL 7.x server. You notice entries for shell commands that access system configuration files under /etc/security. Determine exactly which configuration file was being inspected by the command. | ✓ | ✗ | o1-low Wins |
T1218.010 | An attacker has attempted to sideload code by invoking regsvr32.exe in a Windows host against a file that does not use the standard .dll extension. Investigate the process event logs to determine the name of the file that was registered. | ✓ | ✗ | o1-low Wins |
T1201 | On a Linux system, logs show that the password expiration settings file was accessed. Identify which command was executed to list its contents. | ✓ | ✗ | o1-low Wins |
T1505.005 | A suspicious registry change was made on a Windows system modifying the Terminal Services DLL path. Investigate registry events to find out which DLL file name was set as the ServiceDll value under TermService. What was the file name? | ✓ | ✗ | o1-low Wins |
Page 1 of 8
Explore individual model performance and detailed analysis