gpt-4.1 vs o1-low KQL Benchmark

o1-low wins by 1.6%

Compared on 188 shared test questions

Overall Accuracy

gpt-4.1

61.7%

116 / 188 correct

o1-low

63.3%

119 / 188 correct

Average Cost per Query

gpt-4.1: $0.0285
o1-low: $0.4994
o1-low costs 1652.7% more

Average Execution Time

gpt-4.1: 9.95s
o1-low: 50.90s
o1-low takes 411.8% longer
Question-by-Question Analysis

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
Page 1 of 8
T1027
On a Linux system, identify the script that was generated by decoding a base64 data file and then executed. What was the filename of that script?
gpt-4.1 Wins
T1039
On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action?
gpt-4.1 Wins
T1053.005
Investigate Windows process events for PowerShell activity that leverages WMI to register a scheduled task via XML import. What was the name of the XML file supplied to the RegisterByXml method?
gpt-4.1 Wins
T1057
While reviewing Windows process events, you spot a PowerShell process executing a WMI enumeration cmdlet. What WMI class name did the attacker query?
gpt-4.1 Wins
T1053.005
On Windows, review recent registry changes to detect when the MSC file association was hijacked by a reg add operation. What executable file was configured as the default command under HKCU\Software\Classes\mscfile\shell\open\command?
gpt-4.1 Wins
T1057
On a Windows host, investigate process events to find when Task Manager was launched via cmd with an unusual flag. What was the full command executed?
gpt-4.1 Wins
T1070.003
On a Windows endpoint, review process execution logs to see if any PowerShell sessions were wiped clean. Which command was executed to clear the PowerShell history?
gpt-4.1 Wins
T1070.003
On a Linux endpoint, you suspect malicious clearing of the bash history by redirecting from the null device. Explore process or file events to uncover the exact shell command that performed this action.
gpt-4.1 Wins
T1069.001
Investigate Windows process execution logs for a PowerShell cmdlet used to list group members. Look for entries where a group name is provided after a '-Name' flag and identify which group was queried.
gpt-4.1 Wins
T1082
On Windows systems, identify when the built-in Shadow Copy utility is used to enumerate existing snapshots. What was the full command executed?
gpt-4.1 Wins
T1112
A Windows user’s registry was altered via a command-line tool to disable the lock workstation feature by adding a DWORD entry under the current user Policies\System key. Which registry value name was modified in this operation?
gpt-4.1 Wins
T1124
In Windows process event logs, you notice both the net time and w32tm commands being executed to display the system time and timezone. Which executor name from the test configuration was responsible for launching these utilities?
gpt-4.1 Wins
T1082
A Windows system shows a cmd.exe process spawn that appears to have been used for environment discovery. Review the process creation records to identify the exact command the adversary ran to enumerate environment variables.
gpt-4.1 Wins
T1120
Review Windows process execution logs to find any native utility that was used to enumerate connected drives. Which utility was invoked?
gpt-4.1 Wins
T1197
A suspicious BITS transfer was orchestrated via bitsadmin.exe on Windows, creating a job to download and then execute a payload. Investigate the process event logs to determine what custom job name was specified when the BITS job was created.
gpt-4.1 Wins
T1217
On a Windows system, you notice a process that recursively enumerates files named 'Bookmarks' under every user profile directory. Which Windows command-line utility was used to perform that search?
gpt-4.1 Wins
T1542.001
Investigate Windows file creation logs to uncover any new executable added directly to the System32 directory, which may indicate a UEFI persistence implant. What was the name of the file created?
gpt-4.1 Wins
T1546.004
A suspicious file modification on a Linux device targeted the ~/.bash_profile file, apparently adding a new line. What was the full command string that was appended?
gpt-4.1 Wins
T1547.002
A Windows host shows a suspicious registry change under the LSA hive. Review recent registry events to locate any new entries under Authentication Packages and determine the name of the DLL the attacker added.
gpt-4.1 Wins
T1547
A Windows host shows a process launching with install-driver switches, likely signaling malicious driver deployment. What is the name of the tool that was executed?
gpt-4.1 Wins
T1548.001
A Linux host’s Syslog contains records of an elevated shell executing a command that granted group execute rights and enabled the SetGID bit on a file. Investigate the logs and report the name of the file whose group ID bit was modified.
gpt-4.1 Wins
T1555
A security investigator suspects that someone attempted to dump stored web credentials on a Windows system using an in-built command-line tool. Review process creation logs to determine which executable was called to list the Web Credentials vault.
gpt-4.1 Wins
T1557.001
On Windows devices, hunt for PowerShell activity where a remote script is fetched and executed to perform LLMNR/NBNS spoofing. Which cmdlet kicked off the listener?
gpt-4.1 Wins
T1559
Investigating a Windows device, you suspect a non-standard executable was launched to set up a named pipe for client-server messaging. Determine the name of the executable that was run.
gpt-4.1 Wins
T1562.004
On a Windows device, a new inbound firewall rule was created unexpectedly. Review process execution records to identify the command-line utility responsible for adding the rule.
gpt-4.1 Wins
Page 1 of 8

Explore individual model performance and detailed analysis