gpt-4.1-mini vs o3-high KQL Benchmark

o3-high wins by 13.3%

Compared on 188 shared test questions

Overall Accuracy

gpt-4.1-mini

41.5%

78 / 188 correct

o3-high

54.8%

103 / 188 correct

Average Cost per Query

gpt-4.1-mini: $0.0057
o3-high: $0.0632
o3-high costs 1004.5% more

Average Execution Time

gpt-4.1-mini: 14.13s
o3-high: 78.68s
o3-high takes 456.7% longer
Question-by-Question Analysis

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
Page 1 of 8
T1016.001
An analyst notices a PowerShell process on a Windows host that appears to be checking SMB connectivity. Which PowerShell cmdlet was executed to perform this outbound port 445 test?
gpt-4.1-mini Wins
T1036.003
In a Linux environment, you observe a process labeled like the cron daemon but running from an unexpected path. Investigate creation events to uncover the actual filename used by this fake cron process.
gpt-4.1-mini Wins
T1036.004
A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked?
gpt-4.1-mini Wins
T1059.004
Which full interactive shell command, as recorded in the Linux process logs, repeatedly echoed a distinctive marker message to the terminal?
gpt-4.1-mini Wins
T1059.004
An attacker on a Linux host may try to enumerate installed shells by reading the system file that lists valid shells. Using process or syslog data, determine which command was executed to perform this enumeration.
gpt-4.1-mini Wins
T1059.004
An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell.
gpt-4.1-mini Wins
T1069.001
Review recent Windows process event logs for PowerShell activity that suggests local group enumeration through WMI. What exact command was executed?
gpt-4.1-mini Wins
T1069.001
Investigate Windows process execution logs for a PowerShell cmdlet used to list group members. Look for entries where a group name is provided after a '-Name' flag and identify which group was queried.
gpt-4.1-mini Wins
T1070.003
On a Windows device, there’s evidence that PowerShell history was wiped by deleting the history file. What was the exact command used to perform this action?
gpt-4.1-mini Wins
T1112
A Windows host logs a change to the Terminal Server registry key disabling single-session per user. Which command-line utility executed this registry modification?
gpt-4.1-mini Wins
T1112
On a Windows endpoint, review the registry write events to spot when the WDigest key is altered to permit plaintext credential storage. What registry value name was changed?
gpt-4.1-mini Wins
T1120
Review Windows process execution logs to find any native utility that was used to enumerate connected drives. Which utility was invoked?
gpt-4.1-mini Wins
T1201
You are reviewing Linux syslog records on a CentOS/RHEL 7.x server. You notice entries for shell commands that access system configuration files under /etc/security. Determine exactly which configuration file was being inspected by the command.
gpt-4.1-mini Wins
T1217
On Linux, review the process execution logs to uncover when Chromium’s bookmark JSON files were being located and the results persisted. Focus on shell commands that search under .config/chromium and write output to a file. What was the filename used to save the findings?
gpt-4.1-mini Wins
T1505.005
A suspicious registry change was made on a Windows system modifying the Terminal Services DLL path. Investigate registry events to find out which DLL file name was set as the ServiceDll value under TermService. What was the file name?
gpt-4.1-mini Wins
T1546.003
On a Windows endpoint, an attacker ran a PowerShell sequence to establish a WMI event subscription using CommandLineEventConsumer. Inspect the process or script execution logs to uncover which executable was set to run by this subscription.
gpt-4.1-mini Wins
T1552.003
A Linux user’s bash history was searched for patterns like ‘pass’ and ‘ssh’, and the matching lines were redirected into a new file. Determine the name of that file.
gpt-4.1-mini Wins
T1555
On a Windows host, an external PowerShell script is fetched and run to harvest local Wi-Fi credentials. Investigate the process execution logs to find out what script file name was downloaded and invoked.
gpt-4.1-mini Wins
T1555
A security investigator suspects that someone attempted to dump stored web credentials on a Windows system using an in-built command-line tool. Review process creation logs to determine which executable was called to list the Web Credentials vault.
gpt-4.1-mini Wins
T1560
Windows system logs show PowerShell zipping up the contents of a user’s profile folder. Investigate process and file events to determine the exact name of the ZIP archive created.
gpt-4.1-mini Wins
T1559
Investigating a Windows device, you suspect a non-standard executable was launched to set up a named pipe for client-server messaging. Determine the name of the executable that was run.
gpt-4.1-mini Wins
T1562.003
On a Linux system you suspect someone altered Bash’s history settings to hide their activity. Investigate process logs for evidence of HISTCONTROL being set to ignore entries. What was the full command executed to configure HISTCONTROL?
gpt-4.1-mini Wins
T1562.004
On a Windows device, a new inbound firewall rule was created unexpectedly. Review process execution records to identify the command-line utility responsible for adding the rule.
gpt-4.1-mini Wins
T1007
An analyst suspects a user or script ran a service enumeration command on a Linux system. Review process events to find the service-listing invocation and specify the full command that was executed.
o3-high Wins
T1006
Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path.
o3-high Wins
Page 1 of 8

Explore individual model performance and detailed analysis