gpt-4.1-mini vs o3-mini-low KQL Benchmark

o3-mini-low wins by 10.1%

Compared on 188 shared test questions

Overall Accuracy

gpt-4.1-mini

41.5%

78 / 188 correct

o3-mini-low

51.6%

97 / 188 correct

Average Cost per Query

gpt-4.1-mini: $0.0057
o3-mini-low: $0.0279
o3-mini-low costs 387.4% more

Average Execution Time

gpt-4.1-mini: 14.13s
o3-mini-low: 36.54s
o3-mini-low takes 158.6% longer
Question-by-Question Analysis

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
Page 1 of 8
T1003
On a Windows host, an attacker leveraged COMSVCS.DLL via rundll32.exe to dump the memory of svchost.exe into a file under the Temp directory. Review the file creation logs to determine the exact name of the dump file that was generated.
gpt-4.1-mini Wins
T1016.001
An analyst notices a PowerShell process on a Windows host that appears to be checking SMB connectivity. Which PowerShell cmdlet was executed to perform this outbound port 445 test?
gpt-4.1-mini Wins
T1036.004
A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked?
gpt-4.1-mini Wins
T1039
On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action?
gpt-4.1-mini Wins
T1059.004
Which full interactive shell command, as recorded in the Linux process logs, repeatedly echoed a distinctive marker message to the terminal?
gpt-4.1-mini Wins
T1070.003
On a Windows device, there’s evidence that PowerShell history was wiped by deleting the history file. What was the exact command used to perform this action?
gpt-4.1-mini Wins
T1070.006
On a Windows host, suspicious PowerShell activity adjusted the system clock and recorded a value. What numeric value was used to slip the system date?
gpt-4.1-mini Wins
T1082
Using Linux process execution logs, identify the specific command that was used to filter loaded kernel modules for entries containing “vmw.” What was that full command?
gpt-4.1-mini Wins
T1120
Review Windows process execution logs to find any native utility that was used to enumerate connected drives. Which utility was invoked?
gpt-4.1-mini Wins
T1197
A suspicious BITS transfer was orchestrated via bitsadmin.exe on Windows, creating a job to download and then execute a payload. Investigate the process event logs to determine what custom job name was specified when the BITS job was created.
gpt-4.1-mini Wins
T1201
You are reviewing Linux syslog records on a CentOS/RHEL 7.x server. You notice entries for shell commands that access system configuration files under /etc/security. Determine exactly which configuration file was being inspected by the command.
gpt-4.1-mini Wins
T1217
An attacker is suspected of using the Windows shell to enumerate a user’s Internet Explorer bookmarks via the Favorites folder. Identify the exact command they executed to perform this listing.
gpt-4.1-mini Wins
T1218.011
You notice rundll32.exe being used with desk.cpl,InstallScreenSaver on a Windows endpoint. Investigate your process creation logs to find which .scr file was loaded by this unusual invocation.
gpt-4.1-mini Wins
T1542.001
Investigate Windows file creation logs to uncover any new executable added directly to the System32 directory, which may indicate a UEFI persistence implant. What was the name of the file created?
gpt-4.1-mini Wins
T1546.003
On a Windows endpoint, an attacker ran a PowerShell sequence to establish a WMI event subscription using CommandLineEventConsumer. Inspect the process or script execution logs to uncover which executable was set to run by this subscription.
gpt-4.1-mini Wins
T1548.001
A Linux system shows a shell invocation that appears to be searching for files with elevated group permissions. Using the available process execution logs, determine exactly what command was run.
gpt-4.1-mini Wins
T1555
A security investigator suspects that someone attempted to dump stored web credentials on a Windows system using an in-built command-line tool. Review process creation logs to determine which executable was called to list the Web Credentials vault.
gpt-4.1-mini Wins
T1560
Windows system logs show PowerShell zipping up the contents of a user’s profile folder. Investigate process and file events to determine the exact name of the ZIP archive created.
gpt-4.1-mini Wins
T1559
Investigating a Windows device, you suspect a non-standard executable was launched to set up a named pipe for client-server messaging. Determine the name of the executable that was run.
gpt-4.1-mini Wins
T1564.002
On Windows systems, identify any user account that was hidden by setting its value to 0 under the SpecialAccounts\\UserList registry key. What was the name of the hidden account?
gpt-4.1-mini Wins
T1614.001
On a Windows device, an attacker ran a PowerShell script to collect system settings including UI language and locale. Identify which cmdlet in the command line was used to obtain the system locale.
gpt-4.1-mini Wins
T1614.001
Using Linux process or syslog logs, identify the executable that was run to output the system's locale information.
gpt-4.1-mini Wins
T1003.008
In a Linux environment, an elevated process was used to execute a command that read /etc/shadow and redirected its output to a file. Identify what file name was employed to store these results.
o3-mini-low Wins
T1016.001
On a Linux host, a ping command was executed to test internet connectivity. Determine which IP address was used as the ping target.
o3-mini-low Wins
T1016
A Linux host’s Syslog shows a shell-based network discovery script ran multiple commands. One of them listed current TCP connections. Which utility was invoked?
o3-mini-low Wins
Page 1 of 8

Explore individual model performance and detailed analysis