gpt-4.1-mini vs o3-mini-high KQL Benchmark

o3-mini-high wins by 10.1%

Compared on 188 shared test questions

Overall Accuracy

gpt-4.1-mini

41.5%

78 / 188 correct

o3-mini-high

51.6%

97 / 188 correct

Average Cost per Query

gpt-4.1-mini: $0.0057
o3-mini-high: $0.0262
o3-mini-high costs 357.8% more

Average Execution Time

gpt-4.1-mini: 14.13s
o3-mini-high: 33.17s
o3-mini-high takes 134.7% longer
Question-by-Question Analysis

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
Page 1 of 8
T1003
On a Windows host, an attacker leveraged COMSVCS.DLL via rundll32.exe to dump the memory of svchost.exe into a file under the Temp directory. Review the file creation logs to determine the exact name of the dump file that was generated.
gpt-4.1-mini Wins
T1016.001
An analyst notices a PowerShell process on a Windows host that appears to be checking SMB connectivity. Which PowerShell cmdlet was executed to perform this outbound port 445 test?
gpt-4.1-mini Wins
T1027
On a Windows endpoint, look for evidence of a base64-encoded PowerShell payload execution. Which executable launched the encoded command?
gpt-4.1-mini Wins
T1036.004
A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked?
gpt-4.1-mini Wins
T1048.003
Windows process creation logs show a PowerShell-driven file transfer to an FTP service. Which executable was leveraged to perform this exfiltration?
gpt-4.1-mini Wins
T1039
On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action?
gpt-4.1-mini Wins
T1057
On a Windows device, PowerShell was used to collect a snapshot of running processes. Identify the exact cmdlet that was executed.
gpt-4.1-mini Wins
T1059.004
Which full interactive shell command, as recorded in the Linux process logs, repeatedly echoed a distinctive marker message to the terminal?
gpt-4.1-mini Wins
T1059.004
An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell.
gpt-4.1-mini Wins
T1069.001
Review recent Windows process event logs for PowerShell activity that suggests local group enumeration through WMI. What exact command was executed?
gpt-4.1-mini Wins
T1070.003
On a Windows device, there’s evidence that PowerShell history was wiped by deleting the history file. What was the exact command used to perform this action?
gpt-4.1-mini Wins
T1070.006
On a Windows host, suspicious PowerShell activity adjusted the system clock and recorded a value. What numeric value was used to slip the system date?
gpt-4.1-mini Wins
T1082
A user‐space process on a Linux device invoked a shell to capture and display the system’s environment variables and path. Which exact command was used to perform this discovery?
gpt-4.1-mini Wins
T1120
Review Windows process execution logs to find any native utility that was used to enumerate connected drives. Which utility was invoked?
gpt-4.1-mini Wins
T1217
An attacker is suspected of using the Windows shell to enumerate a user’s Internet Explorer bookmarks via the Favorites folder. Identify the exact command they executed to perform this listing.
gpt-4.1-mini Wins
T1217
On Linux, review the process execution logs to uncover when Chromium’s bookmark JSON files were being located and the results persisted. Focus on shell commands that search under .config/chromium and write output to a file. What was the filename used to save the findings?
gpt-4.1-mini Wins
T1218.011
You notice rundll32.exe being used with desk.cpl,InstallScreenSaver on a Windows endpoint. Investigate your process creation logs to find which .scr file was loaded by this unusual invocation.
gpt-4.1-mini Wins
T1557.001
On Windows devices, hunt for PowerShell activity where a remote script is fetched and executed to perform LLMNR/NBNS spoofing. Which cmdlet kicked off the listener?
gpt-4.1-mini Wins
T1559
Investigating a Windows device, you suspect a non-standard executable was launched to set up a named pipe for client-server messaging. Determine the name of the executable that was run.
gpt-4.1-mini Wins
T1614.001
On a Windows device, an attacker ran a PowerShell script to collect system settings including UI language and locale. Identify which cmdlet in the command line was used to obtain the system locale.
gpt-4.1-mini Wins
T1622
On the Windows device, a security check was run to detect debugger processes via PowerShell. Which tool (process) carried out this check?
gpt-4.1-mini Wins
T1006
Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path.
o3-mini-high Wins
T1003.008
In a Linux environment, an elevated process was used to execute a command that read /etc/shadow and redirected its output to a file. Identify what file name was employed to store these results.
o3-mini-high Wins
T1016.001
On a Linux host, a ping command was executed to test internet connectivity. Determine which IP address was used as the ping target.
o3-mini-high Wins
T1016
A Linux host’s Syslog shows a shell-based network discovery script ran multiple commands. One of them listed current TCP connections. Which utility was invoked?
o3-mini-high Wins
Page 1 of 8

Explore individual model performance and detailed analysis