gpt-4.1-mini vs gpt-5-mini-low KQL Benchmark

gpt-5-mini-low wins by 4.3%

Compared on 187 shared test questions

Overall Accuracy

gpt-4.1-mini

41.7%

78 / 187 correct

gpt-5-mini-low

46.0%

86 / 187 correct

Average Cost per Query

gpt-4.1-mini: $0.0057
gpt-5-mini-low: $0.0146
gpt-5-mini-low costs 155.2% more

Average Execution Time

gpt-4.1-mini: 14.13s
gpt-5-mini-low: 54.72s
gpt-5-mini-low takes 287.2% longer
Question-by-Question Analysis

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 187 questions
Page 1 of 8
T1003
On a Windows host, an attacker leveraged COMSVCS.DLL via rundll32.exe to dump the memory of svchost.exe into a file under the Temp directory. Review the file creation logs to determine the exact name of the dump file that was generated.
gpt-4.1-mini Wins
T1027
A Windows host shows a process launch with an extremely obfuscated command line that dynamically builds and invokes code at runtime. Which process name was used to execute this payload?
gpt-4.1-mini Wins
T1059.004
Which full interactive shell command, as recorded in the Linux process logs, repeatedly echoed a distinctive marker message to the terminal?
gpt-4.1-mini Wins
T1059.004
An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell.
gpt-4.1-mini Wins
T1069.001
Investigate Windows process execution logs for a PowerShell cmdlet used to list group members. Look for entries where a group name is provided after a '-Name' flag and identify which group was queried.
gpt-4.1-mini Wins
T1070.003
On a Windows device, there’s evidence that PowerShell history was wiped by deleting the history file. What was the exact command used to perform this action?
gpt-4.1-mini Wins
T1078.003
Review the Linux process creation records to find which user account management utility was used to reactivate the previously locked and expired account.
gpt-4.1-mini Wins
T1082
A user‐space process on a Linux device invoked a shell to capture and display the system’s environment variables and path. Which exact command was used to perform this discovery?
gpt-4.1-mini Wins
T1082
A Windows system shows a cmd.exe process spawn that appears to have been used for environment discovery. Review the process creation records to identify the exact command the adversary ran to enumerate environment variables.
gpt-4.1-mini Wins
T1112
On a Windows host, registry events reveal that PowerShell modified a value under the WDigest provider. Identify the exact command line that performed this registry change.
gpt-4.1-mini Wins
T1197
A suspicious BITS transfer was orchestrated via bitsadmin.exe on Windows, creating a job to download and then execute a payload. Investigate the process event logs to determine what custom job name was specified when the BITS job was created.
gpt-4.1-mini Wins
T1217
On a Windows system, you notice a process that recursively enumerates files named 'Bookmarks' under every user profile directory. Which Windows command-line utility was used to perform that search?
gpt-4.1-mini Wins
T1218.011
You notice rundll32.exe being used with desk.cpl,InstallScreenSaver on a Windows endpoint. Investigate your process creation logs to find which .scr file was loaded by this unusual invocation.
gpt-4.1-mini Wins
T1222.002
On a Linux host, process execution logs show a chmod invocation with a recursive flag. Which file or folder was targeted by this recursive permission change?
gpt-4.1-mini Wins
T1542.001
Investigate Windows file creation logs to uncover any new executable added directly to the System32 directory, which may indicate a UEFI persistence implant. What was the name of the file created?
gpt-4.1-mini Wins
T1546.003
On a Windows endpoint, an attacker ran a PowerShell sequence to establish a WMI event subscription using CommandLineEventConsumer. Inspect the process or script execution logs to uncover which executable was set to run by this subscription.
gpt-4.1-mini Wins
T1548.002
On a Windows endpoint, someone may have disabled the secure desktop for elevation prompts by modifying a registry setting. Review the registry event logs to identify which registry value name was changed to 0.
gpt-4.1-mini Wins
T1548.001
A Linux system shows a shell invocation that appears to be searching for files with elevated group permissions. Using the available process execution logs, determine exactly what command was run.
gpt-4.1-mini Wins
T1552.003
A Linux user’s bash history was searched for patterns like ‘pass’ and ‘ssh’, and the matching lines were redirected into a new file. Determine the name of that file.
gpt-4.1-mini Wins
T1555
On a Windows host, an external PowerShell script is fetched and run to harvest local Wi-Fi credentials. Investigate the process execution logs to find out what script file name was downloaded and invoked.
gpt-4.1-mini Wins
T1555
On Windows, review PowerShell process events to spot where a remote .ps1 was fetched and run to pull vault credentials. Determine the name of the script file that was downloaded.
gpt-4.1-mini Wins
T1557.001
On Windows devices, hunt for PowerShell activity where a remote script is fetched and executed to perform LLMNR/NBNS spoofing. Which cmdlet kicked off the listener?
gpt-4.1-mini Wins
T1559
Investigating a Windows device, you suspect a non-standard executable was launched to set up a named pipe for client-server messaging. Determine the name of the executable that was run.
gpt-4.1-mini Wins
T1562.004
On a Windows device, a new inbound firewall rule was created unexpectedly. Review process execution records to identify the command-line utility responsible for adding the rule.
gpt-4.1-mini Wins
T1562
Review Linux process execution logs to find where the system journal service was stopped. Which utility was invoked to disable journal logging?
gpt-4.1-mini Wins
Page 1 of 8

Explore individual model performance and detailed analysis