gpt-4.1-mini vs gpt-5-mini-low KQL Benchmark
gpt-5-mini-low wins by 4.3 percentage points
Compared on 187 shared test questions
Overall Accuracy
gpt-4.1-mini: 41.7% (78 / 187 correct)
gpt-5-mini-low: 46.0% (86 / 187 correct)
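The accuracy figures follow directly from the correct-answer counts. The short sketch below recomputes them and separates the absolute gap (in percentage points) from the relative gain in correct answers. The function and variable names are illustrative and not part of the benchmark; only the counts come from the summary above.

```python
# Counts are taken from the summary above; all names here are illustrative.
def accuracy_pct(correct: int, total: int) -> float:
    """Share of questions answered correctly, as a percentage."""
    return 100.0 * correct / total


TOTAL_QUESTIONS = 187
CORRECT_41_MINI = 78
CORRECT_5_MINI_LOW = 86

acc_41_mini = accuracy_pct(CORRECT_41_MINI, TOTAL_QUESTIONS)
acc_5_mini_low = accuracy_pct(CORRECT_5_MINI_LOW, TOTAL_QUESTIONS)

# Absolute gap between the two accuracies, in percentage points.
gap_points = acc_5_mini_low - acc_41_mini

# Relative gain: how many more questions gpt-5-mini-low answered correctly,
# expressed as a percentage of the gpt-4.1-mini count.
relative_gain_pct = 100.0 * (CORRECT_5_MINI_LOW - CORRECT_41_MINI) / CORRECT_41_MINI

print(f"gpt-4.1-mini accuracy:   {acc_41_mini:.1f}%")        # 41.7%
print(f"gpt-5-mini-low accuracy: {acc_5_mini_low:.1f}%")     # 46.0%
print(f"gap: {gap_points:.1f} percentage points")            # 4.3
print(f"relative gain in correct answers: {relative_gain_pct:.1f}%")  # 10.3%
```

The 4.3 figure in the headline is the absolute gap. Measured relative to gpt-4.1-mini's 78 correct answers, the same 8-question difference is roughly a 10.3% gain.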
Average Cost per Query
gpt-4.1-mini: $0.0057
gpt-5-mini-low: $0.0146
gpt-5-mini-low costs 155.2% more
Average Execution Time
gpt-4.1-mini: 14.13s
gpt-5-mini-low: 54.72s
gpt-5-mini-low takes 287.2% longer
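As a rough check, the "costs more" and "takes longer" percentages can be recomputed from the displayed averages as a relative increase over the gpt-4.1-mini baseline. The helper name below is illustrative. The inputs are the rounded values shown above, so the results differ slightly from the reported 155.2% and 287.2%; a plausible explanation, which is an assumption here, is that the report was computed from unrounded averages.

```python
# Averages are the rounded values shown above; names are illustrative.
def pct_increase(baseline: float, value: float) -> float:
    """Relative increase of `value` over `baseline`, as a percentage."""
    return 100.0 * (value - baseline) / baseline


cost_increase = pct_increase(0.0057, 0.0146)   # dollars per query
time_increase = pct_increase(14.13, 54.72)     # seconds per query

print(f"cost increase: {cost_increase:.1f}%")  # 156.1%
print(f"time increase: {time_increase:.1f}%")  # 287.3%

# The same comparison expressed as simple ratios.
print(f"cost ratio: {0.0146 / 0.0057:.2f}x")   # 2.56x
print(f"time ratio: {54.72 / 14.13:.2f}x")     # 3.87x
```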
Question-by-Question Comparison
Detailed comparison showing where each model succeeded or failed
Showing questions 1 to 25 of 187 (page 1 of 8)
Technique ID | Question | gpt-4.1-mini | gpt-5-mini-low | Result |
T1003 | On a Windows host, an attacker leveraged COMSVCS.DLL via rundll32.exe to dump the memory of svchost.exe into a file under the Temp directory. Review the file creation logs to determine the exact name of the dump file that was generated. | ✓ | ✗ | gpt-4.1-mini Wins |
T1027 | A Windows host shows a process launch with an extremely obfuscated command line that dynamically builds and invokes code at runtime. Which process name was used to execute this payload? | ✓ | ✗ | gpt-4.1-mini Wins |
T1059.004 | Which full interactive shell command, as recorded in the Linux process logs, repeatedly echoed a distinctive marker message to the terminal? | ✓ | ✗ | gpt-4.1-mini Wins |
T1059.004 | An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell. | ✓ | ✗ | gpt-4.1-mini Wins |
T1069.001 | Investigate Windows process execution logs for a PowerShell cmdlet used to list group members. Look for entries where a group name is provided after a '-Name' flag and identify which group was queried. | ✓ | ✗ | gpt-4.1-mini Wins |
T1070.003 | On a Windows device, there’s evidence that PowerShell history was wiped by deleting the history file. What was the exact command used to perform this action? | ✓ | ✗ | gpt-4.1-mini Wins |
T1078.003 | Review the Linux process creation records to find which user account management utility was used to reactivate the previously locked and expired account. | ✓ | ✗ | gpt-4.1-mini Wins |
T1082 | A user‐space process on a Linux device invoked a shell to capture and display the system’s environment variables and path. Which exact command was used to perform this discovery? | ✓ | ✗ | gpt-4.1-mini Wins |
T1082 | A Windows system shows a cmd.exe process spawn that appears to have been used for environment discovery. Review the process creation records to identify the exact command the adversary ran to enumerate environment variables. | ✓ | ✗ | gpt-4.1-mini Wins |
T1112 | On a Windows host, registry events reveal that PowerShell modified a value under the WDigest provider. Identify the exact command line that performed this registry change. | ✓ | ✗ | gpt-4.1-mini Wins |
T1197 | A suspicious BITS transfer was orchestrated via bitsadmin.exe on Windows, creating a job to download and then execute a payload. Investigate the process event logs to determine what custom job name was specified when the BITS job was created. | ✓ | ✗ | gpt-4.1-mini Wins |
T1217 | On a Windows system, you notice a process that recursively enumerates files named 'Bookmarks' under every user profile directory. Which Windows command-line utility was used to perform that search? | ✓ | ✗ | gpt-4.1-mini Wins |
T1218.011 | You notice rundll32.exe being used with desk.cpl,InstallScreenSaver on a Windows endpoint. Investigate your process creation logs to find which .scr file was loaded by this unusual invocation. | ✓ | ✗ | gpt-4.1-mini Wins |
T1222.002 | On a Linux host, process execution logs show a chmod invocation with a recursive flag. Which file or folder was targeted by this recursive permission change? | ✓ | ✗ | gpt-4.1-mini Wins |
T1542.001 | Investigate Windows file creation logs to uncover any new executable added directly to the System32 directory, which may indicate a UEFI persistence implant. What was the name of the file created? | ✓ | ✗ | gpt-4.1-mini Wins |
T1546.003 | On a Windows endpoint, an attacker ran a PowerShell sequence to establish a WMI event subscription using CommandLineEventConsumer. Inspect the process or script execution logs to uncover which executable was set to run by this subscription. | ✓ | ✗ | gpt-4.1-mini Wins |
T1548.002 | On a Windows endpoint, someone may have disabled the secure desktop for elevation prompts by modifying a registry setting. Review the registry event logs to identify which registry value name was changed to 0. | ✓ | ✗ | gpt-4.1-mini Wins |
T1548.001 | A Linux system shows a shell invocation that appears to be searching for files with elevated group permissions. Using the available process execution logs, determine exactly what command was run. | ✓ | ✗ | gpt-4.1-mini Wins |
T1552.003 | A Linux user’s bash history was searched for patterns like ‘pass’ and ‘ssh’, and the matching lines were redirected into a new file. Determine the name of that file. | ✓ | ✗ | gpt-4.1-mini Wins |
T1555 | On a Windows host, an external PowerShell script is fetched and run to harvest local Wi-Fi credentials. Investigate the process execution logs to find out what script file name was downloaded and invoked. | ✓ | ✗ | gpt-4.1-mini Wins |
T1555 | On Windows, review PowerShell process events to spot where a remote .ps1 was fetched and run to pull vault credentials. Determine the name of the script file that was downloaded. | ✓ | ✗ | gpt-4.1-mini Wins |
T1557.001 | On Windows devices, hunt for PowerShell activity where a remote script is fetched and executed to perform LLMNR/NBNS spoofing. Which cmdlet kicked off the listener? | ✓ | ✗ | gpt-4.1-mini Wins |
T1559 | Investigating a Windows device, you suspect a non-standard executable was launched to set up a named pipe for client-server messaging. Determine the name of the executable that was run. | ✓ | ✗ | gpt-4.1-mini Wins |
T1562.004 | On a Windows device, a new inbound firewall rule was created unexpectedly. Review process execution records to identify the command-line utility responsible for adding the rule. | ✓ | ✗ | gpt-4.1-mini Wins |
T1562 | Review Linux process execution logs to find where the system journal service was stopped. Which utility was invoked to disable journal logging? | ✓ | ✗ | gpt-4.1-mini Wins |