gpt-4.1-nano vs gpt-5-mini-high KQL Benchmark
gpt-5-mini-high wins by 23.9%
Compared on 188 shared test questions
Overall Accuracy
gpt-4.1-nano
24.5%
46 / 188 correct
gpt-5-mini-high
48.4%
91 / 188 correct
Average Cost per Query
gpt-4.1-nano: $0.0014
gpt-5-mini-high: $0.0150
gpt-5-mini-high costs 962.7% more
Average Execution Time
gpt-4.1-nano: 10.78s
gpt-5-mini-high: 44.83s
gpt-5-mini-high takes 315.9% longer
Question-by-Question Analysis
Question-by-Question Comparison
Detailed comparison showing where each model succeeded or failed
Showing 1 to 25 of 188 questions
Page 1 of 8
T1003.007 | On a Linux system, review process creation logs to spot any dd commands reading from /proc/*/mem. What was the name of the file where the dumped memory was written? | ✓ | ✗ | gpt-4.1-nano Wins |
T1039 | On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action? | ✓ | ✗ | gpt-4.1-nano Wins |
T1048.003 | Windows process creation logs show a PowerShell-driven file transfer to an FTP service. Which executable was leveraged to perform this exfiltration? | ✓ | ✗ | gpt-4.1-nano Wins |
T1048.003 | A Linux host briefly hosted an HTTP service under /tmp. Examine process creation logs to determine the exact python3 command that was used to start the server on port 9090. | ✓ | ✗ | gpt-4.1-nano Wins |
T1049 | A user launched a Windows command prompt and executed a built-in utility to enumerate all active network connections. Using process creation logs, identify the exact tool that produced the list of current connections. | ✓ | ✗ | gpt-4.1-nano Wins |
T1059.004 | On a Linux system, review process execution records for any shell process that set an environment variable containing executable code and then piped it into another shell instance. Determine which environment variable name was used to store the script content. | ✓ | ✗ | gpt-4.1-nano Wins |
T1078.003 | On a Linux host, review account management activity in Syslog or process event logs to pinpoint which command was executed to create a new local user. What was the name of the tool invoked? | ✓ | ✗ | gpt-4.1-nano Wins |
T1112 | Evidence shows that the Windows Defender startup entry was tampered with via an elevated command prompt. Investigate registry events related to the Run key to discover which executable replaced the default SecurityHealth value. What is the name of the new program? | ✓ | ✗ | gpt-4.1-nano Wins |
T1217 | An attacker is suspected of using the Windows shell to enumerate a user’s Internet Explorer bookmarks via the Favorites folder. Identify the exact command they executed to perform this listing. | ✓ | ✗ | gpt-4.1-nano Wins |
T1542.001 | Investigate Windows file creation logs to uncover any new executable added directly to the System32 directory, which may indicate a UEFI persistence implant. What was the name of the file created? | ✓ | ✗ | gpt-4.1-nano Wins |
T1562.003 | On a Linux system you suspect someone altered Bash’s history settings to hide their activity. Investigate process logs for evidence of HISTCONTROL being set to ignore entries. What was the full command executed to configure HISTCONTROL? | ✓ | ✗ | gpt-4.1-nano Wins |
T1562.003 | Within Linux process execution records, locate any bash commands where the HISTFILESIZE environment variable was exported. What value was assigned to HISTFILESIZE? | ✓ | ✗ | gpt-4.1-nano Wins |
T1007 | An analyst suspects a user or script ran a service enumeration command on a Linux system. Review process events to find the service-listing invocation and specify the full command that was executed. | ✗ | ✓ | gpt-5-mini-high Wins |
T1003 | On a Windows host, an attacker leveraged COMSVCS.DLL via rundll32.exe to dump the memory of svchost.exe into a file under the Temp directory. Review the file creation logs to determine the exact name of the dump file that was generated. | ✗ | ✓ | gpt-5-mini-high Wins |
T1006 | Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path. | ✗ | ✓ | gpt-5-mini-high Wins |
T1016 | A Linux host’s Syslog shows a shell-based network discovery script ran multiple commands. One of them listed current TCP connections. Which utility was invoked? | ✗ | ✓ | gpt-5-mini-high Wins |
T1016.002 | On a Windows host, someone appears to have run a built-in network shell utility to list saved wireless network profiles and their passwords in clear text. Review the process creation logs to discover the exact command that was executed. | ✗ | ✓ | gpt-5-mini-high Wins |
T1018 | Review Linux process execution records for any commands that list TCP metric cache entries and filter out loopback interfaces. Which utility was used? | ✗ | ✓ | gpt-5-mini-high Wins |
T1021.006 | On Windows hosts, look through recent PowerShell execution records to find any elevated session where remote management was turned on. What exact command was run to enable PSRemoting? | ✗ | ✓ | gpt-5-mini-high Wins |
T1036.004 | Analyze Windows process events for any schtasks.exe commands that created a new task invoking PowerShell. What is the name of the .ps1 script specified to run? | ✗ | ✓ | gpt-5-mini-high Wins |
T1053.003 | Linux hosts may log events when new files are added to /var/spool/cron/crontabs. Query those logs for a creation or write action in that directory and determine the file name that was added. | ✗ | ✓ | gpt-5-mini-high Wins |
T1053.005 | You suspect malicious persistence via scheduled tasks on a Windows endpoint. Review the process execution logs to identify the built-in utility used to register tasks at logon or startup. What is the name of this utility? | ✗ | ✓ | gpt-5-mini-high Wins |
T1057 | A malicious actor may attempt to list running processes on a Windows machine using a WMI-based command. Review the process creation events to find out which utility was invoked to perform this enumeration. | ✗ | ✓ | gpt-5-mini-high Wins |
T1057 | On a Windows host, investigate process events to find when Task Manager was launched via cmd with an unusual flag. What was the full command executed? | ✗ | ✓ | gpt-5-mini-high Wins |
T1053.006 | Examine the logs from the Linux system for events related to the systemd timer activation. Identify any records indicating that a new timer unit was started and enabled, and determine which timer name was used. | ✗ | ✓ | gpt-5-mini-high Wins |
Page 1 of 8
Explore individual model performance and detailed analysis