gpt-4.1 vs gpt-5-mini-medium KQL Benchmark
gpt-4.1 wins by 16.6 percentage points
Compared on 187 shared test questions
Overall Accuracy
gpt-4.1
62.0%
116 / 187 correct
gpt-5-mini-medium
45.5%
85 / 187 correct
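As a quick check, the accuracy figures above can be recomputed from the raw counts. The sketch below is only illustrative: the helper name `accuracy` is hypothetical and is not part of the benchmark's own tooling. It also shows that the 16.6 gap is a difference in percentage points, not a relative percentage.

```python
def accuracy(correct: int, total: int) -> float:
    """Return accuracy as a percentage (hypothetical helper for illustration)."""
    return 100.0 * correct / total


# Counts taken from the summary above.
gpt41 = accuracy(116, 187)
gpt5_mini = accuracy(85, 187)

# Difference between the two accuracies, in percentage points.
gap_points = gpt41 - gpt5_mini

print(f"gpt-4.1: {gpt41:.1f}%")
print(f"gpt-5-mini-medium: {gpt5_mini:.1f}%")
print(f"gap: {gap_points:.1f} percentage points")
```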
Average Cost per Query
gpt-4.1: $0.0285
gpt-5-mini-medium: $0.0150
gpt-4.1 costs 89.6% more
Average Execution Time
gpt-4.1: 9.95s
gpt-5-mini-medium: 47.16s
gpt-5-mini-medium takes 374.1% longer
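The "costs more" and "takes longer" figures are relative differences measured against the smaller value. A minimal sketch, assuming the displayed averages as inputs (the function name `percent_more` is hypothetical): because the displayed averages are rounded, the recomputed values come out near 90.0% and 374.0%, slightly off the published 89.6% and 374.1%, which were presumably derived from unrounded averages.

```python
def percent_more(larger: float, smaller: float) -> float:
    """Relative difference of `larger` over `smaller`, as a percentage."""
    return 100.0 * (larger - smaller) / smaller


# Rounded averages as displayed in the summary above.
cost_gap = percent_more(0.0285, 0.0150)   # gpt-4.1 cost vs gpt-5-mini-medium
time_gap = percent_more(47.16, 9.95)      # gpt-5-mini-medium time vs gpt-4.1

print(f"gpt-4.1 costs about {cost_gap:.1f}% more")
print(f"gpt-5-mini-medium takes about {time_gap:.1f}% longer")
```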
Question-by-Question Comparison
Detailed comparison showing where each model succeeded or failed
Showing 1 to 25 of 187 questions
Page 1 of 8
Technique ID | Question | gpt-4.1 | gpt-5-mini-medium | Result |
T1007 | An analyst suspects a user or script ran a service enumeration command on a Linux system. Review process events to find the service-listing invocation and specify the full command that was executed. | ✓ | ✗ | gpt-4.1 Wins |
T1006 | Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path. | ✓ | ✗ | gpt-4.1 Wins |
T1018 | On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache? | ✓ | ✗ | gpt-4.1 Wins |
T1027 | On a Linux system, identify the script that was generated by decoding a base64 data file and then executed. What was the filename of that script? | ✓ | ✗ | gpt-4.1 Wins |
T1036.003 | A process is running under a familiar Windows host name but originates from a user's AppData folder rather than the System32 directory. Identify the filename used to masquerade the PowerShell binary on this Windows device. | ✓ | ✗ | gpt-4.1 Wins |
T1049 | A user launched a Windows command prompt and executed a built-in utility to enumerate all active network connections. Using process creation logs, identify the exact tool that produced the list of current connections. | ✓ | ✗ | gpt-4.1 Wins |
T1039 | On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action? | ✓ | ✗ | gpt-4.1 Wins |
T1053.003 | Linux hosts may log events when new files are added to /var/spool/cron/crontabs. Query those logs for a creation or write action in that directory and determine the file name that was added. | ✓ | ✗ | gpt-4.1 Wins |
T1053.006 | Examine the logs from the Linux system for events related to the systemd timer activation. Identify any records indicating that a new timer unit was started and enabled, and determine which timer name was used. | ✓ | ✗ | gpt-4.1 Wins |
T1053.005 | On Windows, review recent registry changes to detect when the MSC file association was hijacked by a reg add operation. What executable file was configured as the default command under HKCU\Software\Classes\mscfile\shell\open\command? | ✓ | ✗ | gpt-4.1 Wins |
T1059.004 | An attacker on a Linux host may try to enumerate installed shells by reading the system file that lists valid shells. Using process or syslog data, determine which command was executed to perform this enumeration. | ✓ | ✗ | gpt-4.1 Wins |
T1057 | A Windows endpoint recorded a command-line activity through cmd.exe that lists all running processes. Determine which built-in tool was executed to perform this action. | ✓ | ✗ | gpt-4.1 Wins |
T1069.001 | On a Linux endpoint, process events reveal a chain of group‐enumeration utilities executed by a single session. Which utility was used to query the system’s group database? | ✓ | ✗ | gpt-4.1 Wins |
T1070.003 | On a Linux system, you suspect someone erased their command history by linking the history file to /dev/null. Investigate process events and determine which utility was executed to achieve this. | ✓ | ✗ | gpt-4.1 Wins |
T1070.003 | On a Windows endpoint, review process execution logs to see if any PowerShell sessions were wiped clean. Which command was executed to clear the PowerShell history? | ✓ | ✗ | gpt-4.1 Wins |
T1070.006 | On a Windows host, suspicious PowerShell activity adjusted the system clock and recorded a value. What numeric value was used to slip the system date? | ✓ | ✗ | gpt-4.1 Wins |
T1078.003 | On a Linux host, review account management activity in Syslog or process event logs to pinpoint which command was executed to create a new local user. What was the name of the tool invoked? | ✓ | ✗ | gpt-4.1 Wins |
T1078.003 | Review the Linux process creation records to find which user account management utility was used to reactivate the previously locked and expired account. | ✓ | ✗ | gpt-4.1 Wins |
T1070.006 | On a Linux system, attackers may use timestamp manipulation to hide malicious changes. Investigate relevant logs to identify which file’s modification timestamp was altered by such a command. | ✓ | ✗ | gpt-4.1 Wins |
T1112 | Evidence shows that the Windows Defender startup entry was tampered with via an elevated command prompt. Investigate registry events related to the Run key to discover which executable replaced the default SecurityHealth value. What is the name of the new program? | ✓ | ✗ | gpt-4.1 Wins |
T1090.003 | On a Linux endpoint, a command was executed to start a proxy service commonly used for onion routing. Identify the name of the service that was launched to enable this proxy functionality. | ✓ | ✗ | gpt-4.1 Wins |
T1112 | A Windows user’s registry was altered via a command-line tool to disable the lock workstation feature by adding a DWORD entry under the current user Policies\System key. Which registry value name was modified in this operation? | ✓ | ✗ | gpt-4.1 Wins |
T1124 | In Windows process event logs, you notice both the net time and w32tm commands being executed to display the system time and timezone. Which executor name from the test configuration was responsible for launching these utilities? | ✓ | ✗ | gpt-4.1 Wins |
T1082 | A Windows system shows a cmd.exe process spawn that appears to have been used for environment discovery. Review the process creation records to identify the exact command the adversary ran to enumerate environment variables. | ✓ | ✗ | gpt-4.1 Wins |
T1112 | On a Windows endpoint, review the registry write events to spot when the WDigest key is altered to permit plaintext credential storage. What registry value name was changed? | ✓ | ✗ | gpt-4.1 Wins |
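The rows above are pipe-delimited, so they can be tallied with a few lines of code. This is a sketch under stated assumptions: the column order (technique ID, question, gpt-4.1 result, gpt-5-mini-medium result, outcome) is inferred from the page, and `parse_row` is a hypothetical helper, not part of the benchmark. The two sample rows are copied from the table.

```python
from collections import Counter


def parse_row(line: str) -> dict:
    """Split one pipe-delimited row into named fields (assumed column order)."""
    cells = [c.strip() for c in line.strip().rstrip("|").split("|")]
    technique, question, first, second, outcome = cells
    return {
        "technique": technique,
        "question": question,
        "gpt-4.1": first == "✓",             # assumed: first mark is gpt-4.1
        "gpt-5-mini-medium": second == "✓",  # assumed: second mark is gpt-5-mini-medium
        "outcome": outcome,
    }


sample = [
    "T1057 | A Windows endpoint recorded a command-line activity through cmd.exe that lists all running processes. Determine which built-in tool was executed to perform this action. | ✓ | ✗ | gpt-4.1 Wins |",
    "T1018 | On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache? | ✓ | ✗ | gpt-4.1 Wins |",
]

rows = [parse_row(line) for line in sample]
tally = Counter(row["outcome"] for row in rows)
print(tally)
```

Splitting on every pipe is enough here because none of the question texts on this page contain a pipe character; a row whose question did contain one would need a stricter parser.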