gpt-4.1 vs gpt-5-mini-medium KQL Benchmark

gpt-4.1 wins by 16.6%

Compared on 187 shared test questions

Overall Accuracy

gpt-4.1

62.0%

116 / 187 correct

gpt-5-mini-medium

45.5%

85 / 187 correct

Average Cost per Query

gpt-4.1: $0.0285
gpt-5-mini-medium: $0.0150
gpt-4.1 costs 89.6% more

Average Execution Time

gpt-4.1: 9.95s
gpt-5-mini-medium: 47.16s
gpt-5-mini-medium takes 374.1% longer
Question-by-Question Analysis

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 187 questions
Page 1 of 8
T1007
An analyst suspects a user or script ran a service enumeration command on a Linux system. Review process events to find the service-listing invocation and specify the full command that was executed.
gpt-4.1 Wins
T1006
Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path.
gpt-4.1 Wins
T1018
On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache?
gpt-4.1 Wins
T1027
On a Linux system, identify the script that was generated by decoding a base64 data file and then executed. What was the filename of that script?
gpt-4.1 Wins
T1036.003
A process is running under a familiar Windows host name but originates from a user's AppData folder rather than the System32 directory. Identify the filename used to masquerade the PowerShell binary on this Windows device.
gpt-4.1 Wins
T1049
A user launched a Windows command prompt and executed a built-in utility to enumerate all active network connections. Using process creation logs, identify the exact tool that produced the list of current connections.
gpt-4.1 Wins
T1039
On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action?
gpt-4.1 Wins
T1053.003
Linux hosts may log events when new files are added to /var/spool/cron/crontabs. Query those logs for a creation or write action in that directory and determine the file name that was added.
gpt-4.1 Wins
T1053.006
Examine the logs from the Linux system for events related to the systemd timer activation. Identify any records indicating that a new timer unit was started and enabled, and determine which timer name was used.
gpt-4.1 Wins
T1053.005
On Windows, review recent registry changes to detect when the MSC file association was hijacked by a reg add operation. What executable file was configured as the default command under HKCU\Software\Classes\mscfile\shell\open\command?
gpt-4.1 Wins
T1059.004
An attacker on a Linux host may try to enumerate installed shells by reading the system file that lists valid shells. Using process or syslog data, determine which command was executed to perform this enumeration.
gpt-4.1 Wins
T1057
A Windows endpoint recorded a command-line activity through cmd.exe that lists all running processes. Determine which built-in tool was executed to perform this action.
gpt-4.1 Wins
T1069.001
On a Linux endpoint, process events reveal a chain of group‐enumeration utilities executed by a single session. Which utility was used to query the system’s group database?
gpt-4.1 Wins
T1070.003
On a Linux system, you suspect someone erased their command history by linking the history file to /dev/null. Investigate process events and determine which utility was executed to achieve this.
gpt-4.1 Wins
T1070.003
On a Windows endpoint, review process execution logs to see if any PowerShell sessions were wiped clean. Which command was executed to clear the PowerShell history?
gpt-4.1 Wins
T1070.006
On a Windows host, suspicious PowerShell activity adjusted the system clock and recorded a value. What numeric value was used to slip the system date?
gpt-4.1 Wins
T1078.003
On a Linux host, review account management activity in Syslog or process event logs to pinpoint which command was executed to create a new local user. What was the name of the tool invoked?
gpt-4.1 Wins
T1078.003
Review the Linux process creation records to find which user account management utility was used to reactivate the previously locked and expired account.
gpt-4.1 Wins
T1070.006
On a Linux system, attackers may use timestamp manipulation to hide malicious changes. Investigate relevant logs to identify which file’s modification timestamp was altered by such a command.
gpt-4.1 Wins
T1112
Evidence shows that the Windows Defender startup entry was tampered with via an elevated command prompt. Investigate registry events related to the Run key to discover which executable replaced the default SecurityHealth value. What is the name of the new program?
gpt-4.1 Wins
T1090.003
On a Linux endpoint, a command was executed to start a proxy service commonly used for onion routing. Identify the name of the service that was launched to enable this proxy functionality.
gpt-4.1 Wins
T1112
A Windows user’s registry was altered via a command-line tool to disable the lock workstation feature by adding a DWORD entry under the current user Policies\System key. Which registry value name was modified in this operation?
gpt-4.1 Wins
T1124
In Windows process event logs, you notice both the net time and w32tm commands being executed to display the system time and timezone. Which executor name from the test configuration was responsible for launching these utilities?
gpt-4.1 Wins
T1082
A Windows system shows a cmd.exe process spawn that appears to have been used for environment discovery. Review the process creation records to identify the exact command the adversary ran to enumerate environment variables.
gpt-4.1 Wins
T1112
On a Windows endpoint, review the registry write events to spot when the WDigest key is altered to permit plaintext credential storage. What registry value name was changed?
gpt-4.1 Wins
Page 1 of 8

Explore individual model performance and detailed analysis