gpt-4-turbo-2024-04-09 vs gpt-5-mini-medium KQL Benchmark

gpt-5-mini-medium wins by 5.9 percentage points

Compared on 187 shared test questions

Overall Accuracy

gpt-4-turbo-2024-04-09

39.6%

74 / 187 correct

gpt-5-mini-medium

45.5%

85 / 187 correct
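The accuracy figures follow from the correct-answer counts. The minimal Python sketch below assumes accuracy is simply correct answers divided by the 187 shared questions; the page does not state its formula. It reproduces both percentages and shows that the 5.9 margin in the headline is an absolute gap in percentage points.

```python
# Shared question count and correct answers, as reported above.
TOTAL_QUESTIONS = 187
correct = {
    "gpt-4-turbo-2024-04-09": 74,
    "gpt-5-mini-medium": 85,
}

def accuracy_pct(n_correct: int, total: int) -> float:
    """Accuracy as a percentage (assumed definition: correct / total)."""
    return 100.0 * n_correct / total

acc = {model: accuracy_pct(n, TOTAL_QUESTIONS) for model, n in correct.items()}
for model, value in acc.items():
    print(f"{model}: {value:.1f}%")

# The headline margin is a difference in percentage points, not a relative change.
gap = acc["gpt-5-mini-medium"] - acc["gpt-4-turbo-2024-04-09"]
print(f"gap: {gap:.1f} percentage points")
```

Measured relative to gpt-4-turbo-2024-04-09, the 11 extra correct answers are about 14.9% more (11 / 74).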

Average Cost per Query

gpt-4-turbo-2024-04-09: $0.1737
gpt-5-mini-medium: $0.0150
gpt-4-turbo-2024-04-09 costs 1056.1% more per query (about 11.6 times as much)

Average Execution Time

gpt-4-turbo-2024-04-09: 16.84s
gpt-5-mini-medium: 47.16s
gpt-5-mini-medium takes 180.1% longer per query (about 2.8 times as long)
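The cost and time comparison lines appear to be relative differences: (larger - smaller) / smaller × 100. The short Python sketch below assumes that formula, since the page does not show it, and recomputes both figures from the rounded averages displayed above. The results come out close to the published 1056.1% and 180.1% but not identical. That discrepancy suggests the page worked from unrounded averages.

```python
def pct_more(larger: float, smaller: float) -> float:
    """Relative difference: how much larger `larger` is than `smaller`, in percent."""
    return 100.0 * (larger - smaller) / smaller

# Rounded averages as displayed on the page.
cost_turbo, cost_mini = 0.1737, 0.0150   # USD per query
time_turbo, time_mini = 16.84, 47.16     # seconds per query

cost_diff = pct_more(cost_turbo, cost_mini)   # gpt-4-turbo vs gpt-5-mini cost
time_diff = pct_more(time_mini, time_turbo)   # gpt-5-mini vs gpt-4-turbo time

print(f"gpt-4-turbo costs {cost_diff:.1f}% more ({cost_turbo / cost_mini:.1f}x)")
print(f"gpt-5-mini takes {time_diff:.1f}% longer ({time_mini / time_turbo:.1f}x)")
```

The accuracy gain therefore comes with a trade-off. gpt-5-mini-medium is roughly an order of magnitude cheaper per query but close to three times slower.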

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed. Each entry gives the MITRE ATT&CK technique ID, the test question, and the model that won it.
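Each question has two per-model outcomes, so it falls into one of four buckets. The minimal Python sketch below shows that bookkeeping. It assumes a "Wins" label means the named model answered correctly and the other did not, because the page does not define the label. The `Result` type, its fields, and the "Both correct" / "Both wrong" labels are hypothetical. Questions that both models answered correctly count toward both totals and cancel out. As a result, gpt-5-mini-medium's solo wins must exceed gpt-4-turbo-2024-04-09's solo wins by exactly 85 - 74 = 11.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Result:
    """Hypothetical per-question record: did each model answer correctly?"""
    question_id: str
    turbo_correct: bool
    mini_correct: bool

def outcome(r: Result) -> str:
    """Map a pair of per-model outcomes to a comparison label."""
    if r.turbo_correct and r.mini_correct:
        return "Both correct"
    if r.turbo_correct:
        return "gpt-4-turbo-2024-04-09 Wins"
    if r.mini_correct:
        return "gpt-5-mini-medium Wins"
    return "Both wrong"

def net_wins(results: list[Result]) -> int:
    """Solo wins for gpt-5-mini minus solo wins for gpt-4-turbo.

    Equals (mini correct total) - (turbo correct total), because questions
    both models got right add to both totals and cancel out.
    """
    counts = Counter(outcome(r) for r in results)
    return counts["gpt-5-mini-medium Wins"] - counts["gpt-4-turbo-2024-04-09 Wins"]

# Toy data for illustration only; these are NOT the benchmark's results.
sample = [
    Result("q1", True, True),
    Result("q2", True, False),
    Result("q3", False, True),
    Result("q4", False, True),
    Result("q5", False, False),
]
print(Counter(outcome(r) for r in sample))
print(net_wins(sample))  # mini correct 3 - turbo correct 2 = 1
```

Under the same label assumption, the 25 gpt-4-turbo-2024-04-09 wins listed on page 1 imply at least 36 gpt-5-mini-medium wins elsewhere in the full list.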

Showing 1 to 25 of 187 questions
Page 1 of 8
T1018
On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache?
gpt-4-turbo-2024-04-09 Wins
T1036.003
In a Linux environment, you observe a process labeled like the cron daemon but running from an unexpected path. Investigate creation events to uncover the actual filename used by this fake cron process.
gpt-4-turbo-2024-04-09 Wins
T1039
On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action?
gpt-4-turbo-2024-04-09 Wins
T1049
A user launched a Windows command prompt and executed a built-in utility to enumerate all active network connections. Using process creation logs, identify the exact tool that produced the list of current connections.
gpt-4-turbo-2024-04-09 Wins
T1057
A Windows endpoint recorded a command-line activity through cmd.exe that lists all running processes. Determine which built-in tool was executed to perform this action.
gpt-4-turbo-2024-04-09 Wins
T1059.004
On a Linux system, review process execution records for any shell process that set an environment variable containing executable code and then piped it into another shell instance. Determine which environment variable name was used to store the script content.
gpt-4-turbo-2024-04-09 Wins
T1070.003
On a Windows endpoint, review process execution logs to see if any PowerShell sessions were wiped clean. Which command was executed to clear the PowerShell history?
gpt-4-turbo-2024-04-09 Wins
T1070.003
On a Linux system, you suspect someone erased their command history by linking the history file to /dev/null. Investigate process events and determine which utility was executed to achieve this.
gpt-4-turbo-2024-04-09 Wins
T1078.003
Review the Linux process creation records to find which user account management utility was used to reactivate the previously locked and expired account.
gpt-4-turbo-2024-04-09 Wins
T1082
A Windows system shows a cmd.exe process spawn that appears to have been used for environment discovery. Review the process creation records to identify the exact command the adversary ran to enumerate environment variables.
gpt-4-turbo-2024-04-09 Wins
T1112
On Windows systems, disabling RDP via the registry generates registry write events. Investigate registry event logs for modifications under the Terminal Server configuration path. What is the name of the registry value that was changed to disable Remote Desktop Protocol?
gpt-4-turbo-2024-04-09 Wins
T1112
On a Windows endpoint, review the registry write events to spot when the WDigest key is altered to permit plaintext credential storage. What registry value name was changed?
gpt-4-turbo-2024-04-09 Wins
T1112
Investigate Windows registry events to identify any newly set ProxyServer entry under the user Internet Settings hive. What proxy server address was configured?
gpt-4-turbo-2024-04-09 Wins
T1201
You are reviewing Linux syslog records on a CentOS/RHEL 7.x server. You notice entries for shell commands that access system configuration files under /etc/security. Determine exactly which configuration file was being inspected by the command.
gpt-4-turbo-2024-04-09 Wins
T1222.002
On a Linux host, process execution logs show a chmod invocation with a recursive flag. Which file or folder was targeted by this recursive permission change?
gpt-4-turbo-2024-04-09 Wins
T1218.010
An attacker has attempted to sideload code by invoking regsvr32.exe on a Windows host against a file that does not use the standard .dll extension. Investigate the process event logs to determine the name of the file that was registered.
gpt-4-turbo-2024-04-09 Wins
T1542.001
Investigate Windows file creation logs to uncover any new executable added directly to the System32 directory, which may indicate a UEFI persistence implant. What was the name of the file created?
gpt-4-turbo-2024-04-09 Wins
T1531
Within Windows process event logs, identify instances where the built-in net.exe utility is used to change a user account password. What was the new password argument passed in?
gpt-4-turbo-2024-04-09 Wins
T1552.003
A Linux user’s bash history was searched for patterns like ‘pass’ and ‘ssh’, and the matching lines were redirected into a new file. Determine the name of that file.
gpt-4-turbo-2024-04-09 Wins
T1560
Windows system logs show PowerShell zipping up the contents of a user’s profile folder. Investigate process and file events to determine the exact name of the ZIP archive created.
gpt-4-turbo-2024-04-09 Wins
T1562.003
On a Linux system, you suspect someone altered Bash’s history settings to hide their activity. Investigate process logs for evidence of HISTCONTROL being set to ignore entries. What was the full command executed to configure HISTCONTROL?
gpt-4-turbo-2024-04-09 Wins
T1562.003
During a Linux engagement, an operator exported HISTSIZE=0 to turn off shell history and then ran another utility. Using the DeviceProcessEvents table, identify the exact command executed immediately after the history was disabled. What was the command?
gpt-4-turbo-2024-04-09 Wins
T1562.003
Within Linux process execution records, locate any bash commands where the HISTFILESIZE environment variable was exported. What value was assigned to HISTFILESIZE?
gpt-4-turbo-2024-04-09 Wins
T1562.004
On a Windows device, a new inbound firewall rule was created unexpectedly. Review process execution records to identify the command-line utility responsible for adding the rule.
gpt-4-turbo-2024-04-09 Wins
T1562.012
A Linux system’s audit framework appears to have been reset unexpectedly. Search your process execution records to identify which exact invocation removed all auditd rules. What full command was executed?
gpt-4-turbo-2024-04-09 Wins
