gpt-4-turbo-2024-04-09 vs gpt-5-high KQL Benchmark
gpt-5-high wins by 23.9 percentage points (63.3% vs. 39.4% accuracy)
Compared on 188 shared test questions
Overall Accuracy
gpt-4-turbo-2024-04-09
39.4%
74 / 188 correct
gpt-5-high
63.3%
119 / 188 correct
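The accuracy figures above follow directly from the correct-answer counts. A minimal sketch of that arithmetic, using only the counts reported here (the variable and function names are illustrative, not part of the benchmark tooling); it also separates the absolute gap in percentage points from the relative improvement, which are easy to conflate:

```python
# Counts copied from the summary above; names are illustrative assumptions.
TOTAL_QUESTIONS = 188
correct = {"gpt-4-turbo-2024-04-09": 74, "gpt-5-high": 119}


def accuracy_pct(n_correct: int, total: int) -> float:
    """Return accuracy as a percentage of the shared question set."""
    return 100.0 * n_correct / total


acc = {model: accuracy_pct(n, TOTAL_QUESTIONS) for model, n in correct.items()}

# Absolute gap, in percentage points.
gap_points = acc["gpt-5-high"] - acc["gpt-4-turbo-2024-04-09"]

# Relative improvement in the number of correct answers.
relative_gain = 100.0 * (correct["gpt-5-high"] / correct["gpt-4-turbo-2024-04-09"] - 1)

for model, value in acc.items():
    print(f"{model}: {value:.1f}%")
print(f"gap: {gap_points:.1f} percentage points")
print(f"relative improvement: {relative_gain:.1f}%")
```

On these counts the gap is 23.9 percentage points, while the relative improvement in correct answers is roughly 60.8%.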
Average Cost per Query
gpt-4-turbo-2024-04-09: $0.1737
gpt-5-high: $0.1529
gpt-4-turbo-2024-04-09 costs 13.6% more
Average Execution Time
gpt-4-turbo-2024-04-09: 16.84s
gpt-5-high: 192.47s
gpt-5-high takes 1043.3% longer
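The relative cost and execution-time differences can be checked from the rounded averages shown above. This is a sketch under that assumption; the helper name `pct_more` is hypothetical. Recomputing from the rounded values gives about 1042.9% for execution time, slightly below the reported 1043.3%, presumably because the page worked from unrounded averages.

```python
def pct_more(larger: float, smaller: float) -> float:
    """Percentage by which `larger` exceeds `smaller`."""
    return 100.0 * (larger - smaller) / smaller


# Rounded averages as displayed on the page.
cost_gpt4_turbo, cost_gpt5_high = 0.1737, 0.1529  # USD per query
time_gpt4_turbo, time_gpt5_high = 16.84, 192.47   # seconds per query

cost_diff = pct_more(cost_gpt4_turbo, cost_gpt5_high)
time_diff = pct_more(time_gpt5_high, time_gpt4_turbo)
time_ratio = time_gpt5_high / time_gpt4_turbo

print(f"gpt-4-turbo-2024-04-09 costs {cost_diff:.1f}% more")
print(f"gpt-5-high takes {time_diff:.1f}% longer ({time_ratio:.1f}x)")
```

Read as a ratio, gpt-5-high takes roughly 11.4 times as long per query.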
Question-by-Question Comparison
Detailed comparison showing where each model succeeded or failed
Showing 1 to 25 of 188 questions (page 1 of 8)
Technique | Question | gpt-4-turbo-2024-04-09 | gpt-5-high | Result |
T1036.004 | A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1059.007 | On a Windows endpoint, wscript.exe was used to run a JScript. Identify the exact script path passed to wscript. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1059.004 | On a Linux system, review process execution records for any shell process that set an environment variable containing executable code and then piped it into another shell instance. Determine which environment variable name was used to store the script content. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1070.003 | On a Windows endpoint, commands are no longer being logged to PowerShell history, suggesting PSReadLine settings were altered. Using process execution logs, determine the exact command that was run to set the history save style to 'SaveNothing'. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1070.004 | A Linux host executed a native utility to overwrite and then remove a temporary file in one step. Identify the name of the file that was securely deleted by this action. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1201 | Windows systems may be probed for their password policy settings using a native command-line tool. Determine which command was executed to list the local password policy on the target hosts. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1201 | You are reviewing Linux syslog records on a CentOS/RHEL 7.x server. You notice entries for shell commands that access system configuration files under /etc/security. Determine exactly which configuration file was being inspected by the command. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1218.010 | An attacker has attempted to sideload code by invoking regsvr32.exe in a Windows host against a file that does not use the standard .dll extension. Investigate the process event logs to determine the name of the file that was registered. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1505.005 | A suspicious registry change was made on a Windows system modifying the Terminal Services DLL path. Investigate registry events to find out which DLL file name was set as the ServiceDll value under TermService. What was the file name? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1217 | On Linux, review the process execution logs to uncover when Chromium’s bookmark JSON files were being located and the results persisted. Focus on shell commands that search under .config/chromium and write output to a file. What was the filename used to save the findings? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1531 | Within Windows process event logs, identify instances where the built-in net.exe utility is used to change a user account password. What was the new password argument passed in? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1552.003 | A Linux user’s bash history was searched for patterns like ‘pass’ and ‘ssh’, and the matching lines were redirected into a new file. Determine the name of that file. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1562.004 | On a Windows device, a new inbound firewall rule was created unexpectedly. Review process execution records to identify the command-line utility responsible for adding the rule. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1562.012 | On a Linux host, auditing has been turned off. Review process execution or syslog data to determine which command was executed to disable the audit subsystem. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1007 | An analyst suspects a user or script ran a service enumeration command on a Linux system. Review process events to find the service-listing invocation and specify the full command that was executed. | ✗ | ✓ | gpt-5-high Wins |
T1006 | Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path. | ✗ | ✓ | gpt-5-high Wins |
T1003.001 | Using Windows process event logs, investigate PowerShell activity around lsass.exe memory capture. What was the name of the script file invoked to perform the dump? | ✗ | ✓ | gpt-5-high Wins |
T1003.008 | In a Linux environment, an elevated process was used to execute a command that read /etc/shadow and redirected its output to a file. Identify what file name was employed to store these results. | ✗ | ✓ | gpt-5-high Wins |
T1016.001 | An analyst notices a PowerShell process on a Windows host that appears to be checking SMB connectivity. Which PowerShell cmdlet was executed to perform this outbound port 445 test? | ✗ | ✓ | gpt-5-high Wins |
T1027 | A Windows host shows a process launch with an extremely obfuscated command line that dynamically builds and invokes code at runtime. Which process name was used to execute this payload? | ✗ | ✓ | gpt-5-high Wins |
T1027 | On a Linux system, identify the script that was generated by decoding a base64 data file and then executed. What was the filename of that script? | ✗ | ✓ | gpt-5-high Wins |
T1036.003 | A process is running under a familiar Windows host name but originates from a user's AppData folder rather than the System32 directory. Identify the filename used to masquerade the PowerShell binary on this Windows device. | ✗ | ✓ | gpt-5-high Wins |
T1036.004 | Analyze Windows process events for any schtasks.exe commands that created a new task invoking PowerShell. What is the name of the .ps1 script specified to run? | ✗ | ✓ | gpt-5-high Wins |
T1048.003 | A Linux host briefly hosted an HTTP service under /tmp. Examine process creation logs to determine the exact python3 command that was used to start the server on port 9090. | ✗ | ✓ | gpt-5-high Wins |
T1053.003 | Linux hosts may log events when new files are added to /var/spool/cron/crontabs. Query those logs for a creation or write action in that directory and determine the file name that was added. | ✗ | ✓ | gpt-5-high Wins |
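Every row shown on this page is a disagreement: 14 questions that only gpt-4-turbo-2024-04-09 answered correctly, followed by 11 that only gpt-5-high answered correctly. A tally like that could be produced from the pipe-delimited rows roughly as follows. This is a sketch, not the benchmark's own tooling: `tally_outcomes` is a hypothetical helper, and the sample rows reuse technique IDs from the table with the question text abbreviated.

```python
from collections import Counter


def tally_outcomes(rows):
    """Count result labels in rows shaped like
    'Technique | Question | mark | mark | Result |'."""
    counts = Counter()
    for row in rows:
        # Split on pipes and drop the empty cell left by the trailing pipe.
        cells = [cell.strip() for cell in row.split("|")]
        cells = [cell for cell in cells if cell]
        if len(cells) < 5:
            continue  # not a complete table row
        # The last cell is the result label; the two before it are the marks.
        _technique, *_question, _first_mark, _second_mark, result = cells
        counts[result] += 1
    return counts


# Abbreviated sample rows (question text shortened for illustration).
sample_rows = [
    "T1036.004 | (question text) | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |",
    "T1007 | (question text) | ✗ | ✓ | gpt-5-high Wins |",
    "T1006 | (question text) | ✗ | ✓ | gpt-5-high Wins |",
]

counts = tally_outcomes(sample_rows)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

Because this page lists only 25 of the 188 questions, a per-page tally says nothing about the overall result, which the summary reports as 119 vs. 74 correct in favor of gpt-5-high.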