gemini-2.5-flash-preview-04-17 vs gpt-5-high KQL Benchmark

gpt-5-high wins by 12.2 percentage points

Compared on 188 shared test questions

Overall Accuracy

gemini-2.5-flash-preview-04-17

51.1%

96 / 188 correct

gpt-5-high

63.3%

119 / 188 correct

Average Cost per Query

gemini-2.5-flash-preview-04-17: $0.0203
gpt-5-high: $0.1529
gpt-5-high costs 651.9% more

Average Execution Time

gemini-2.5-flash-preview-04-17: 22.37s
gpt-5-high: 192.47s
gpt-5-high takes 760.6% longer
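
The summary figures above can be roughly reproduced from the displayed counts and averages. The following is a minimal sketch, not the benchmark's own code: the helper names `pct` and `pct_more` are hypothetical, and the inputs are the rounded values shown on this page. Because the page presumably computes its percentages from unrounded data, the cost and time increases recomputed here (about 653% and 760%) differ slightly from the reported 651.9% and 760.6%.

```python
def pct(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage (1 decimal)."""
    return round(100 * part / whole, 1)


def pct_more(new: float, base: float) -> float:
    """Relative increase of `new` over `base`, as a percentage (1 decimal)."""
    return round(100 * (new - base) / base, 1)


# Accuracy, from the correct-answer counts on the 188 shared questions.
gemini_acc = pct(96, 188)    # 51.1
gpt5_acc = pct(119, 188)     # 63.3

# The headline gap is a difference of two percentages,
# so it is measured in percentage points.
gap_points = round(gpt5_acc - gemini_acc, 1)  # 12.2

# Relative cost and time increases, from the rounded averages shown above.
cost_increase = pct_more(0.1529, 0.0203)
time_increase = pct_more(192.47, 22.37)

if __name__ == "__main__":
    print(f"accuracy: {gemini_acc}% vs {gpt5_acc}% (gap {gap_points} points)")
    print(f"cost increase: {cost_increase}%")
    print(f"time increase: {time_increase}%")
```

Read this way, gpt-5-high's 12.2-point accuracy advantage comes at roughly 7.5 times the cost per query and roughly 8.6 times the execution time.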

Question-by-Question Analysis

Detailed comparison showing where each model succeeded or failed

Questions 1 to 25 of 188 are listed below.

T1057
While reviewing Windows process events, you spot a PowerShell process executing a WMI enumeration cmdlet. What WMI class name did the attacker query?
gemini-2.5-flash-preview-04-17 Wins
T1053.005
On a Windows host, find any scheduled task that was registered using PowerShell native cmdlets instead of schtasks.exe. What was the name given to the new task?
gemini-2.5-flash-preview-04-17 Wins
T1059.007
On a Windows endpoint, wscript.exe was used to run a JScript. Identify the exact script path passed to wscript.
gemini-2.5-flash-preview-04-17 Wins
T1059.004
An attacker on a Linux host may try to enumerate installed shells by reading the system file that lists valid shells. Using process or syslog data, determine which command was executed to perform this enumeration.
gemini-2.5-flash-preview-04-17 Wins
T1070.003
On a Windows endpoint, commands are no longer being logged to PowerShell history, suggesting PSReadLine settings were altered. Using process execution logs, determine the exact command that was run to set the history save style to 'SaveNothing'.
gemini-2.5-flash-preview-04-17 Wins
T1059.004
On a Linux system, review process execution records for any shell process that set an environment variable containing executable code and then piped it into another shell instance. Determine which environment variable name was used to store the script content.
gemini-2.5-flash-preview-04-17 Wins
T1082
Using Linux process execution logs, identify the specific command that was used to filter loaded kernel modules for entries containing “vmw.” What was that full command?
gemini-2.5-flash-preview-04-17 Wins
T1112
Review registry event logs on the Windows host for PowerShell-driven writes to system policy and file system keys. Which registry value names were created during this BlackByte preparation simulation?
gemini-2.5-flash-preview-04-17 Wins
T1124
In Windows process event logs, you notice both the net time and w32tm commands being executed to display the system time and timezone. Which executor name from the test configuration was responsible for launching these utilities?
gemini-2.5-flash-preview-04-17 Wins
T1201
Windows systems may be probed for their password policy settings using a native command-line tool. Determine which command was executed to list the local password policy on the target hosts.
gemini-2.5-flash-preview-04-17 Wins
T1201
You are reviewing Linux syslog records on a CentOS/RHEL 7.x server. You notice entries for shell commands that access system configuration files under /etc/security. Determine exactly which configuration file was being inspected by the command.
gemini-2.5-flash-preview-04-17 Wins
T1217
On Linux, review the process execution logs to uncover when Chromium’s bookmark JSON files were being located and the results persisted. Focus on shell commands that search under .config/chromium and write output to a file. What was the filename used to save the findings?
gemini-2.5-flash-preview-04-17 Wins
T1218.010
An attacker has attempted to sideload code by invoking regsvr32.exe on a Windows host against a file that does not use the standard .dll extension. Investigate the process event logs to determine the name of the file that was registered.
gemini-2.5-flash-preview-04-17 Wins
T1217
An attacker leveraged a PowerShell command on a Windows host to enumerate browser bookmark files across all user profiles. Examine the process execution logs to determine the exact filename that was being searched for.
gemini-2.5-flash-preview-04-17 Wins
T1546.004
Investigate recent file modification events on Linux that could reveal an adversary appending commands to a user’s ~/.profile for persistence. Determine the exact command that was added.
gemini-2.5-flash-preview-04-17 Wins
T1547
A Windows host shows a process launching with install-driver switches, likely signaling malicious driver deployment. What is the name of the tool that was executed?
gemini-2.5-flash-preview-04-17 Wins
T1555
On a Windows host, an external PowerShell script is fetched and run to harvest local Wi-Fi credentials. Investigate the process execution logs to find out what script file name was downloaded and invoked.
gemini-2.5-flash-preview-04-17 Wins
T1553.006
A Windows host shows registry modifications in its boot configuration store enabling test signing mode. Investigate which process made this change and identify the exact command it ran to turn on test signing.
gemini-2.5-flash-preview-04-17 Wins
T1562.004
On a Windows device, a new inbound firewall rule was created unexpectedly. Review process execution records to identify the command-line utility responsible for adding the rule.
gemini-2.5-flash-preview-04-17 Wins
T1562.004
Investigate Windows registry modification events to find the name of the registry value that was changed under the WindowsFirewall policy path when someone turned the firewall off.
gemini-2.5-flash-preview-04-17 Wins
T1562.012
On a Linux host, auditing has been turned off. Review process execution or syslog data to determine which command was executed to disable the audit subsystem.
gemini-2.5-flash-preview-04-17 Wins
T1622
On the Windows device, a security check was run to detect debugger processes via PowerShell. Which tool (process) carried out this check?
gemini-2.5-flash-preview-04-17 Wins
T1003.008
In a Linux environment, an elevated process was used to execute a command that read /etc/shadow and redirected its output to a file. Identify what file name was employed to store these results.
gpt-5-high Wins
T1006
Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path.
gpt-5-high Wins
T1007
An analyst suspects a user or script ran a service enumeration command on a Linux system. Review process events to find the service-listing invocation and specify the full command that was executed.
gpt-5-high Wins