gemini-2.5-flash-preview-04-17 vs o3-high KQL Benchmark

o3-high wins by 3.7 percentage points

Compared on 188 shared test questions

Overall Accuracy

gemini-2.5-flash-preview-04-17

51.1%

96 / 188 correct

o3-high

54.8%

103 / 188 correct
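
As a quick check on the figures above, the accuracy percentages and the headline gap can be recomputed from the raw counts. This is a minimal sketch, not the benchmark's own code; the function name `accuracy_pct` and the variable names are hypothetical, and only the counts (96, 103, 188) come from the page.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Return accuracy as a percentage of questions answered correctly."""
    return 100.0 * correct / total


TOTAL_QUESTIONS = 188  # shared test questions reported on the page

gemini_acc = accuracy_pct(96, TOTAL_QUESTIONS)
o3_acc = accuracy_pct(103, TOTAL_QUESTIONS)

# The headline gap is a difference of two percentages, i.e. percentage points.
gap_points = o3_acc - gemini_acc

print(f"gemini-2.5-flash-preview-04-17: {gemini_acc:.1f}%")
print(f"o3-high: {o3_acc:.1f}%")
print(f"gap: {gap_points:.1f} percentage points")
```

The gap is the absolute difference between the two accuracies (7 more correct answers out of 188), not a relative improvement over the lower score.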

Average Cost per Query

gemini-2.5-flash-preview-04-17: $0.0203
o3-high: $0.0632
o3-high costs 210.7% more

Average Execution Time

gemini-2.5-flash-preview-04-17: 22.37s
o3-high: 78.68s
o3-high takes 251.8% longer
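
The "costs more" and "takes longer" figures read as relative increases over the gemini-2.5-flash-preview-04-17 baseline. The sketch below assumes that interpretation; `pct_more` is a hypothetical helper, and the inputs are the rounded averages shown above, so the results may differ from the published percentages in the last digit or so (the page presumably used unrounded averages).

```python
def pct_more(baseline: float, other: float) -> float:
    """Relative increase of `other` over `baseline`, in percent."""
    return 100.0 * (other - baseline) / baseline


# Rounded averages as displayed on the page.
cost_more = pct_more(baseline=0.0203, other=0.0632)   # dollars per query
time_more = pct_more(baseline=22.37, other=78.68)     # seconds per query

print(f"o3-high cost increase: {cost_more:.1f}%")
print(f"o3-high time increase: {time_more:.1f}%")
```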

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
T1016.001
An analyst notices a PowerShell process on a Windows host that appears to be checking SMB connectivity. Which PowerShell cmdlet was executed to perform this outbound port 445 test?
gemini-2.5-flash-preview-04-17 Wins
T1036.003
In a Linux environment, you observe a process labeled like the cron daemon but running from an unexpected path. Investigate creation events to uncover the actual filename used by this fake cron process.
gemini-2.5-flash-preview-04-17 Wins
T1057
While reviewing Windows process events, you spot a PowerShell process executing a WMI enumeration cmdlet. What WMI class name did the attacker query?
gemini-2.5-flash-preview-04-17 Wins
T1053.005
On a Windows host, find any scheduled task that was registered using PowerShell native cmdlets instead of schtasks.exe. What was the name given to the new task?
gemini-2.5-flash-preview-04-17 Wins
T1069.001
Review recent Windows process event logs for PowerShell activity that suggests local group enumeration through WMI. What exact command was executed?
gemini-2.5-flash-preview-04-17 Wins
T1059.004
On a Linux system, analyze the process logs for suspicious command line activity that includes a sequence of commands indicating a pipe-to-shell operation. Identify the tool that was used to execute this piped command, paying special attention to its use in downloading and running script content.
gemini-2.5-flash-preview-04-17 Wins
T1059.004
An attacker on a Linux host may try to enumerate installed shells by reading the system file that lists valid shells. Using process or syslog data, determine which command was executed to perform this enumeration.
gemini-2.5-flash-preview-04-17 Wins
T1059.004
On a Linux system, review process execution records for any shell process that set an environment variable containing executable code and then piped it into another shell instance. Determine which environment variable name was used to store the script content.
gemini-2.5-flash-preview-04-17 Wins
T1082
On Windows systems, identify when the built-in Shadow Copy utility is used to enumerate existing snapshots. What was the full command executed?
gemini-2.5-flash-preview-04-17 Wins
T1082
Review Windows process logs to find which built-in command was executed to reveal the system’s hostname.
gemini-2.5-flash-preview-04-17 Wins
T1112
Review registry event logs on the Windows host for PowerShell-driven writes to system policy and file system keys. Which registry value names were created during this BlackByte preparation simulation?
gemini-2.5-flash-preview-04-17 Wins
T1112
On a Windows device, examine registry event logs for modifications under the System policies path. Determine which registry value name was altered to disable the shutdown button at login.
gemini-2.5-flash-preview-04-17 Wins
T1112
Investigate Windows registry events to identify any newly set ProxyServer entry under the user Internet Settings hive. What proxy server address was configured?
gemini-2.5-flash-preview-04-17 Wins
T1124
In Windows process event logs, you notice both the net time and w32tm commands being executed to display the system time and timezone. Which executor name from the test configuration was responsible for launching these utilities?
gemini-2.5-flash-preview-04-17 Wins
T1134.001
A Windows host's logs show PowerShell fetching and executing a remote script to gain SeDebugPrivilege token duplication. Which Empire module was invoked?
gemini-2.5-flash-preview-04-17 Wins
T1201
You are reviewing Linux syslog records on a CentOS/RHEL 7.x server. You notice entries for shell commands that access system configuration files under /etc/security. Determine exactly which configuration file was being inspected by the command.
gemini-2.5-flash-preview-04-17 Wins
T1217
On Linux, review the process execution logs to uncover when Chromium’s bookmark JSON files were being located and the results persisted. Focus on shell commands that search under .config/chromium and write output to a file. What was the filename used to save the findings?
gemini-2.5-flash-preview-04-17 Wins
T1218.010
An attacker has attempted to sideload code by invoking regsvr32.exe on a Windows host against a file that does not use the standard .dll extension. Investigate the process event logs to determine the name of the file that was registered.
gemini-2.5-flash-preview-04-17 Wins
T1546.004
Investigate recent file modification events on Linux that could reveal an adversary appending commands to a user’s ~/.profile for persistence. Determine the exact command that was added.
gemini-2.5-flash-preview-04-17 Wins
T1547
A Windows host shows a process launching with install-driver switches, likely signaling malicious driver deployment. What is the name of the tool that was executed?
gemini-2.5-flash-preview-04-17 Wins
T1552.001
A Linux system shows a 'find' command used to search within .aws directories. Which specific AWS credential filename was the attacker attempting to locate?
gemini-2.5-flash-preview-04-17 Wins
T1555
On a Windows host, an external PowerShell script is fetched and run to harvest local Wi-Fi credentials. Investigate the process execution logs to find out what script file name was downloaded and invoked.
gemini-2.5-flash-preview-04-17 Wins
T1555
A security investigator suspects that someone attempted to dump stored web credentials on a Windows system using an in-built command-line tool. Review process creation logs to determine which executable was called to list the Web Credentials vault.
gemini-2.5-flash-preview-04-17 Wins
T1560
Windows system logs show PowerShell zipping up the contents of a user’s profile folder. Investigate process and file events to determine the exact name of the ZIP archive created.
gemini-2.5-flash-preview-04-17 Wins
T1562.003
On a Linux system you suspect someone altered Bash’s history settings to hide their activity. Investigate process logs for evidence of HISTCONTROL being set to ignore entries. What was the full command executed to configure HISTCONTROL?
gemini-2.5-flash-preview-04-17 Wins
