gemini-2.5-flash-preview-04-17 vs o1-low KQL Benchmark
o1-low wins by 12.2 percentage points (63.3% vs. 51.1% accuracy)
Compared on 188 shared test questions
Overall Accuracy
gemini-2.5-flash-preview-04-17: 51.1% (96 / 188 correct)
o1-low: 63.3% (119 / 188 correct)
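The headline gap is an absolute difference between two accuracy percentages, which makes it percentage points rather than a relative percentage. The Python sketch below recomputes it from the correct-answer counts above. It also computes the relative gap: o1-low answered 23 more questions correctly, about 24% more than gemini-2.5-flash-preview-04-17. The dictionary, variable, and function names are illustrative only.

```python
# Correct-answer counts from the summary above (188 shared questions).
TOTAL = 188
correct = {
    "gemini-2.5-flash-preview-04-17": 96,
    "o1-low": 119,
}


def accuracy_pct(n_correct: int, n_total: int) -> float:
    """Share of questions answered correctly, as a percentage."""
    return 100.0 * n_correct / n_total


acc = {model: accuracy_pct(n, TOTAL) for model, n in correct.items()}

# Absolute gap: difference of two percentages, i.e. percentage points.
gap_pp = acc["o1-low"] - acc["gemini-2.5-flash-preview-04-17"]

# Relative gap: extra correct answers for o1-low, relative to gemini's count.
rel_gain = 100.0 * (correct["o1-low"] / correct["gemini-2.5-flash-preview-04-17"] - 1)

for model, pct in acc.items():
    print(f"{model}: {pct:.1f}%")
print(f"absolute gap: {gap_pp:.1f} percentage points")
print(f"relative gain: {rel_gain:.1f}%")
```

The script prints 51.1%, 63.3%, a 12.2-point absolute gap, and a 24.0% relative gain.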
Average Cost per Query
gemini-2.5-flash-preview-04-17: $0.0203
o1-low: $0.4994
o1-low costs 2356.2% more per query (roughly 24.6x as much)
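The cost percentage is measured relative to the cheaper model's average. The Python sketch below reproduces that arithmetic from the rounded per-query averages displayed above. It yields about 2360.1% (roughly 24.6x), which is close to the page's 2356.2% but not identical. The likely reason is that the page worked from unrounded averages, though this is an assumption. The helper name `pct_more` is illustrative.

```python
def pct_more(baseline: float, other: float) -> float:
    """How much larger `other` is than `baseline`, as a percentage of `baseline`."""
    return 100.0 * (other - baseline) / baseline


# Average cost per query in USD, as displayed (already rounded to 4 decimals).
gemini_cost = 0.0203
o1_cost = 0.4994

print(f"o1-low costs {pct_more(gemini_cost, o1_cost):.1f}% more")
print(f"cost ratio: {o1_cost / gemini_cost:.1f}x")
```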
Average Execution Time
gemini-2.5-flash-preview-04-17: 22.37s
o1-low: 50.90s
o1-low takes 127.6% longer per query (roughly 2.3x as long)
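The same relative-difference formula applies to execution time. Recomputed from the rounded averages displayed above, it gives 127.5% rather than 127.6%. That small mismatch is presumably rounding in the displayed seconds (an assumption). As before, `pct_more` is an illustrative helper, not part of the benchmark.

```python
def pct_more(baseline: float, other: float) -> float:
    """How much larger `other` is than `baseline`, as a percentage of `baseline`."""
    return 100.0 * (other - baseline) / baseline


# Average execution time per query in seconds, as displayed (rounded).
gemini_s = 22.37
o1_s = 50.90

print(f"o1-low takes {pct_more(gemini_s, o1_s):.1f}% longer")
print(f"time ratio: {o1_s / gemini_s:.2f}x")
```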
Question-by-Question Analysis
Detailed comparison showing where each model succeeded or failed
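This excerpt shows questions 1 to 25 of 188 (page 1 of 8). ✓ marks a correct answer and ✗ an incorrect one.
ATT&CK technique | Question | gemini-2.5-flash-preview-04-17 | o1-low | Outcome |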
T1036.003 | In a Linux environment, you observe a process labeled like the cron daemon but running from an unexpected path. Investigate creation events to uncover the actual filename used by this fake cron process. | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1039 | On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action? | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1057 | On a Windows device, review the process execution logs to find instances where a built-in listing tool was piped into a string filter. Identify the process name that the attacker was searching for. | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1057 | On a Windows host, investigate process events to find when Task Manager was launched via cmd with an unusual flag. What was the full command executed? | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1057 | While reviewing Windows process events, you spot a PowerShell process executing a WMI enumeration cmdlet. What WMI class name did the attacker query? | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1053.005 | On a Windows host, find any scheduled task that was registered using PowerShell native cmdlets instead of schtasks.exe. What was the name given to the new task? | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1059.004 | On a Linux system, analyze the process logs for suspicious command line activity that includes a sequence of commands indicating a pipe-to-shell operation. Identify the tool that was used to execute this piped command, paying special attention to its use in downloading and running script content. | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1070.003 | On a Windows endpoint, review process execution logs to see if any PowerShell sessions were wiped clean. Which command was executed to clear the PowerShell history? | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1059.004 | On a Linux system, review process execution records for any shell process that set an environment variable containing executable code and then piped it into another shell instance. Determine which environment variable name was used to store the script content. | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1070.003 | On a Linux endpoint, you suspect malicious clearing of the bash history by redirecting from the null device. Explore process or file events to uncover the exact shell command that performed this action. | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1082 | On Windows systems, identify when the built-in Shadow Copy utility is used to enumerate existing snapshots. What was the full command executed? | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1082 | A Windows system shows a cmd.exe process spawn that appears to have been used for environment discovery. Review the process creation records to identify the exact command the adversary ran to enumerate environment variables. | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1124 | In Windows process event logs, you notice both the net time and w32tm commands being executed to display the system time and timezone. Which executor name from the test configuration was responsible for launching these utilities? | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1197 | A suspicious BITS transfer was orchestrated via bitsadmin.exe on Windows, creating a job to download and then execute a payload. Investigate the process event logs to determine what custom job name was specified when the BITS job was created. | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1547 | A Windows host shows a process launching with install-driver switches, likely signaling malicious driver deployment. What is the name of the tool that was executed? | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1555 | A security investigator suspects that someone attempted to dump stored web credentials on a Windows system using an in-built command-line tool. Review process creation logs to determine which executable was called to list the Web Credentials vault. | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1557.001 | On Windows devices, hunt for PowerShell activity where a remote script is fetched and executed to perform LLMNR/NBNS spoofing. Which cmdlet kicked off the listener? | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1562.004 | On a Windows device, a new inbound firewall rule was created unexpectedly. Review process execution records to identify the command-line utility responsible for adding the rule. | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1562.004 | Investigate Windows registry modification events to find the name of the registry value that was changed under the WindowsFirewall policy path when someone turned the firewall off. | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1562 | Review Linux process execution logs to find where the system journal service was stopped. Which utility was invoked to disable journal logging? | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1622 | On the Windows device, a security check was run to detect debugger processes via PowerShell. Which tool (process) carried out this check? | ✓ | ✗ | gemini-2.5-flash-preview-04-17 Wins |
T1006 | Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path. | ✗ | ✓ | o1-low Wins |
T1007 | An analyst suspects a user or script ran a service enumeration command on a Linux system. Review process events to find the service-listing invocation and specify the full command that was executed. | ✗ | ✓ | o1-low Wins |
T1016.001 | On a Linux host, a ping command was executed to test internet connectivity. Determine which IP address was used as the ping target. | ✗ | ✓ | o1-low Wins |
T1027 | On a Windows endpoint, look for evidence of a base64-encoded PowerShell payload execution. Which executable launched the encoded command? | ✗ | ✓ | o1-low Wins |