gpt-4-turbo-2024-04-09 vs o3-mini-low KQL Benchmark
o3-mini-low wins by 12.2 percentage points (51.6% vs. 39.4% accuracy)
Compared on 188 shared test questions
Overall Accuracy
gpt-4-turbo-2024-04-09: 39.4% (74 / 188 correct)
o3-mini-low: 51.6% (97 / 188 correct)
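The headline margin of 12.2 is an absolute difference in percentage points, not a relative improvement. The minimal Python sketch below assumes only the correct-answer counts reported above (the dashboard's own calculation is not shown) and recomputes both the absolute and the relative gap. `ModelResult` is a hypothetical helper introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    """Hypothetical container for one model's score on the shared question set."""
    name: str
    correct: int
    total: int

    @property
    def accuracy_pct(self) -> float:
        # Accuracy as a percentage of the shared questions.
        return 100.0 * self.correct / self.total


gpt4t = ModelResult("gpt-4-turbo-2024-04-09", correct=74, total=188)
o3 = ModelResult("o3-mini-low", correct=97, total=188)

# Absolute gap in percentage points (the headline margin).
gap_pp = o3.accuracy_pct - gpt4t.accuracy_pct
# Relative gap: o3-mini-low's accuracy measured against gpt-4-turbo's.
relative_pct = 100.0 * (o3.accuracy_pct / gpt4t.accuracy_pct - 1)

print(f"{gpt4t.name}: {gpt4t.accuracy_pct:.1f}%")
print(f"{o3.name}: {o3.accuracy_pct:.1f}%")
print(f"gap: {gap_pp:.1f} percentage points, {relative_pct:.1f}% relative")
```

With the reported counts this prints a 12.2-point gap, which corresponds to o3-mini-low answering about 31.1% more questions correctly than gpt-4-turbo-2024-04-09 (97 vs. 74).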
Average Cost per Query
gpt-4-turbo-2024-04-09: $0.1737
o3-mini-low: $0.0279
gpt-4-turbo-2024-04-09 costs 523.2% more per query (roughly 6.2× as much)
Average Execution Time
gpt-4-turbo-2024-04-09: 16.84s
o3-mini-low: 36.54s
o3-mini-low takes 117.1% longer per query (roughly 2.2× as long)
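Both "% more" figures compare the larger average with the smaller one, as a percentage of the smaller: (larger / smaller − 1) × 100. The Python sketch below applies that formula to the averages displayed above. `percent_more` is a hypothetical helper, and the inputs are the dashboard's rounded values, not its underlying data.

```python
def percent_more(larger: float, smaller: float) -> float:
    """How much larger `larger` is than `smaller`, as a percentage of `smaller`."""
    return 100.0 * (larger / smaller - 1)


# Per-query averages as displayed on the dashboard (already rounded).
avg_cost_usd = {"gpt-4-turbo-2024-04-09": 0.1737, "o3-mini-low": 0.0279}
avg_time_s = {"gpt-4-turbo-2024-04-09": 16.84, "o3-mini-low": 36.54}

# Cost: gpt-4-turbo is the more expensive model.
cost_gap = percent_more(avg_cost_usd["gpt-4-turbo-2024-04-09"], avg_cost_usd["o3-mini-low"])
# Time: o3-mini-low is the slower model.
time_gap = percent_more(avg_time_s["o3-mini-low"], avg_time_s["gpt-4-turbo-2024-04-09"])

print(f"gpt-4-turbo-2024-04-09 costs {cost_gap:.1f}% more per query")
print(f"o3-mini-low takes {time_gap:.1f}% longer per query")
```

Recomputed from the rounded averages shown, the gaps come out at 522.6% and 117.0%. The small mismatch with the dashboard's 523.2% and 117.1% presumably means the dashboard divides unrounded averages. That is an assumption, because its formula is not shown.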
Question-by-Question Comparison
Detailed comparison showing where each model succeeded or failed
Showing questions 1 to 25 of 188 (page 1 of 8)
ATT&CK Technique | Question | gpt-4-turbo-2024-04-09 | o3-mini-low | Outcome |
T1003 | On a Windows host, an attacker leveraged COMSVCS.DLL via rundll32.exe to dump the memory of svchost.exe into a file under the Temp directory. Review the file creation logs to determine the exact name of the dump file that was generated. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1036.004 | A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1039 | On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1057 | On a Windows host, investigate process events to find when Task Manager was launched via cmd with an unusual flag. What was the full command executed? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1059.004 | On a Linux system, review process execution records for any shell process that set an environment variable containing executable code and then piped it into another shell instance. Determine which environment variable name was used to store the script content. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1070.003 | On a Linux endpoint, you suspect malicious clearing of the bash history by redirecting from the null device. Explore process or file events to uncover the exact shell command that performed this action. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1070.005 | On a Windows system, an attacker used the command prompt to remove one or more default administrative shares. Which share names were deleted? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1070.004 | Suspiciously, the recycle bin appears empty system-wide. Determine which command was executed on Windows to clear the system's recycle bin directory, including any switches and environment variables. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1217 | An attacker is suspected of using the Windows shell to enumerate a user’s Internet Explorer bookmarks via the Favorites folder. Identify the exact command they executed to perform this listing. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1201 | You are reviewing Linux syslog records on a CentOS/RHEL 7.x server. You notice entries for shell commands that access system configuration files under /etc/security. Determine exactly which configuration file was being inspected by the command. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1542.001 | Investigate Windows file creation logs to uncover any new executable added directly to the System32 directory, which may indicate a UEFI persistence implant. What was the name of the file created? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1560 | Windows system logs show PowerShell zipping up the contents of a user’s profile folder. Investigate process and file events to determine the exact name of the ZIP archive created. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1564.002 | On Windows systems, identify any user account that was hidden by setting its value to 0 under the SpecialAccounts\UserList registry key. What was the name of the hidden account? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1614.001 | Using Linux process or syslog logs, identify the executable that was run to output the system's locale information. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1571 | On a Windows system, identify any PowerShell Test-NetConnection executions against an uncommon port. Which port number was checked? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1003.008 | In a Linux environment, an elevated process was used to execute a command that read /etc/shadow and redirected its output to a file. Identify what file name was employed to store these results. | ✗ | ✓ | o3-mini-low Wins |
T1027 | A Windows host shows a process launch with an extremely obfuscated command line that dynamically builds and invokes code at runtime. Which process name was used to execute this payload? | ✗ | ✓ | o3-mini-low Wins |
T1027 | On a Linux system, identify the script that was generated by decoding a base64 data file and then executed. What was the filename of that script? | ✗ | ✓ | o3-mini-low Wins |
T1036.004 | Analyze Windows process events for any schtasks.exe commands that created a new task invoking PowerShell. What is the name of the .ps1 script specified to run? | ✗ | ✓ | o3-mini-low Wins |
T1048.003 | A Linux host briefly hosted an HTTP service under /tmp. Examine process creation logs to determine the exact python3 command that was used to start the server on port 9090. | ✗ | ✓ | o3-mini-low Wins |
T1053.003 | Linux hosts may log events when new files are added to /var/spool/cron/crontabs. Query those logs for a creation or write action in that directory and determine the file name that was added. | ✗ | ✓ | o3-mini-low Wins |
T1057 | A malicious actor may attempt to list running processes on a Windows machine using a WMI-based command. Review the process creation events to find out which utility was invoked to perform this enumeration. | ✗ | ✓ | o3-mini-low Wins |
T1059.004 | An attacker on a Linux host may try to enumerate installed shells by reading the system file that lists valid shells. Using process or syslog data, determine which command was executed to perform this enumeration. | ✗ | ✓ | o3-mini-low Wins |
T1069.001 | Investigate Windows process execution logs for a PowerShell cmdlet used to list group members. Look for entries where a group name is provided after a '-Name' flag and identify which group was queried. | ✗ | ✓ | o3-mini-low Wins |
T1069.001 | On a Linux endpoint, process events reveal a chain of group‐enumeration utilities executed by a single session. Which utility was used to query the system’s group database? | ✗ | ✓ | o3-mini-low Wins |
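Each row pairs the two models on the same question, so questions that both models answered correctly, or both missed, add equally to each model's total and cancel out. The Python sketch below checks this on the 25 rows of page 1. `outcome` and `net_gain` are hypothetical helpers, and the "Tie" label is an assumption, since no tied rows appear on this page.

```python
from collections import Counter

GPT = "gpt-4-turbo-2024-04-09"
O3 = "o3-mini-low"


def outcome(gpt_correct: bool, o3_correct: bool) -> str:
    """Label one shared question like the table's Outcome column.

    "Tie" is a hypothetical label for rows where both models agree.
    """
    if gpt_correct and not o3_correct:
        return f"{GPT} Wins"
    if o3_correct and not gpt_correct:
        return f"{O3} Wins"
    return "Tie"  # both correct or both wrong


def net_gain(rows: list[tuple[bool, bool]]) -> tuple[int, int]:
    """Return (o3 correct minus gpt correct, o3 wins minus gpt wins)."""
    correct_diff = sum(o3 for _, o3 in rows) - sum(g for g, _ in rows)
    tally = Counter(outcome(g, o3) for g, o3 in rows)
    wins_diff = tally[f"{O3} Wins"] - tally[f"{GPT} Wins"]
    return correct_diff, wins_diff


# Page 1 of the table: 15 rows marked ✓ | ✗ and 10 rows marked ✗ | ✓.
page1 = [(True, False)] * 15 + [(False, True)] * 10
print(net_gain(page1))  # → (-5, -5)
```

The same identity holds over all 188 shared questions: o3-mini-low must have 97 - 74 = 23 more exclusive wins than gpt-4-turbo-2024-04-09. Because page 1 favors gpt-4-turbo-2024-04-09 by 5, the remaining 163 questions must favor o3-mini-low by 28 exclusive wins.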