gpt-5-mini-medium vs o1-low KQL Benchmark
o1-low wins by 17.6%
Compared on 187 shared test questions
Overall Accuracy
gpt-5-mini-medium
45.5%
85 / 187 correct
o1-low
63.1%
118 / 187 correct
Average Cost per Query
gpt-5-mini-medium: $0.0150
o1-low: $0.4994
o1-low costs 3223.5% more
Average Execution Time
gpt-5-mini-medium: 47.16s
o1-low: 50.90s
o1-low takes 7.9% longer
Question-by-Question Analysis
Question-by-Question Comparison
Detailed comparison showing where each model succeeded or failed
Showing 1 to 25 of 187 questions
Page 1 of 8
T1057 | On a Windows host, investigate process events to find when Task Manager was launched via cmd with an unusual flag. What was the full command executed? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1053.005 | Investigate Windows process events for PowerShell activity that leverages WMI to register a scheduled task via XML import. What was the name of the XML file supplied to the RegisterByXml method? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1069.001 | Investigate Windows process execution logs for a PowerShell cmdlet used to list group members. Look for entries where a group name is provided after a '-Name' flag and identify which group was queried. | ✓ | ✗ | gpt-5-mini-medium Wins |
T1070.003 | On a Linux endpoint, you suspect malicious clearing of the bash history by redirecting from the null device. Explore process or file events to uncover the exact shell command that performed this action. | ✓ | ✗ | gpt-5-mini-medium Wins |
T1082 | On Windows systems, identify when the built-in Shadow Copy utility is used to enumerate existing snapshots. What was the full command executed? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1057 | While reviewing Windows process events, you spot a PowerShell process executing a WMI enumeration cmdlet. What WMI class name did the attacker query? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1120 | Review Windows process and PowerShell activity for commands that enumerate PnP entities through WMI. Which PowerShell cmdlet was invoked to perform this hardware inventory? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1547.014 | A Windows endpoint shows an Active Setup entry under Internet Explorer Core Fonts being altered with a StubPath value. Investigate the registry events and identify the payload that was set. | ✓ | ✗ | gpt-5-mini-medium Wins |
T1547 | A Windows host shows a process launching with install-driver switches, likely signaling malicious driver deployment. What is the name of the tool that was executed? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1217 | On a Windows system, you notice a process that recursively enumerates files named 'Bookmarks' under every user profile directory. Which Windows command-line utility was used to perform that search? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1546.004 | A suspicious file modification on a Linux device targeted the ~/.bash_profile file, apparently adding a new line. What was the full command string that was appended? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1547.002 | A Windows host shows a suspicious registry change under the LSA hive. Review recent registry events to locate any new entries under Authentication Packages and determine the name of the DLL the attacker added. | ✓ | ✗ | gpt-5-mini-medium Wins |
T1548.001 | A Linux system shows a shell invocation that appears to be searching for files with elevated group permissions. Using the available process execution logs, determine exactly what command was run. | ✓ | ✗ | gpt-5-mini-medium Wins |
T1562.006 | A .NET tracing environment variable was turned off in a user’s registry on a Windows system. Which built-in command-line tool was used to make this registry change? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1562 | Review Linux process execution logs to find where the system journal service was stopped. Which utility was invoked to disable journal logging? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1557.001 | On Windows devices, hunt for PowerShell activity where a remote script is fetched and executed to perform LLMNR/NBNS spoofing. Which cmdlet kicked off the listener? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1622 | On the Windows device, a security check was run to detect debugger processes via PowerShell. Which tool (process) carried out this check? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1006 | Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path. | ✗ | ✓ | o1-low Wins |
T1049 | A user launched a Windows command prompt and executed a built-in utility to enumerate all active network connections. Using process creation logs, identify the exact tool that produced the list of current connections. | ✗ | ✓ | o1-low Wins |
T1036.003 | A process is running under a familiar Windows host name but originates from a user's AppData folder rather than the System32 directory. Identify the filename used to masquerade the PowerShell binary on this Windows device. | ✗ | ✓ | o1-low Wins |
T1007 | An analyst suspects a user or script ran a service enumeration command on a Linux system. Review process events to find the service-listing invocation and specify the full command that was executed. | ✗ | ✓ | o1-low Wins |
T1053.003 | Linux hosts may log events when new files are added to /var/spool/cron/crontabs. Query those logs for a creation or write action in that directory and determine the file name that was added. | ✗ | ✓ | o1-low Wins |
T1018 | On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache? | ✗ | ✓ | o1-low Wins |
T1053.006 | Examine the logs from the Linux system for events related to the systemd timer activation. Identify any records indicating that a new timer unit was started and enabled, and determine which timer name was used. | ✗ | ✓ | o1-low Wins |
T1057 | A Windows endpoint recorded a command-line activity through cmd.exe that lists all running processes. Determine which built-in tool was executed to perform this action. | ✗ | ✓ | o1-low Wins |
Page 1 of 8
Explore individual model performance and detailed analysis