gpt-5-mini-medium vs grok-3-mini-beta KQL Benchmark
grok-3-mini-beta wins by 13.4 percentage points
Compared on 187 shared test questions
Overall Accuracy
gpt-5-mini-medium: 45.5% (85 / 187 correct)
grok-3-mini-beta: 58.8% (110 / 187 correct)
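The short Python sketch below recomputes both accuracies from the correct/total counts as a quick check of the headline figures. It also shows why the margin is best read as percentage points. Subtracting the two accuracies gives 13.4 points. The relative difference in correct answers (110 vs. 85) is about 29.4%. Variable names are illustrative only.

```python
# Recompute the summary accuracies from the correct / total counts above.
TOTAL = 187
correct = {"gpt-5-mini-medium": 85, "grok-3-mini-beta": 110}

accuracy = {model: n / TOTAL * 100 for model, n in correct.items()}
for model, pct in accuracy.items():
    print(f"{model}: {pct:.1f}% ({correct[model]} / {TOTAL})")

# The headline margin is a difference of two percentages (percentage points),
# which is not the same as the relative gain in correct answers.
gap_pp = accuracy["grok-3-mini-beta"] - accuracy["gpt-5-mini-medium"]
relative = (correct["grok-3-mini-beta"] / correct["gpt-5-mini-medium"] - 1) * 100
print(f"gap: {gap_pp:.1f} percentage points")
print(f"relative: {relative:.1f}% more correct answers")
```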
Average Cost per Query
gpt-5-mini-medium: $0.0150
grok-3-mini-beta: $0.0040
gpt-5-mini-medium costs 279.0% more
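The displayed averages are rounded to four decimal places, so recomputing the percentage from them does not reproduce 279.0% exactly. The sketch below computes the difference from the rounded values, which gives 275.0%. It then computes the range of differences consistent with that rounding, and the reported figure falls inside it. The exact unrounded costs are not shown on the page, so this is a plausibility check only.

```python
# Rounded per-query averages as displayed on the page (USD).
gpt_cost, grok_cost = 0.0150, 0.0040

# Percentage difference computed from the rounded values.
pct_more = (gpt_cost / grok_cost - 1) * 100
print(f"from rounded values: {pct_more:.1f}% more")

# Each displayed value may be off by up to half a unit in the 4th decimal,
# so bound the ratio using the extreme values consistent with that rounding.
half = 0.00005
low = ((gpt_cost - half) / (grok_cost + half) - 1) * 100
high = ((gpt_cost + half) / (grok_cost - half) - 1) * 100
print(f"consistent with rounding: {low:.1f}% to {high:.1f}%")
```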
Average Execution Time
gpt-5-mini-medium: 47.16s
grok-3-mini-beta: 22.38s
gpt-5-mini-medium takes 110.7% longer
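Relative time differences depend on which model is taken as the baseline. The "110.7% longer" figure uses grok-3-mini-beta's time as the baseline. The same gap measured against gpt-5-mini-medium's time is a reduction of about 52.5%, or roughly a 2.11x speedup. A minimal sketch using the displayed averages:

```python
# Average execution times (seconds) as displayed on the page.
gpt_s, grok_s = 47.16, 22.38

# "X% longer": extra time relative to the faster model's time.
longer = (gpt_s / grok_s - 1) * 100
# The same gap relative to the slower model's time: "Y% less time".
shorter = (1 - grok_s / gpt_s) * 100
speedup = gpt_s / grok_s

print(f"gpt-5-mini-medium takes {longer:.1f}% longer")
print(f"grok-3-mini-beta takes {shorter:.1f}% less time ({speedup:.2f}x faster)")
```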
Question-by-Question Comparison
Detailed comparison showing where each model succeeded or failed
Showing questions 1 to 25 of 187 (page 1 of 8)
ATT&CK ID | Question | gpt-5-mini-medium | grok-3-mini-beta | Result |
T1016.001 | On a Linux host, a ping command was executed to test internet connectivity. Determine which IP address was used as the ping target. | ✓ | ✗ | gpt-5-mini-medium Wins |
T1036.004 | A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1003.001 | Using Windows process event logs, investigate PowerShell activity around lsass.exe memory capture. What was the name of the script file invoked to perform the dump? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1018 | Review Linux process execution records for any commands that list TCP metric cache entries and filter out loopback interfaces. Which utility was used? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1016 | A Linux host’s Syslog shows a shell-based network discovery script ran multiple commands. One of them listed current TCP connections. Which utility was invoked? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1053.005 | Investigate Windows process events for PowerShell activity that leverages WMI to register a scheduled task via XML import. What was the name of the XML file supplied to the RegisterByXml method? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1036.004 | Analyze Windows process events for any schtasks.exe commands that created a new task invoking PowerShell. What is the name of the .ps1 script specified to run? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1057 | While reviewing Windows process events, you spot a PowerShell process executing a WMI enumeration cmdlet. What WMI class name did the attacker query? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1070.004 | While reviewing Windows process events, you observe a command that recursively deleted a folder under the temporary directory. Use the process event data to identify which process or tool executed this recursive delete. | ✓ | ✗ | gpt-5-mini-medium Wins |
T1059.004 | An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell. | ✓ | ✗ | gpt-5-mini-medium Wins |
T1082 | A Linux host was used to collect various system release files and kernel details, writing them into a single file under /tmp. What was the name of that output file? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1070.004 | A Linux host executed a native utility to overwrite and then remove a temporary file in one step. Identify the name of the file that was securely deleted by this action. | ✓ | ✗ | gpt-5-mini-medium Wins |
T1120 | Review Windows process and PowerShell activity for commands that enumerate PnP entities through WMI. Which PowerShell cmdlet was invoked to perform this hardware inventory? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1124 | On a Linux host, an activity was recorded where the local clock and timezone were queried. Review the available process execution logs to uncover what full command was run to fetch the system time and timezone. | ✓ | ✗ | gpt-5-mini-medium Wins |
T1547 | A Windows host shows a process launching with install-driver switches, likely signaling malicious driver deployment. What is the name of the tool that was executed? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1217 | On a Windows system, you notice a process that recursively enumerates files named 'Bookmarks' under every user profile directory. Which Windows command-line utility was used to perform that search? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1547.002 | A Windows host shows a suspicious registry change under the LSA hive. Review recent registry events to locate any new entries under Authentication Packages and determine the name of the DLL the attacker added. | ✓ | ✗ | gpt-5-mini-medium Wins |
T1548.001 | A Linux system shows a shell invocation that appears to be searching for files with elevated group permissions. Using the available process execution logs, determine exactly what command was run. | ✓ | ✗ | gpt-5-mini-medium Wins |
T1548.002 | On a Windows host, sift through registry modification events targeting HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\ConsentPromptBehaviorAdmin. What new value was written to disable the admin consent prompt? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1548.001 | Investigate Linux process or syslog records to find any invocation of the 'find' utility used to scan /usr/bin for files with the setuid bit. What was the full command executed? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1614.001 | In a Windows environment, locate any occurrences where an elevated DISM utility was run to enumerate the system’s international (locale) settings. What was the exact command line used? | ✓ | ✗ | gpt-5-mini-medium Wins |
T1006 | Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path. | ✗ | ✓ | grok-3-mini-beta Wins |
T1049 | A user launched a Windows command prompt and executed a built-in utility to enumerate all active network connections. Using process creation logs, identify the exact tool that produced the list of current connections. | ✗ | ✓ | grok-3-mini-beta Wins |
T1036.003 | A process is running under a familiar Windows host name but originates from a user's AppData folder rather than the System32 directory. Identify the filename used to masquerade the PowerShell binary on this Windows device. | ✗ | ✓ | grok-3-mini-beta Wins |
T1007 | An analyst suspects a user or script ran a service enumeration command on a Linux system. Review process events to find the service-listing invocation and specify the full command that was executed. | ✗ | ✓ | grok-3-mini-beta Wins |
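The rows above are pipe-delimited. Their columns are ATT&CK technique ID, question, each model's result (✓ correct, ✗ incorrect), and the winner. The hypothetical `parse_row` helper below is a sketch that assumes exactly this five-column layout and assumes that questions contain no `|` characters. It is demonstrated on two rows copied from the table.

Applied to all 25 rows on this page, the same tally gives 21 gpt-5-mini-medium wins and 4 grok-3-mini-beta wins. However, every row shown is a question on which exactly one model answered correctly, and the rows appear to be grouped by winner. Page-level counts therefore should not be read as overall accuracy, which favors grok-3-mini-beta (110 vs. 85 of 187).

```python
from collections import Counter


def parse_row(line: str) -> dict:
    """Split one pipe-delimited result row into named fields.

    Assumes five columns: technique | question | gpt | grok | result.
    """
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != 5:
        raise ValueError(f"expected 5 columns, got {len(cells)}: {line!r}")
    technique, question, gpt, grok, result = cells
    return {
        "technique": technique,
        "question": question,
        "gpt_correct": gpt == "✓",
        "grok_correct": grok == "✓",
        "result": result,
    }


rows = [
    "T1018 | Review Linux process execution records for any commands that list "
    "TCP metric cache entries and filter out loopback interfaces. Which utility "
    "was used? | ✓ | ✗ | gpt-5-mini-medium Wins |",
    "T1006 | Identify the PowerShell cmdlet used on Windows to format and display "
    "the raw volume bytes after an attacker read the boot sector via a DOS device "
    "path. | ✗ | ✓ | grok-3-mini-beta Wins |",
]

parsed = [parse_row(r) for r in rows]
tally = Counter(r["result"] for r in parsed)
# Rows where exactly one of the two models answered correctly.
disagreements = sum(r["gpt_correct"] != r["grok_correct"] for r in parsed)

print(dict(tally))
print(f"disagreements: {disagreements} of {len(parsed)}")
```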