gpt-5-mini-high vs grok-3-beta KQL Benchmark
grok-3-beta wins by 0.5 percentage points (one question)
Compared on 188 shared test questions
Overall Accuracy
gpt-5-mini-high: 48.4% (91 / 188 correct)
grok-3-beta: 48.9% (92 / 188 correct)
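The headline margin of 0.5 is a gap in percentage points, which is easy to confuse with a relative improvement. The short Python sketch below recomputes both figures from the reported counts only. The dictionary layout and the helper name `accuracy_pct` are illustrative, not part of the benchmark's tooling.

```python
TOTAL = 188  # shared test questions, as reported above
correct = {"gpt-5-mini-high": 91, "grok-3-beta": 92}


def accuracy_pct(n_correct: int, total: int = TOTAL) -> float:
    """Accuracy as a percentage of the shared question set (illustrative helper)."""
    return 100 * n_correct / total


acc = {model: accuracy_pct(n) for model, n in correct.items()}
for model, value in acc.items():
    print(f"{model}: {value:.1f}%")

# Absolute gap in percentage points: the figure the headline reports
gap_pp = acc["grok-3-beta"] - acc["gpt-5-mini-high"]
# Relative gap: grok-3-beta's accuracy measured against gpt-5-mini-high's
gap_rel = 100 * (correct["grok-3-beta"] - correct["gpt-5-mini-high"]) / correct["gpt-5-mini-high"]
print(f"gap: {gap_pp:.1f} percentage points ({gap_rel:.1f}% relative)")
```

Run as-is, it prints 48.4% and 48.9%, a 0.5-point gap, and a relative gap of about 1.1%. In other words, a single question separates the two models.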
Average Cost per Query
gpt-5-mini-high: $0.0150
grok-3-beta: $0.0642
grok-3-beta costs 328.5% more
Average Execution Time
gpt-5-mini-high: 44.83s
grok-3-beta: 16.92s
gpt-5-mini-high takes 165.0% longer
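Both the cost and the time comparisons express the difference as a percentage of the smaller value. Here is a minimal Python sketch of that arithmetic, assuming the rounded averages shown above as inputs. The helper name `pct_more` is hypothetical.

```python
def pct_more(larger: float, smaller: float) -> float:
    """How much larger `larger` is than `smaller`, as a percentage of `smaller`."""
    return 100 * (larger - smaller) / smaller


# Rounded averages as displayed above; the page's own percentages may use unrounded values
cost = {"gpt-5-mini-high": 0.0150, "grok-3-beta": 0.0642}  # USD per query
time_s = {"gpt-5-mini-high": 44.83, "grok-3-beta": 16.92}  # seconds per query

cost_diff = pct_more(cost["grok-3-beta"], cost["gpt-5-mini-high"])
time_diff = pct_more(time_s["gpt-5-mini-high"], time_s["grok-3-beta"])

print(f"grok-3-beta costs {cost_diff:.1f}% more")
print(f"gpt-5-mini-high takes {time_diff:.1f}% longer")
```

With these rounded inputs it prints 328.0% for cost and 165.0% for time. The small mismatch with the reported 328.5% is consistent with the page computing from unrounded averages, though that is an assumption.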
Question-by-Question Comparison
Detailed comparison showing where each model succeeded (✓) or failed (✗)
Showing 1 to 25 of 188 questions (page 1 of 8)
Technique | Question | gpt-5-mini-high | grok-3-beta | Result |
T1007 | An analyst suspects a user or script ran a service enumeration command on a Linux system. Review process events to find the service-listing invocation and specify the full command that was executed. | ✓ | ✗ | gpt-5-mini-high Wins |
T1018 | On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache? | ✓ | ✗ | gpt-5-mini-high Wins |
T1021.006 | On Windows hosts, look through recent PowerShell execution records to find any elevated session where remote management was turned on. What exact command was run to enable PSRemoting? | ✓ | ✗ | gpt-5-mini-high Wins |
T1016.001 | An analyst notices a PowerShell process on a Windows host that appears to be checking SMB connectivity. Which PowerShell cmdlet was executed to perform this outbound port 445 test? | ✓ | ✗ | gpt-5-mini-high Wins |
T1018 | A Windows host executed an ICMP-based network reconnaissance using a looping instruction in cmd.exe. Identify the exact command line that was used to perform the ping sweep. | ✓ | ✗ | gpt-5-mini-high Wins |
T1053.006 | Examine the logs from the Linux system for events related to the systemd timer activation. Identify any records indicating that a new timer unit was started and enabled, and determine which timer name was used. | ✓ | ✗ | gpt-5-mini-high Wins |
T1057 | On a Windows host, investigate process events to find when Task Manager was launched via cmd with an unusual flag. What was the full command executed? | ✓ | ✗ | gpt-5-mini-high Wins |
T1069.001 | On a Linux endpoint, process events reveal a chain of group‐enumeration utilities executed by a single session. Which utility was used to query the system’s group database? | ✓ | ✗ | gpt-5-mini-high Wins |
T1070.003 | On a Windows device, there’s evidence that PowerShell history was wiped by deleting the history file. What was the exact command used to perform this action? | ✓ | ✗ | gpt-5-mini-high Wins |
T1082 | Using Linux process execution logs, identify the specific command that was used to filter loaded kernel modules for entries containing “vmw.” What was that full command? | ✓ | ✗ | gpt-5-mini-high Wins |
T1078.003 | Review the Linux process creation records to find which user account management utility was used to reactivate the previously locked and expired account. | ✓ | ✗ | gpt-5-mini-high Wins |
T1112 | A Windows host logs a change to the Terminal Server registry key disabling single-session per user. Which command-line utility executed this registry modification? | ✓ | ✗ | gpt-5-mini-high Wins |
T1112 | Review registry event logs on the Windows host for PowerShell-driven writes to system policy and file system keys. Which registry value names were created during this BlackByte preparation simulation? | ✓ | ✗ | gpt-5-mini-high Wins |
T1112 | A Windows user’s registry was altered via a command-line tool to disable the lock workstation feature by adding a DWORD entry under the current user Policies\System key. Which registry value name was modified in this operation? | ✓ | ✗ | gpt-5-mini-high Wins |
T1112 | Investigate Windows registry events to identify any newly set ProxyServer entry under the user Internet Settings hive. What proxy server address was configured? | ✓ | ✗ | gpt-5-mini-high Wins |
T1112 | On a Windows device, examine registry event logs for modifications under the System policies path. Determine which registry value name was altered to disable the shutdown button at login. | ✓ | ✗ | gpt-5-mini-high Wins |
T1124 | An analyst reviewing Windows process logs wants to spot instances where a native time tool was repurposed to introduce a delay. Which full W32tm invocation, including the stripchart and period flags, appears in the logs? | ✓ | ✗ | gpt-5-mini-high Wins |
T1120 | Review Windows process and PowerShell activity for commands that enumerate PnP entities through WMI. Which PowerShell cmdlet was invoked to perform this hardware inventory? | ✓ | ✗ | gpt-5-mini-high Wins |
T1134.001 | Logs from a Windows host show PowerShell fetching and executing a remote script to gain SeDebugPrivilege token duplication. Which Empire module was invoked? | ✓ | ✗ | gpt-5-mini-high Wins |
T1217 | An attacker leveraged a PowerShell command on a Windows host to enumerate browser bookmark files across all user profiles. Examine the process execution logs to determine the exact filename that was being searched for. | ✓ | ✗ | gpt-5-mini-high Wins |
T1217 | On a Windows system, you notice a process that recursively enumerates files named 'Bookmarks' under every user profile directory. Which Windows command-line utility was used to perform that search? | ✓ | ✗ | gpt-5-mini-high Wins |
T1490 | An attacker obtained elevated rights on a Windows system and ran a deletion command that attempted to remove various backup file types across the C: drive, generating numerous “access denied” errors. What was the full command line used? | ✓ | ✗ | gpt-5-mini-high Wins |
T1218.011 | You notice rundll32.exe being used with desk.cpl,InstallScreenSaver on a Windows endpoint. Investigate your process creation logs to find which .scr file was loaded by this unusual invocation. | ✓ | ✗ | gpt-5-mini-high Wins |
T1546.004 | On Linux systems, an attacker may gain persistence by appending instructions to the global shell profile. Investigate process or file modification events to find evidence of text being added to /etc/profile, and identify the exact command invocation that carried out this change. | ✓ | ✗ | gpt-5-mini-high Wins |
T1547.014 | Windows registry events show that a new key under the Active Setup Installed Components branch was added to launch a payload immediately via runonce.exe. Which component name was created? | ✓ | ✗ | gpt-5-mini-high Wins |