gpt-5-mini-medium vs grok-3-beta KQL Benchmark

grok-3-beta wins by 3.8 percentage points

Compared on 187 shared test questions

Overall Accuracy

gpt-5-mini-medium

45.5%

85 / 187 correct

grok-3-beta

49.2%

92 / 187 correct

Average Cost per Query

gpt-5-mini-medium: $0.0150
grok-3-beta: $0.0642
grok-3-beta costs 327.3% more

Average Execution Time

gpt-5-mini-medium: 47.16s
grok-3-beta: 16.92s
gpt-5-mini-medium takes 178.7% longer
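
The headline percentages can be rechecked from the figures listed above. The following is a minimal sketch, not the benchmark's own code: the variable names and the `pct_more` helper are illustrative, and the inputs are the rounded values shown on this page. From those inputs the accuracy gap comes out near 3.7 points and the cost difference near 328%, slightly off the reported 3.8% and 327.3%. That is presumably because the page computes from unrounded values, though this is an assumption.

```python
# Figures copied from the summary above; names are illustrative only.
TOTAL_QUESTIONS = 187
correct = {"gpt-5-mini-medium": 85, "grok-3-beta": 92}
avg_cost_usd = {"gpt-5-mini-medium": 0.0150, "grok-3-beta": 0.0642}
avg_time_s = {"gpt-5-mini-medium": 47.16, "grok-3-beta": 16.92}


def pct_more(larger: float, smaller: float) -> float:
    """Return how much bigger `larger` is than `smaller`, in percent."""
    return (larger / smaller - 1) * 100


# Accuracy per model, in percent.
accuracy = {model: 100 * n / TOTAL_QUESTIONS for model, n in correct.items()}

# Accuracy gap in percentage points (not a relative percentage).
gap_points = accuracy["grok-3-beta"] - accuracy["gpt-5-mini-medium"]

# Relative cost and latency differences.
cost_more = pct_more(avg_cost_usd["grok-3-beta"], avg_cost_usd["gpt-5-mini-medium"])
time_more = pct_more(avg_time_s["gpt-5-mini-medium"], avg_time_s["grok-3-beta"])

for model, acc in accuracy.items():
    print(f"{model}: {acc:.1f}% accuracy")
print(f"accuracy gap: {gap_points:.1f} points")
print(f"grok-3-beta costs {cost_more:.1f}% more")
print(f"gpt-5-mini-medium takes {time_more:.1f}% longer")
```

The trade-off is visible in the ratios: grok-3-beta is roughly 4.3 times as expensive per query, while gpt-5-mini-medium is roughly 2.8 times as slow.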

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing questions 1 to 25 of 187 (page 1 of 8)
T1018
A Windows host executed an ICMP-based network reconnaissance using a looping instruction in cmd.exe. Identify the exact command line that was used to perform the ping sweep.
gpt-5-mini-medium Wins
T1048.003
Windows process creation logs show a PowerShell-driven file transfer to an FTP service. Which executable was leveraged to perform this exfiltration?
gpt-5-mini-medium Wins
T1057
On a Windows host, investigate process events to find when Task Manager was launched via cmd with an unusual flag. What was the full command executed?
gpt-5-mini-medium Wins
T1053.005
Investigate Windows process events for PowerShell activity that leverages WMI to register a scheduled task via XML import. What was the name of the XML file supplied to the RegisterByXml method?
gpt-5-mini-medium Wins
T1070.005
On a Windows system, an attacker used the command prompt to remove one or more default administrative shares. Which share names were deleted?
gpt-5-mini-medium Wins
T1070.004
Suspiciously, the recycle bin appears empty system-wide. Determine which command was executed on Windows to clear the system's recycle bin directory, including any switches and environment variables.
gpt-5-mini-medium Wins
T1069.001
Investigate Windows process execution logs for a PowerShell cmdlet used to list group members. Look for entries where a group name is provided after a '-Name' flag and identify which group was queried.
gpt-5-mini-medium Wins
T1070.003
On a Linux endpoint, you suspect malicious clearing of the bash history by redirecting from the null device. Explore process or file events to uncover the exact shell command that performed this action.
gpt-5-mini-medium Wins
T1082
Using Linux process execution logs, identify the specific command that was used to filter loaded kernel modules for entries containing “vmw”. What was that full command?
gpt-5-mini-medium Wins
T1059.004
An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell.
gpt-5-mini-medium Wins
T1082
A Linux host was used to collect various system release files and kernel details, writing them into a single file under /tmp. What was the name of that output file?
gpt-5-mini-medium Wins
T1112
A Windows host logs a change to the Terminal Server registry key disabling single-session per user. Which command-line utility executed this registry modification?
gpt-5-mini-medium Wins
T1112
On a Windows device, examine registry event logs for modifications under the System policies path. Determine which registry value name was altered to disable the shutdown button at login.
gpt-5-mini-medium Wins
T1124
An analyst reviewing Windows process logs wants to spot instances where a native time tool was repurposed to introduce a delay. Which full W32tm invocation, including the stripchart and period flags, appears in the logs?
gpt-5-mini-medium Wins
T1120
Review Windows process and PowerShell activity for commands that enumerate PnP entities through WMI. Which PowerShell cmdlet was invoked to perform this hardware inventory?
gpt-5-mini-medium Wins
T1201
On a Linux system, logs show that the password expiration settings file was accessed. Identify which command was executed to list its contents.
gpt-5-mini-medium Wins
T1546.003
On a Windows endpoint, an attacker ran a PowerShell sequence to establish a WMI event subscription using CommandLineEventConsumer. Inspect the process or script execution logs to uncover which executable was set to run by this subscription.
gpt-5-mini-medium Wins
T1547
A Windows host shows a process launching with install-driver switches, likely signaling malicious driver deployment. What is the name of the tool that was executed?
gpt-5-mini-medium Wins
T1217
On a Windows system, you notice a process that recursively enumerates files named 'Bookmarks' under every user profile directory. Which Windows command-line utility was used to perform that search?
gpt-5-mini-medium Wins
T1546.004
A suspicious file modification on a Linux device targeted the ~/.bash_profile file, apparently adding a new line. What was the full command string that was appended?
gpt-5-mini-medium Wins
T1547.002
A Windows host shows a suspicious registry change under the LSA hive. Review recent registry events to locate any new entries under Authentication Packages and determine the name of the DLL the attacker added.
gpt-5-mini-medium Wins
T1548.002
On a Windows host, sift through registry modification events targeting HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\ConsentPromptBehaviorAdmin. What new value was written to disable the admin consent prompt?
gpt-5-mini-medium Wins
T1562.003
Review Windows registry event logs for the ProcessCreationIncludeCmdLine_Enabled value being set to 0. Which PowerShell cmdlet performed this change?
gpt-5-mini-medium Wins
T1562.006
A .NET tracing environment variable was turned off in a user’s registry on a Windows system. Which built-in command-line tool was used to make this registry change?
gpt-5-mini-medium Wins
T1557.001
On Windows devices, hunt for PowerShell activity where a remote script is fetched and executed to perform LLMNR/NBNS spoofing. Which cmdlet kicked off the listener?
gpt-5-mini-medium Wins
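
The rows above follow a fixed pattern: a MITRE ATT&CK technique ID, the question text, and the outcome. A minimal sketch of tallying them is shown here, using only the technique IDs and outcomes from this page. The tuple layout and variable names are illustrative assumptions, not the benchmark's data format. Every row on this page is a gpt-5-mini-medium win, which suggests the table is sorted or grouped by outcome (an assumption), so this page is not a representative sample of all 187 questions.

```python
from collections import Counter

# Technique IDs in the order they appear on page 1.
technique_ids = [
    "T1018", "T1048.003", "T1057", "T1053.005", "T1070.005",
    "T1070.004", "T1069.001", "T1070.003", "T1082", "T1059.004",
    "T1082", "T1112", "T1112", "T1124", "T1120",
    "T1201", "T1546.003", "T1547", "T1217", "T1546.004",
    "T1547.002", "T1548.002", "T1562.003", "T1562.006", "T1557.001",
]

# Every row on this page lists the same winner.
page_one = [(tid, "gpt-5-mini-medium") for tid in technique_ids]

# Count outcomes per model.
wins_by_model = Counter(winner for _, winner in page_one)

# Group sub-techniques under their parent technique (text before the dot).
questions_by_parent = Counter(tid.split(".")[0] for tid, _ in page_one)

print(wins_by_model)
print(questions_by_parent.most_common(3))
```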