gpt-5-nano-high vs grok-3-mini-beta KQL Benchmark

grok-3-mini-beta wins by 28.2%

Compared on 188 shared test questions

Overall Accuracy

gpt-5-nano-high

30.3%

57 / 188 correct

grok-3-mini-beta

58.5%

110 / 188 correct

Average Cost per Query

gpt-5-nano-high: $0.0069
grok-3-mini-beta: $0.0040
gpt-5-nano-high costs 73.8% more

Average Execution Time

gpt-5-nano-high: 61.10s
grok-3-mini-beta: 22.38s
gpt-5-nano-high takes 173.0% longer
Question-by-Question Analysis

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
Page 1 of 8
T1018
Review Linux process execution records for any commands that list TCP metric cache entries and filter out loopback interfaces. Which utility was used?
gpt-5-nano-high Wins
T1016.001
On a Linux host, a ping command was executed to test internet connectivity. Determine which IP address was used as the ping target.
gpt-5-nano-high Wins
T1036.003
In a Linux environment, you observe a process labeled like the cron daemon but running from an unexpected path. Investigate creation events to uncover the actual filename used by this fake cron process.
gpt-5-nano-high Wins
T1059.004
During a Linux investigation, you notice processes spawning curl and wget commands that pull a script from a remote GitHub raw URL and pipe it into bash. Identify the name of the script that was retrieved and executed.
gpt-5-nano-high Wins
T1059.004
An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell.
gpt-5-nano-high Wins
T1070.004
A Linux host executed a native utility to overwrite and then remove a temporary file in one step. Identify the name of the file that was securely deleted by this action.
gpt-5-nano-high Wins
T1134.001
A Windows host logs show PowerShell fetching and executing a remote script to gain SeDebugPrivilege token duplication. Which Empire module was invoked?
gpt-5-nano-high Wins
T1548.001
A Linux system shows a shell invocation that appears to be searching for files with elevated group permissions. Using the available process execution logs, determine exactly what command was run.
gpt-5-nano-high Wins
T1548.001
Investigate Linux process or syslog records to find any invocation of the 'find' utility used to scan /usr/bin for files with the setuid bit. What was the full command executed?
gpt-5-nano-high Wins
T1555
A security investigator suspects that someone attempted to dump stored web credentials on a Windows system using an in-built command-line tool. Review process creation logs to determine which executable was called to list the Web Credentials vault.
gpt-5-nano-high Wins
T1559
Investigating a Windows device, you suspect a non-standard executable was launched to set up a named pipe for client-server messaging. Determine the name of the executable that was run.
gpt-5-nano-high Wins
T1007
An analyst suspects a user or script ran a service enumeration command on a Linux system. Review process events to find the service-listing invocation and specify the full command that was executed.
grok-3-mini-beta Wins
T1036.003
A process is running under a familiar Windows host name but originates from a user's AppData folder rather than the System32 directory. Identify the filename used to masquerade the PowerShell binary on this Windows device.
grok-3-mini-beta Wins
T1006
Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path.
grok-3-mini-beta Wins
T1027
A Windows host shows a process launch with an extremely obfuscated command line that dynamically builds and invokes code at runtime. Which process name was used to execute this payload?
grok-3-mini-beta Wins
T1053.003
Linux hosts may log events when new files are added to /var/spool/cron/crontabs. Query those logs for a creation or write action in that directory and determine the file name that was added.
grok-3-mini-beta Wins
T1048.003
A Linux host briefly hosted an HTTP service under /tmp. Examine process creation logs to determine the exact python3 command that was used to start the server on port 9090.
grok-3-mini-beta Wins
T1049
In a Windows log analytics workspace, search for PowerShell processes that were used to enumerate network connections. Determine which PowerShell cmdlet was executed to list active TCP connections.
grok-3-mini-beta Wins
T1048.003
Windows process creation logs show a PowerShell-driven file transfer to an FTP service. Which executable was leveraged to perform this exfiltration?
grok-3-mini-beta Wins
T1053.006
Examine the logs from the Linux system for events related to the systemd timer activation. Identify any records indicating that a new timer unit was started and enabled, and determine which timer name was used.
grok-3-mini-beta Wins
T1057
A malicious actor may attempt to list running processes on a Windows machine using a WMI-based command. Review the process creation events to find out which utility was invoked to perform this enumeration.
grok-3-mini-beta Wins
T1059.004
An attacker on a Linux host may try to enumerate installed shells by reading the system file that lists valid shells. Using process or syslog data, determine which command was executed to perform this enumeration.
grok-3-mini-beta Wins
T1057
On a Windows device, review the process execution logs to find instances where a built-in listing tool was piped into a string filter. Identify the process name that the attacker was searching for.
grok-3-mini-beta Wins
T1059.004
On a Linux host, identify the process invocation that altered a user’s login shell. What was the full command used?
grok-3-mini-beta Wins
T1059.004
On a Linux system, find any process creation record where awk is used with a BEGIN rule to launch a shell. What was the exact command invoked?
grok-3-mini-beta Wins
Page 1 of 8

Explore individual model performance and detailed analysis