gpt-4-turbo-2024-04-09 vs grok-3-mini-beta KQL Benchmark

grok-3-mini-beta wins by 19.1 percentage points

Compared on 188 shared test questions

Overall Accuracy

gpt-4-turbo-2024-04-09

39.4%

74 / 188 correct

grok-3-mini-beta

58.5%

110 / 188 correct
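
The accuracy figures above follow directly from the correct-answer counts. The sketch below recomputes them; the helper name `accuracy_pct` is illustrative and not part of the benchmark. It also separates the absolute gap (percentage points) from the relative gain, since the two are easy to confuse.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Return accuracy as a percentage of total questions."""
    return 100.0 * correct / total


TOTAL_QUESTIONS = 188  # shared test questions reported on this page

gpt_acc = accuracy_pct(74, TOTAL_QUESTIONS)    # gpt-4-turbo-2024-04-09
grok_acc = accuracy_pct(110, TOTAL_QUESTIONS)  # grok-3-mini-beta

# Absolute gap, in percentage points (the headline figure).
gap_points = grok_acc - gpt_acc

# Relative gain: how many more questions grok answered correctly,
# expressed as a percentage of gpt's correct count.
relative_gain = 100.0 * (110 - 74) / 74

print(f"gpt-4-turbo-2024-04-09: {gpt_acc:.1f}%")
print(f"grok-3-mini-beta:       {grok_acc:.1f}%")
print(f"gap:                    {gap_points:.1f} percentage points")
print(f"relative gain:          {relative_gain:.1f}%")
```

With the counts reported here, this gives 39.4% and 58.5%, a gap of 19.1 percentage points, and a relative gain of 48.6%.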

Average Cost per Query

gpt-4-turbo-2024-04-09: $0.1737
grok-3-mini-beta: $0.0040
gpt-4-turbo-2024-04-09 costs 4281.4% more

Average Execution Time

gpt-4-turbo-2024-04-09: 16.84s
grok-3-mini-beta: 22.38s
grok-3-mini-beta takes 32.9% longer
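
The "costs more" and "takes longer" lines are relative differences measured against the smaller value. A minimal sketch of that arithmetic, using only the rounded averages shown above (the helper `pct_more` is a hypothetical name):

```python
def pct_more(larger: float, smaller: float) -> float:
    """Return how much larger `larger` is than `smaller`, as a percentage."""
    return 100.0 * (larger - smaller) / smaller


# Rounded averages as displayed on this page.
cost_gap = pct_more(0.1737, 0.0040)  # gpt-4-turbo cost vs grok-3-mini cost
time_gap = pct_more(22.38, 16.84)    # grok-3-mini time vs gpt-4-turbo time

print(f"cost: {cost_gap:.1f}% more")
print(f"time: {time_gap:.1f}% longer")
```

The time figure reproduces the reported 32.9%. The cost figure comes out at about 4242.5% rather than the reported 4281.4%. The gap is consistent with the page computing its percentage from unrounded averages, though that is an assumption: a grok-3-mini-beta cost slightly under $0.0040 would still display as $0.0040.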

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
Page 1 of 8
T1016.001
On a Linux host, a ping command was executed to test internet connectivity. Determine which IP address was used as the ping target.
gpt-4-turbo-2024-04-09 Wins
T1016
A Linux host’s Syslog shows a shell-based network discovery script ran multiple commands. One of them listed current TCP connections. Which utility was invoked?
gpt-4-turbo-2024-04-09 Wins
T1036.003
In a Linux environment, you observe a process labeled like the cron daemon but running from an unexpected path. Investigate creation events to uncover the actual filename used by this fake cron process.
gpt-4-turbo-2024-04-09 Wins
T1036.004
A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked?
gpt-4-turbo-2024-04-09 Wins
T1039
On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action?
gpt-4-turbo-2024-04-09 Wins
T1059.004
An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell.
gpt-4-turbo-2024-04-09 Wins
T1059.004
On a Linux system, review process execution records for any shell process that set an environment variable containing executable code and then piped it into another shell instance. Determine which environment variable name was used to store the script content.
gpt-4-turbo-2024-04-09 Wins
T1070.004
A Linux host executed a native utility to overwrite and then remove a temporary file in one step. Identify the name of the file that was securely deleted by this action.
gpt-4-turbo-2024-04-09 Wins
T1112
On Windows systems, disabling RDP via the registry generates registry write events. Investigate registry event logs for modifications under the Terminal Server configuration path. What is the name of the registry value that was changed to disable Remote Desktop Protocol?
gpt-4-turbo-2024-04-09 Wins
T1124
On a Linux host, an activity was recorded where the local clock and timezone were queried. Review the available process execution logs to uncover what full command was run to fetch the system time and timezone.
gpt-4-turbo-2024-04-09 Wins
T1542.001
Investigate Windows file creation logs to uncover any new executable added directly to the System32 directory, which may indicate a UEFI persistence implant. What was the name of the file created?
gpt-4-turbo-2024-04-09 Wins
T1531
Within Windows process event logs, identify instances where the built-in net.exe utility is used to change a user account password. What was the new password argument passed in?
gpt-4-turbo-2024-04-09 Wins
T1548.001
Investigate Linux process or syslog records to find any invocation of the 'find' utility used to scan /usr/bin for files with the setuid bit. What was the full command executed?
gpt-4-turbo-2024-04-09 Wins
T1560
Windows system logs show PowerShell zipping up the contents of a user’s profile folder. Investigate process and file events to determine the exact name of the ZIP archive created.
gpt-4-turbo-2024-04-09 Wins
T1007
An analyst suspects a user or script ran a service enumeration command on a Linux system. Review process events to find the service-listing invocation and specify the full command that was executed.
grok-3-mini-beta Wins
T1006
Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path.
grok-3-mini-beta Wins
T1027
A Windows host shows a process launch with an extremely obfuscated command line that dynamically builds and invokes code at runtime. Which process name was used to execute this payload?
grok-3-mini-beta Wins
T1036.003
A process is running under a familiar Windows host name but originates from a user's AppData folder rather than the System32 directory. Identify the filename used to masquerade the PowerShell binary on this Windows device.
grok-3-mini-beta Wins
T1048.003
A Linux host briefly hosted an HTTP service under /tmp. Examine process creation logs to determine the exact python3 command that was used to start the server on port 9090.
grok-3-mini-beta Wins
T1053.003
Linux hosts may log events when new files are added to /var/spool/cron/crontabs. Query those logs for a creation or write action in that directory and determine the file name that was added.
grok-3-mini-beta Wins
T1053.006
Examine the logs from the Linux system for events related to the systemd timer activation. Identify any records indicating that a new timer unit was started and enabled, and determine which timer name was used.
grok-3-mini-beta Wins
T1057
A malicious actor may attempt to list running processes on a Windows machine using a WMI-based command. Review the process creation events to find out which utility was invoked to perform this enumeration.
grok-3-mini-beta Wins
T1059.004
An attacker on a Linux host may try to enumerate installed shells by reading the system file that lists valid shells. Using process or syslog data, determine which command was executed to perform this enumeration.
grok-3-mini-beta Wins
T1057
On a Windows device, review the process execution logs to find instances where a built-in listing tool was piped into a string filter. Identify the process name that the attacker was searching for.
grok-3-mini-beta Wins
T1059.004
On a Linux system, analyze the process logs for suspicious command line activity that includes a sequence of commands indicating a pipe-to-shell operation. Identify the tool that was used to execute this piped command, paying special attention to its use in downloading and running script content.
grok-3-mini-beta Wins
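
The per-question outcomes on this page can be tallied with a short script. The sketch below assumes the result labels have been collected into a list of strings in the "<model> Wins" form used above. The list is rebuilt here from the 25 labels on this page, not from the full 188-question set.

```python
from collections import Counter

# Result labels for the 25 questions listed on this page.
labels = (
    ["gpt-4-turbo-2024-04-09 Wins"] * 14
    + ["grok-3-mini-beta Wins"] * 11
)

# Strip the trailing " Wins" to recover the model name, then count.
wins = Counter(label.removesuffix(" Wins") for label in labels)

for model, count in wins.most_common():
    print(f"{model}: {count} of {len(labels)} listed questions")
```

These counts cover only the first page of results, so they say nothing about the remaining 163 questions. `str.removesuffix` requires Python 3.9 or later.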
