gpt-4-turbo-2024-04-09 vs grok-3-beta KQL Benchmark

grok-3-beta wins by 9.6 percentage points

Compared on 188 shared test questions

Overall Accuracy

gpt-4-turbo-2024-04-09

39.4%

74 / 188 correct

grok-3-beta

48.9%

92 / 188 correct

Average Cost per Query

gpt-4-turbo-2024-04-09: $0.1737
grok-3-beta: $0.0642
gpt-4-turbo-2024-04-09 costs 170.6% more

Average Execution Time

gpt-4-turbo-2024-04-09: 16.84s
grok-3-beta: 16.92s
grok-3-beta takes 0.5% longer
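
The summary figures above can be re-derived from the raw counts and averages. The following is a minimal sketch, not the benchmark's own code: the helper names `accuracy` and `pct_more` are hypothetical, and it assumes the accuracy lead is the difference between the two accuracy percentages while the cost and time comparisons are relative to grok-3-beta and gpt-4-turbo-2024-04-09 respectively.

```python
def accuracy(correct: int, total: int) -> float:
    """Accuracy as a percentage of questions answered correctly."""
    return 100.0 * correct / total


def pct_more(value: float, baseline: float) -> float:
    """How much larger `value` is than `baseline`, as a percentage of `baseline`."""
    return 100.0 * (value - baseline) / baseline


# Raw numbers copied from the summary above.
TOTAL = 188
gpt_acc = accuracy(74, TOTAL)
grok_acc = accuracy(92, TOTAL)

# Absolute gap in accuracy (percentage points), assumed to be the headline figure.
gap_points = grok_acc - gpt_acc

# Relative comparisons: cost of gpt-4-turbo vs grok-3-beta, time of grok-3-beta vs gpt-4-turbo.
cost_more = pct_more(0.1737, 0.0642)
time_more = pct_more(16.92, 16.84)

print(f"gpt-4-turbo-2024-04-09 accuracy: {gpt_acc:.1f}%")
print(f"grok-3-beta accuracy: {grok_acc:.1f}%")
print(f"accuracy gap: {gap_points:.1f} percentage points")
print(f"gpt-4-turbo-2024-04-09 costs {cost_more:.1f}% more")
print(f"grok-3-beta takes {time_more:.1f}% longer")
```

Keeping the absolute gap separate from the relative comparisons matters here: the accuracy lead is a difference between two percentages, whereas the cost and time lines are ratios against a baseline.
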

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
Page 1 of 8
T1018
A Windows host executed an ICMP-based network reconnaissance using a looping instruction in cmd.exe. Identify the exact command line that was used to perform the ping sweep.
gpt-4-turbo-2024-04-09 Wins
T1018
On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache?
gpt-4-turbo-2024-04-09 Wins
T1048.003
Windows process creation logs show a PowerShell-driven file transfer to an FTP service. Which executable was leveraged to perform this exfiltration?
gpt-4-turbo-2024-04-09 Wins
T1057
On a Windows host, investigate process events to find when Task Manager was launched via cmd with an unusual flag. What was the full command executed?
gpt-4-turbo-2024-04-09 Wins
T1059.004
An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell.
gpt-4-turbo-2024-04-09 Wins
T1059.004
On a Linux system, review process execution records for any shell process that set an environment variable containing executable code and then piped it into another shell instance. Determine which environment variable name was used to store the script content.
gpt-4-turbo-2024-04-09 Wins
T1070.003
On a Linux endpoint, you suspect malicious clearing of the bash history by redirecting from the null device. Explore process or file events to uncover the exact shell command that performed this action.
gpt-4-turbo-2024-04-09 Wins
T1070.005
On a Windows system, an attacker used the command prompt to remove one or more default administrative shares. Which share names were deleted?
gpt-4-turbo-2024-04-09 Wins
T1070.004
Suspiciously, the recycle bin appears empty system-wide. Determine which command was executed on Windows to clear the system's recycle bin directory, including any switches and environment variables.
gpt-4-turbo-2024-04-09 Wins
T1078.003
Review the Linux process creation records to find which user account management utility was used to reactivate the previously locked and expired account.
gpt-4-turbo-2024-04-09 Wins
T1112
A Windows host logs a change to the Terminal Server registry key disabling single-session per user. Which command-line utility executed this registry modification?
gpt-4-turbo-2024-04-09 Wins
T1112
Investigate Windows registry events to identify any newly set ProxyServer entry under the user Internet Settings hive. What proxy server address was configured?
gpt-4-turbo-2024-04-09 Wins
T1124
An analyst reviewing Windows process logs wants to spot instances where a native time tool was repurposed to introduce a delay. Which full W32tm invocation, including the stripchart and period flags, appears in the logs?
gpt-4-turbo-2024-04-09 Wins
T1218.010
An attacker has attempted to sideload code by invoking regsvr32.exe in a Windows host against a file that does not use the standard .dll extension. Investigate the process event logs to determine the name of the file that was registered.
gpt-4-turbo-2024-04-09 Wins
T1560
Windows system logs show PowerShell zipping up the contents of a user’s profile folder. Investigate process and file events to determine the exact name of the ZIP archive created.
gpt-4-turbo-2024-04-09 Wins
T1562.003
Review Windows registry event logs for the ProcessCreationIncludeCmdLine_Enabled value being set to 0. Which PowerShell cmdlet performed this change?
gpt-4-turbo-2024-04-09 Wins
T1562.003
During a Linux engagement, an operator exported HISTSIZE=0 to turn off shell history and then ran another utility. Using the DeviceProcessEvents table, identify the exact command executed immediately after the history was disabled. What was the command?
gpt-4-turbo-2024-04-09 Wins
T1562.012
On a Linux host, auditing has been turned off. Review process execution or syslog data to determine which command was executed to disable the audit subsystem.
gpt-4-turbo-2024-04-09 Wins
T1614.001
Using Linux process or syslog logs, identify the executable that was run to output the system's locale information.
gpt-4-turbo-2024-04-09 Wins
T1571
On a Windows system, identify any PowerShell Test-NetConnection executions against an uncommon port. Which port number was checked?
gpt-4-turbo-2024-04-09 Wins
T1006
Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path.
grok-3-beta Wins
T1003.001
Using Windows process event logs, investigate PowerShell activity around lsass.exe memory capture. What was the name of the script file invoked to perform the dump?
grok-3-beta Wins
T1018
Review Linux process execution records for any commands that list TCP metric cache entries and filter out loopback interfaces. Which utility was used?
grok-3-beta Wins
T1027
A Windows host shows a process launch with an extremely obfuscated command line that dynamically builds and invokes code at runtime. Which process name was used to execute this payload?
grok-3-beta Wins
T1036.004
Analyze Windows process events for any schtasks.exe commands that created a new task invoking PowerShell. What is the name of the .ps1 script specified to run?
grok-3-beta Wins
