gpt-4-turbo-2024-04-09 vs grok-3-beta KQL Benchmark
grok-3-beta wins by 9.6 percentage points
Compared on 188 shared test questions
Overall Accuracy
gpt-4-turbo-2024-04-09
39.4%
74 / 188 correct
grok-3-beta
48.9%
92 / 188 correct
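The accuracy figures above can be reproduced from the raw counts. The short sketch below is illustrative only: the helper name `accuracy` is an assumption, not part of the benchmark tooling. It shows that the 9.6 gap is a difference in percentage points, while the relative improvement in correct answers is larger.

```python
def accuracy(correct: int, total: int) -> float:
    """Return accuracy as a percentage of total questions."""
    return 100.0 * correct / total


# Counts taken from the summary above.
gpt4_acc = accuracy(74, 188)
grok_acc = accuracy(92, 188)

# Absolute gap, in percentage points.
gap_points = grok_acc - gpt4_acc

# Relative gain in the number of correct answers, as a percentage.
relative_gain = 100.0 * (92 - 74) / 74

print(f"gpt-4-turbo-2024-04-09: {gpt4_acc:.1f}%")
print(f"grok-3-beta:            {grok_acc:.1f}%")
print(f"gap: {gap_points:.1f} points, relative gain: {relative_gain:.1f}%")
```

Under these counts, grok-3-beta answers about 24% more questions correctly than gpt-4-turbo-2024-04-09, which is a different statement from the 9.6-point gap in accuracy.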
Average Cost per Query
gpt-4-turbo-2024-04-09: $0.1737
grok-3-beta: $0.0642
gpt-4-turbo-2024-04-09 costs 170.6% more
Average Execution Time
gpt-4-turbo-2024-04-09: 16.84s
grok-3-beta: 16.92s
grok-3-beta takes 0.5% longer
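The "costs 170.6% more" and "takes 0.5% longer" figures are both relative differences measured against the smaller value. A minimal sketch of that arithmetic follows; the function name `percent_more` is hypothetical and chosen only for illustration.

```python
def percent_more(value: float, baseline: float) -> float:
    """Return how much larger `value` is than `baseline`, as a percentage."""
    return 100.0 * (value - baseline) / baseline


# Average cost per query (USD), from the summary above.
cost_diff = percent_more(0.1737, 0.0642)

# Average execution time (seconds), from the summary above.
time_diff = percent_more(16.92, 16.84)

print(f"gpt-4-turbo-2024-04-09 costs {cost_diff:.1f}% more")
print(f"grok-3-beta takes {time_diff:.1f}% longer")
```

Put another way, gpt-4-turbo-2024-04-09 costs roughly 2.7 times as much per query, while the two execution times are practically identical.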
Question-by-Question Comparison
Detailed comparison showing where each model succeeded or failed
Showing 1 to 25 of 188 questions (page 1 of 8)
Technique ID | Question | gpt-4-turbo-2024-04-09 | grok-3-beta | Result |
T1018 | A Windows host executed an ICMP-based network reconnaissance using a looping instruction in cmd.exe. Identify the exact command line that was used to perform the ping sweep. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1018 | On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1048.003 | Windows process creation logs show a PowerShell-driven file transfer to an FTP service. Which executable was leveraged to perform this exfiltration? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1057 | On a Windows host, investigate process events to find when Task Manager was launched via cmd with an unusual flag. What was the full command executed? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1059.004 | An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1059.004 | On a Linux system, review process execution records for any shell process that set an environment variable containing executable code and then piped it into another shell instance. Determine which environment variable name was used to store the script content. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1070.003 | On a Linux endpoint, you suspect malicious clearing of the bash history by redirecting from the null device. Explore process or file events to uncover the exact shell command that performed this action. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1070.005 | On a Windows system, an attacker used the command prompt to remove one or more default administrative shares. Which share names were deleted? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1070.004 | Suspiciously, the recycle bin appears empty system-wide. Determine which command was executed on Windows to clear the system's recycle bin directory, including any switches and environment variables. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1078.003 | Review the Linux process creation records to find which user account management utility was used to reactivate the previously locked and expired account. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1112 | A Windows host logs a change to the Terminal Server registry key disabling single-session per user. Which command-line utility executed this registry modification? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1112 | Investigate Windows registry events to identify any newly set ProxyServer entry under the user Internet Settings hive. What proxy server address was configured? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1124 | An analyst reviewing Windows process logs wants to spot instances where a native time tool was repurposed to introduce a delay. Which full W32tm invocation, including the stripchart and period flags, appears in the logs? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1218.010 | An attacker has attempted to sideload code by invoking regsvr32.exe in a Windows host against a file that does not use the standard .dll extension. Investigate the process event logs to determine the name of the file that was registered. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1560 | Windows system logs show PowerShell zipping up the contents of a user’s profile folder. Investigate process and file events to determine the exact name of the ZIP archive created. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1562.003 | Review Windows registry event logs for the ProcessCreationIncludeCmdLine_Enabled value being set to 0. Which PowerShell cmdlet performed this change? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1562.003 | During a Linux engagement, an operator exported HISTSIZE=0 to turn off shell history and then ran another utility. Using the DeviceProcessEvents table, identify the exact command executed immediately after the history was disabled. What was the command? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1562.012 | On a Linux host, auditing has been turned off. Review process execution or syslog data to determine which command was executed to disable the audit subsystem. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1614.001 | Using Linux process or syslog logs, identify the executable that was run to output the system's locale information. | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1571 | On a Windows system, identify any PowerShell Test-NetConnection executions against an uncommon port. Which port number was checked? | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |
T1006 | Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path. | ✗ | ✓ | grok-3-beta Wins |
T1003.001 | Using Windows process event logs, investigate PowerShell activity around lsass.exe memory capture. What was the name of the script file invoked to perform the dump? | ✗ | ✓ | grok-3-beta Wins |
T1018 | Review Linux process execution records for any commands that list TCP metric cache entries and filter out loopback interfaces. Which utility was used? | ✗ | ✓ | grok-3-beta Wins |
T1027 | A Windows host shows a process launch with an extremely obfuscated command line that dynamically builds and invokes code at runtime. Which process name was used to execute this payload? | ✗ | ✓ | grok-3-beta Wins |
T1036.004 | Analyze Windows process events for any schtasks.exe commands that created a new task invoking PowerShell. What is the name of the .ps1 script specified to run? | ✗ | ✓ | grok-3-beta Wins |
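Of the 25 rows shown on this page, 20 are wins for gpt-4-turbo-2024-04-09 and 5 are wins for grok-3-beta. Every row on this page is a question where the two models disagree, so this split says nothing about the overall result across all 188 questions. A tally like this can be computed from the pipe-delimited rows with a few lines of Python. The sketch below assumes each row ends with the result cell followed by a trailing pipe, as in the table above; the function name `tally_outcomes` and the placeholder question text are illustrative, not taken from the benchmark.

```python
from collections import Counter


def tally_outcomes(rows: list[str]) -> Counter:
    """Count how often each result label appears in pipe-delimited rows."""
    counts: Counter = Counter()
    for row in rows:
        # Drop the trailing pipe, then split into cells and trim whitespace.
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        # The last cell holds the result label, e.g. "grok-3-beta Wins".
        counts[cells[-1]] += 1
    return counts


# Two sample rows; the question text is replaced by a placeholder.
sample_rows = [
    "T1018 | (question text) | ✓ | ✗ | gpt-4-turbo-2024-04-09 Wins |",
    "T1006 | (question text) | ✗ | ✓ | grok-3-beta Wins |",
]

outcome_counts = tally_outcomes(sample_rows)
for label, count in outcome_counts.items():
    print(f"{label}: {count}")
```

This simple split would break on question text that itself contains a pipe character; none of the rows shown here do.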