grok-3-beta vs o3-mini-low KQL Benchmark

o3-mini-low wins by 2.7%

Compared on 188 shared test questions

Overall Accuracy

grok-3-beta

48.9%

92 / 188 correct

o3-mini-low

51.6%

97 / 188 correct

Average Cost per Query

grok-3-beta: $0.0642
o3-mini-low: $0.0279
grok-3-beta costs 130.3% more

Average Execution Time

grok-3-beta: 16.92s
o3-mini-low: 36.54s
o3-mini-low takes 116.0% longer
Question-by-Question Analysis

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
Page 1 of 8
T1003.001
Using Windows process event logs, investigate PowerShell activity around lsass.exe memory capture. What was the name of the script file invoked to perform the dump?
grok-3-beta Wins
T1003
On a Windows host, an attacker leveraged COMSVCS.DLL via rundll32.exe to dump the memory of svchost.exe into a file under the Temp directory. Review the file creation logs to determine the exact name of the dump file that was generated.
grok-3-beta Wins
T1006
Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path.
grok-3-beta Wins
T1018
Review Linux process execution records for any commands that list TCP metric cache entries and filter out loopback interfaces. Which utility was used?
grok-3-beta Wins
T1039
On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action?
grok-3-beta Wins
T1046
A reconnaissance tool was executed on a Windows system. Identify the specific function of the tool that was executed. The function has a name from something you can eat
grok-3-beta Wins
T1053.005
On a Windows host, find any scheduled task that was registered using PowerShell native cmdlets instead of schtasks.exe. What was the name given to the new task?
grok-3-beta Wins
T1036.004
A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked?
grok-3-beta Wins
T1057
While reviewing Windows process events, you spot a PowerShell process executing a WMI enumeration cmdlet. What WMI class name did the attacker query?
grok-3-beta Wins
T1059.004
During a Linux investigation, you notice processes spawning curl and wget commands that pull a script from a remote GitHub raw URL and pipe it into bash. Identify the name of the script that was retrieved and executed.
grok-3-beta Wins
T1070.006
On a Windows host, suspicious PowerShell activity adjusted the system clock and recorded a value. What numeric value was used to slip the system date?
grok-3-beta Wins
T1070.004
While reviewing Windows process events, you observe a command that recursively deleted a folder under the temporary directory. Use the process event data to identify which process or tool executed this recursive delete.
grok-3-beta Wins
T1120
Review Windows process execution logs to find any native utility that was used to enumerate connected drives. Which utility was invoked?
grok-3-beta Wins
T1217
An attacker is suspected of using the Windows shell to enumerate a user’s Internet Explorer bookmarks via the Favorites folder. Identify the exact command they executed to perform this listing.
grok-3-beta Wins
T1201
You are reviewing Linux syslog records on a CentOS/RHEL 7.x server. You notice entries for shell commands that access system configuration files under /etc/security. Determine exactly which configuration file was being inspected by the command.
grok-3-beta Wins
T1497.003
On a Linux host, identify any processes that used ping with a large count value to introduce a delay before launching another process. What was the command executed immediately after the ping delay?
grok-3-beta Wins
T1542.001
Investigate Windows file creation logs to uncover any new executable added directly to the System32 directory, which may indicate a UEFI persistence implant. What was the name of the file created?
grok-3-beta Wins
T1546.004
On Linux, review file events for changes in the system-wide shell profile directory. Determine the name of the script file in /etc/profile.d that shows evidence of an unauthorized append.
grok-3-beta Wins
T1547.014
A Windows endpoint shows an Active Setup entry under Internet Explorer Core Fonts being altered with a StubPath value. Investigate the registry events and identify the payload that was set.
grok-3-beta Wins
T1546.013
On a Windows endpoint, review any events showing content being appended to a user’s PowerShell profile that introduce new process launches. What exact command line was added?
grok-3-beta Wins
T1548.001
A Linux system shows a shell invocation that appears to be searching for files with elevated group permissions. Using the available process execution logs, determine exactly what command was run.
grok-3-beta Wins
T1555
A security investigator suspects that someone attempted to dump stored web credentials on a Windows system using an in-built command-line tool. Review process creation logs to determine which executable was called to list the Web Credentials vault.
grok-3-beta Wins
T1555
An endpoint shows a PowerShell process that downloaded and executed a remote script aimed at extracting credentials from the Windows Credential Manager. Review the process creation logs and identify the function name that was invoked to dump the web credentials.
grok-3-beta Wins
T1564.002
On Windows systems, identify any user account that was hidden by setting its value to 0 under the SpecialAccounts\\UserList registry key. What was the name of the hidden account?
grok-3-beta Wins
T1018
On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache?
o3-mini-low Wins
Page 1 of 8

Explore individual model performance and detailed analysis