grok-3-beta vs o3-mini-high KQL Benchmark

o3-mini-high wins by 2.7 percentage points

Compared on 188 shared test questions

Overall Accuracy

grok-3-beta

48.9%

92 / 188 correct

o3-mini-high

51.6%

97 / 188 correct
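The headline margin is a difference of two percentages (51.6% - 48.9%), so it is 2.7 percentage points. It is not a 2.7% relative improvement. The minimal Python sketch below recomputes both accuracies, the margin, and the relative gain from the reported counts. The variable names are illustrative and are not part of the benchmark's own tooling.

```python
# Correct-answer counts as reported in the Overall Accuracy figures.
TOTAL = 188
correct = {"grok-3-beta": 92, "o3-mini-high": 97}

def accuracy_pct(n_correct: int, total: int) -> float:
    """Share of questions answered correctly, in percent."""
    return 100.0 * n_correct / total

acc = {model: accuracy_pct(n, TOTAL) for model, n in correct.items()}

# Absolute margin: difference of two percentages, i.e. percentage points.
margin_pp = acc["o3-mini-high"] - acc["grok-3-beta"]

# Relative gain: ratio of the two accuracies (same denominator, so same as the count ratio).
relative_pct = (correct["o3-mini-high"] / correct["grok-3-beta"] - 1.0) * 100.0

for model, a in acc.items():
    print(f"{model}: {a:.1f}%")
print(f"margin: {margin_pp:.1f} percentage points")
print(f"relative gain: {relative_pct:.1f}%")
```

Reported the relative way, the same 5-question gap would read as roughly a 5.4% improvement, which is why the unit matters.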

Average Cost per Query

grok-3-beta: $0.0642
o3-mini-high: $0.0262
grok-3-beta costs 145.2% more
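Percentage comparisons are asymmetric. The same cost gap reads as about 145% more from grok-3-beta's side and about 59% less from o3-mini-high's side. The sketch below recomputes both from the rounded averages shown. It yields roughly 145.0% rather than the reported 145.2%. This is presumably because the page divides unrounded averages, though that is an assumption.

```python
# Average cost per query (USD) as reported above.
avg_cost = {"grok-3-beta": 0.0642, "o3-mini-high": 0.0262}

def pct_more(a: float, b: float) -> float:
    """How much larger a is than b, expressed as a percentage of b."""
    return (a / b - 1.0) * 100.0

def pct_less(a: float, b: float) -> float:
    """How much smaller a is than b, expressed as a percentage of b."""
    return (1.0 - a / b) * 100.0

more = pct_more(avg_cost["grok-3-beta"], avg_cost["o3-mini-high"])
less = pct_less(avg_cost["o3-mini-high"], avg_cost["grok-3-beta"])
ratio = avg_cost["grok-3-beta"] / avg_cost["o3-mini-high"]

print(f"grok-3-beta costs {more:.1f}% more ({ratio:.2f}x)")
print(f"o3-mini-high costs {less:.1f}% less")
```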

Average Execution Time

grok-3-beta: 16.92s
o3-mini-high: 33.17s
o3-mini-high takes 96.0% longer
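The 96.0% figure is the ratio of the two average execution times, minus one. The sketch below reproduces it. It also converts each average into total time across the run (mean times count), assuming the averages are taken over the 188 shared questions. That total is how the gap would show up if the questions were run one after another.

```python
# Average execution time per query (seconds) as reported above.
avg_time = {"grok-3-beta": 16.92, "o3-mini-high": 33.17}
N_QUESTIONS = 188  # assumption: both averages cover the 188 shared questions

slower, faster = avg_time["o3-mini-high"], avg_time["grok-3-beta"]
pct_longer = (slower / faster - 1.0) * 100.0
speedup = slower / faster

# mean * count = sum of per-query times, i.e. wall-clock if run sequentially.
total_min = {model: t * N_QUESTIONS / 60.0 for model, t in avg_time.items()}

print(f"o3-mini-high takes {pct_longer:.1f}% longer ({speedup:.2f}x)")
for model, minutes in total_min.items():
    print(f"{model}: {minutes:.1f} min total")
```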

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
T1003.001
Using Windows process event logs, investigate PowerShell activity around lsass.exe memory capture. What was the name of the script file invoked to perform the dump?
grok-3-beta Wins
T1003
On a Windows host, an attacker leveraged COMSVCS.DLL via rundll32.exe to dump the memory of svchost.exe into a file under the Temp directory. Review the file creation logs to determine the exact name of the dump file that was generated.
grok-3-beta Wins
T1018
Review Linux process execution records for any commands that list TCP metric cache entries and filter out loopback interfaces. Which utility was used?
grok-3-beta Wins
T1027
On a Windows endpoint, look for evidence of a base64-encoded PowerShell payload execution. Which executable launched the encoded command?
grok-3-beta Wins
T1039
On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action?
grok-3-beta Wins
T1046
A reconnaissance tool was executed on a Windows system. Identify the specific function of the tool that was executed. The function is named after something you can eat.
grok-3-beta Wins
T1053.005
On a Windows host, find any scheduled task that was registered using PowerShell native cmdlets instead of schtasks.exe. What was the name given to the new task?
grok-3-beta Wins
T1057
On a Windows device, PowerShell was used to collect a snapshot of running processes. Identify the exact cmdlet that was executed.
grok-3-beta Wins
T1036.004
A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked?
grok-3-beta Wins
T1069.001
Review recent Windows process event logs for PowerShell activity that suggests local group enumeration through WMI. What exact command was executed?
grok-3-beta Wins
T1057
While reviewing Windows process events, you spot a PowerShell process executing a WMI enumeration cmdlet. What WMI class name did the attacker query?
grok-3-beta Wins
T1070.006
On a Windows host, suspicious PowerShell activity adjusted the system clock and recorded a value. What numeric value was used to slip the system date?
grok-3-beta Wins
T1070.004
While reviewing Windows process events, you observe a command that recursively deleted a folder under the temporary directory. Use the process event data to identify which process or tool executed this recursive delete.
grok-3-beta Wins
T1082
A user‐space process on a Linux device invoked a shell to capture and display the system’s environment variables and path. Which exact command was used to perform this discovery?
grok-3-beta Wins
T1120
Review Windows process execution logs to find any native utility that was used to enumerate connected drives. Which utility was invoked?
grok-3-beta Wins
T1124
A Windows host recorded a process that simply executes the system’s native time utility. Without spelling out the query, determine which command was run based on process creation events.
grok-3-beta Wins
T1217
An attacker is suspected of using the Windows shell to enumerate a user’s Internet Explorer bookmarks via the Favorites folder. Identify the exact command they executed to perform this listing.
grok-3-beta Wins
T1217
On Linux, review the process execution logs to uncover when Chromium’s bookmark JSON files were being located and the results persisted. Focus on shell commands that search under .config/chromium and write output to a file. What was the filename used to save the findings?
grok-3-beta Wins
T1546.004
On Linux, review file events for changes in the system-wide shell profile directory. Determine the name of the script file in /etc/profile.d that shows evidence of an unauthorized append.
grok-3-beta Wins
T1547.014
A Windows endpoint shows an Active Setup entry under Internet Explorer Core Fonts being altered with a StubPath value. Investigate the registry events and identify the payload that was set.
grok-3-beta Wins
T1546.013
On a Windows endpoint, review any events showing content being appended to a user’s PowerShell profile that introduce new process launches. What exact command line was added?
grok-3-beta Wins
T1555.003
On a Windows system, PowerShell was used to gather multiple browser credential files into a temp folder and then archive them. What was the name of the resulting ZIP file?
grok-3-beta Wins
T1555
An endpoint shows a PowerShell process that downloaded and executed a remote script aimed at extracting credentials from the Windows Credential Manager. Review the process creation logs and identify the function name that was invoked to dump the web credentials.
grok-3-beta Wins
T1622
On the Windows device, a security check was run to detect debugger processes via PowerShell. Which tool (process) carried out this check?
grok-3-beta Wins
T1018
On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache?
o3-mini-high Wins
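Each entry above carries a single winner. A question-level comparison like this can therefore be thought of as bucketing each shared question by which model answered it correctly. The minimal Python sketch below does that on hypothetical toy results, not the benchmark's data. It assumes a "Win" means only that model was correct.

Because both models answered the same 188 questions, questions that both or neither got right cancel out. The gap in correct totals therefore equals the gap in outright wins. With 97 versus 92 correct, o3-mini-high would have five more outright wins than grok-3-beta over the full set. That holds even though this page lists 24 grok-3-beta wins and one o3-mini-high win.

```python
from collections import Counter
from typing import Iterable, Tuple

def classify(a_correct: bool, b_correct: bool) -> str:
    """Bucket one shared question by which model(s) answered it correctly."""
    if a_correct and b_correct:
        return "both"
    if a_correct:
        return "a_only"
    if b_correct:
        return "b_only"
    return "neither"

def tally(results: Iterable[Tuple[bool, bool]]) -> Counter:
    """Count outcome buckets over (model_a_correct, model_b_correct) pairs."""
    return Counter(classify(a, b) for a, b in results)

# Hypothetical per-question results: a = grok-3-beta, b = o3-mini-high. NOT benchmark data.
sample = [(True, True), (True, False), (False, True), (False, True), (False, False)]
counts = tally(sample)

correct_a = sum(a for a, _ in sample)
correct_b = sum(b for _, b in sample)

# "both" and "neither" questions cancel, so the total gap equals the outright-win gap.
assert correct_b - correct_a == counts["b_only"] - counts["a_only"]
print(dict(counts))
```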
