grok-3-beta vs o3-mini-high KQL Benchmark
o3-mini-high wins by 2.7 percentage points
Compared on 188 shared test questions
Overall Accuracy
grok-3-beta
48.9%
92 / 188 correct
o3-mini-high
51.6%
97 / 188 correct
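The accuracy figures follow directly from the correct-answer counts. A minimal Python sketch that recomputes them from the counts reported above; the function name `accuracy_pct` is illustrative and not part of the benchmark tooling:

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Return accuracy as a percentage of questions answered correctly."""
    return 100.0 * correct / total


TOTAL = 188  # shared test questions, as reported above

grok = accuracy_pct(92, TOTAL)   # grok-3-beta
o3 = accuracy_pct(97, TOTAL)     # o3-mini-high
gap = o3 - grok                  # absolute gap, in percentage points

print(f"grok-3-beta:  {grok:.1f}%")
print(f"o3-mini-high: {o3:.1f}%")
print(f"gap: {gap:.1f} percentage points")
```

The gap is an absolute difference between two percentages (5 questions out of 188), not a relative improvement.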
Average Cost per Query
grok-3-beta: $0.0642
o3-mini-high: $0.0262
grok-3-beta costs 145.2% more
Average Execution Time
grok-3-beta: 16.92s
o3-mini-high: 33.17s
o3-mini-high takes 96.0% longer
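The "costs more" and "takes longer" figures are relative differences measured against the smaller value. A short Python sketch, assuming the page uses the usual formula (larger - smaller) / smaller; the helper name `pct_more` is hypothetical:

```python
def pct_more(larger: float, smaller: float) -> float:
    """Return how much bigger `larger` is than `smaller`, as a percentage."""
    return 100.0 * (larger - smaller) / smaller


# Inputs are the rounded averages shown above.
cost_gap = pct_more(0.0642, 0.0262)  # grok-3-beta vs o3-mini-high, dollars per query
time_gap = pct_more(33.17, 16.92)    # o3-mini-high vs grok-3-beta, seconds

print(f"grok-3-beta costs {cost_gap:.1f}% more")
print(f"o3-mini-high takes {time_gap:.1f}% longer")
```

With these rounded inputs the time figure reproduces the reported 96.0%, while the cost figure comes out near 145.0% rather than 145.2%. The small difference is presumably because the page computes from unrounded averages, which is an assumption and not something the page states.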
Question-by-Question Comparison
Detailed comparison showing where each model succeeded or failed
Showing 1 to 25 of 188 questions
Technique ID | Question | grok-3-beta | o3-mini-high | Result |
T1003.001 | Using Windows process event logs, investigate PowerShell activity around lsass.exe memory capture. What was the name of the script file invoked to perform the dump? | ✓ | ✗ | grok-3-beta Wins |
T1003 | On a Windows host, an attacker leveraged COMSVCS.DLL via rundll32.exe to dump the memory of svchost.exe into a file under the Temp directory. Review the file creation logs to determine the exact name of the dump file that was generated. | ✓ | ✗ | grok-3-beta Wins |
T1018 | Review Linux process execution records for any commands that list TCP metric cache entries and filter out loopback interfaces. Which utility was used? | ✓ | ✗ | grok-3-beta Wins |
T1027 | On a Windows endpoint, look for evidence of a base64-encoded PowerShell payload execution. Which executable launched the encoded command? | ✓ | ✗ | grok-3-beta Wins |
T1039 | On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action? | ✓ | ✗ | grok-3-beta Wins |
T1046 | A reconnaissance tool was executed on a Windows system. Identify the specific function of the tool that was executed. The function has a name from something you can eat | ✓ | ✗ | grok-3-beta Wins |
T1053.005 | On a Windows host, find any scheduled task that was registered using PowerShell native cmdlets instead of schtasks.exe. What was the name given to the new task? | ✓ | ✗ | grok-3-beta Wins |
T1057 | On a Windows device, PowerShell was used to collect a snapshot of running processes. Identify the exact cmdlet that was executed. | ✓ | ✗ | grok-3-beta Wins |
T1036.004 | A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked? | ✓ | ✗ | grok-3-beta Wins |
T1069.001 | Review recent Windows process event logs for PowerShell activity that suggests local group enumeration through WMI. What exact command was executed? | ✓ | ✗ | grok-3-beta Wins |
T1057 | While reviewing Windows process events, you spot a PowerShell process executing a WMI enumeration cmdlet. What WMI class name did the attacker query? | ✓ | ✗ | grok-3-beta Wins |
T1070.006 | On a Windows host, suspicious PowerShell activity adjusted the system clock and recorded a value. What numeric value was used to slip the system date? | ✓ | ✗ | grok-3-beta Wins |
T1070.004 | While reviewing Windows process events, you observe a command that recursively deleted a folder under the temporary directory. Use the process event data to identify which process or tool executed this recursive delete. | ✓ | ✗ | grok-3-beta Wins |
T1082 | A user‐space process on a Linux device invoked a shell to capture and display the system’s environment variables and path. Which exact command was used to perform this discovery? | ✓ | ✗ | grok-3-beta Wins |
T1120 | Review Windows process execution logs to find any native utility that was used to enumerate connected drives. Which utility was invoked? | ✓ | ✗ | grok-3-beta Wins |
T1124 | A Windows host recorded a process that simply executes the system’s native time utility. Without spelling out the query, determine which command was run based on process creation events. | ✓ | ✗ | grok-3-beta Wins |
T1217 | An attacker is suspected of using the Windows shell to enumerate a user’s Internet Explorer bookmarks via the Favorites folder. Identify the exact command they executed to perform this listing. | ✓ | ✗ | grok-3-beta Wins |
T1217 | On Linux, review the process execution logs to uncover when Chromium’s bookmark JSON files were being located and the results persisted. Focus on shell commands that search under .config/chromium and write output to a file. What was the filename used to save the findings? | ✓ | ✗ | grok-3-beta Wins |
T1546.004 | On Linux, review file events for changes in the system-wide shell profile directory. Determine the name of the script file in /etc/profile.d that shows evidence of an unauthorized append. | ✓ | ✗ | grok-3-beta Wins |
T1547.014 | A Windows endpoint shows an Active Setup entry under Internet Explorer Core Fonts being altered with a StubPath value. Investigate the registry events and identify the payload that was set. | ✓ | ✗ | grok-3-beta Wins |
T1546.013 | On a Windows endpoint, review any events showing content being appended to a user’s PowerShell profile that introduce new process launches. What exact command line was added? | ✓ | ✗ | grok-3-beta Wins |
T1555.003 | On a Windows system, PowerShell was used to gather multiple browser credential files into a temp folder and then archive them. What was the name of the resulting ZIP file? | ✓ | ✗ | grok-3-beta Wins |
T1555 | An endpoint shows a PowerShell process that downloaded and executed a remote script aimed at extracting credentials from the Windows Credential Manager. Review the process creation logs and identify the function name that was invoked to dump the web credentials. | ✓ | ✗ | grok-3-beta Wins |
T1622 | On the Windows device, a security check was run to detect debugger processes via PowerShell. Which tool (process) carried out this check? | ✓ | ✗ | grok-3-beta Wins |
T1018 | On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache? | ✗ | ✓ | o3-mini-high Wins |
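For readers who want to tally results themselves, the rows above can be parsed as pipe-delimited records. This is a rough Python sketch, assuming the column order is technique ID, question, grok-3-beta mark, o3-mini-high mark, result; the function and field names are illustrative, and only two rows are included to keep it short:

```python
from collections import Counter

# Two rows copied verbatim from the table above.
ROWS = [
    "T1622 | On the Windows device, a security check was run to detect debugger processes via PowerShell. Which tool (process) carried out this check? | ✓ | ✗ | grok-3-beta Wins |",
    "T1018 | On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache? | ✗ | ✓ | o3-mini-high Wins |",
]


def parse_row(line: str) -> dict:
    """Split one table row into named fields."""
    body = line.strip().rstrip("|").strip()
    technique, rest = body.split("|", 1)
    # Split from the right so a "|" inside the question text
    # cannot shift the last three columns.
    question, grok_mark, o3_mark, result = rest.rsplit("|", 3)
    return {
        "technique": technique.strip(),
        "question": question.strip(),
        "grok_3_beta": grok_mark.strip() == "✓",
        "o3_mini_high": o3_mark.strip() == "✓",
        "result": result.strip(),
    }


rows = [parse_row(line) for line in ROWS]
tally = Counter(row["result"] for row in rows)
print(tally)
```

Run over all 25 rows shown here, the same tally gives 24 wins for grok-3-beta and 1 for o3-mini-high. That reflects only the first page of 188 questions, which appears to be grouped by outcome, and not the overall result of 92 versus 97 correct.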