grok-3-beta vs o1-low KQL Benchmark
o1-low wins by 14.4%
Compared on 188 shared test questions
Overall Accuracy
grok-3-beta
48.9%
92 / 188 correct
o1-low
63.3%
119 / 188 correct
Average Cost per Query
grok-3-beta: $0.0642
o1-low: $0.4994
o1-low costs 677.9% more
Average Execution Time
grok-3-beta: 16.92s
o1-low: 50.90s
o1-low takes 200.8% longer
Question-by-Question Analysis
Question-by-Question Comparison
Detailed comparison showing where each model succeeded or failed
Showing 1 to 25 of 188 questions
Page 1 of 8
T1036.003 | In a Linux environment, you observe a process labeled like the cron daemon but running from an unexpected path. Investigate creation events to uncover the actual filename used by this fake cron process. | ✓ | ✗ | grok-3-beta Wins |
T1039 | On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action? | ✓ | ✗ | grok-3-beta Wins |
T1046 | A reconnaissance tool was executed on a Windows system. Identify the specific function of the tool that was executed. The function has a name from something you can eat | ✓ | ✗ | grok-3-beta Wins |
T1053.005 | On a Windows host, find any scheduled task that was registered using PowerShell native cmdlets instead of schtasks.exe. What was the name given to the new task? | ✓ | ✗ | grok-3-beta Wins |
T1057 | While reviewing Windows process events, you spot a PowerShell process executing a WMI enumeration cmdlet. What WMI class name did the attacker query? | ✓ | ✗ | grok-3-beta Wins |
T1070.003 | On a Windows endpoint, review process execution logs to see if any PowerShell sessions were wiped clean. Which command was executed to clear the PowerShell history? | ✓ | ✗ | grok-3-beta Wins |
T1082 | On Windows systems, identify when the built-in Shadow Copy utility is used to enumerate existing snapshots. What was the full command executed? | ✓ | ✗ | grok-3-beta Wins |
T1082 | A Windows system shows a cmd.exe process spawn that appears to have been used for environment discovery. Review the process creation records to identify the exact command the adversary ran to enumerate environment variables. | ✓ | ✗ | grok-3-beta Wins |
T1120 | Review Windows process execution logs to find any native utility that was used to enumerate connected drives. Which utility was invoked? | ✓ | ✗ | grok-3-beta Wins |
T1497.003 | On a Linux host, identify any processes that used ping with a large count value to introduce a delay before launching another process. What was the command executed immediately after the ping delay? | ✓ | ✗ | grok-3-beta Wins |
T1531 | Within Windows process event logs, identify instances where the built-in net.exe utility is used to change a user account password. What was the new password argument passed in? | ✓ | ✗ | grok-3-beta Wins |
T1542.001 | Investigate Windows file creation logs to uncover any new executable added directly to the System32 directory, which may indicate a UEFI persistence implant. What was the name of the file created? | ✓ | ✗ | grok-3-beta Wins |
T1547.014 | A Windows endpoint shows an Active Setup entry under Internet Explorer Core Fonts being altered with a StubPath value. Investigate the registry events and identify the payload that was set. | ✓ | ✗ | grok-3-beta Wins |
T1546.013 | On a Windows endpoint, review any events showing content being appended to a user’s PowerShell profile that introduce new process launches. What exact command line was added? | ✓ | ✗ | grok-3-beta Wins |
T1548.001 | A Linux system shows a shell invocation that appears to be searching for files with elevated group permissions. Using the available process execution logs, determine exactly what command was run. | ✓ | ✗ | grok-3-beta Wins |
T1555 | A security investigator suspects that someone attempted to dump stored web credentials on a Windows system using an in-built command-line tool. Review process creation logs to determine which executable was called to list the Web Credentials vault. | ✓ | ✗ | grok-3-beta Wins |
T1555 | An endpoint shows a PowerShell process that downloaded and executed a remote script aimed at extracting credentials from the Windows Credential Manager. Review the process creation logs and identify the function name that was invoked to dump the web credentials. | ✓ | ✗ | grok-3-beta Wins |
T1562 | Review Linux process execution logs to find where the system journal service was stopped. Which utility was invoked to disable journal logging? | ✓ | ✗ | grok-3-beta Wins |
T1562.004 | On a Windows device, a new inbound firewall rule was created unexpectedly. Review process execution records to identify the command-line utility responsible for adding the rule. | ✓ | ✗ | grok-3-beta Wins |
T1622 | On the Windows device, a security check was run to detect debugger processes via PowerShell. Which tool (process) carried out this check? | ✓ | ✗ | grok-3-beta Wins |
T1016.001 | An analyst notices a PowerShell process on a Windows host that appears to be checking SMB connectivity. Which PowerShell cmdlet was executed to perform this outbound port 445 test? | ✗ | ✓ | o1-low Wins |
T1018 | On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache? | ✗ | ✓ | o1-low Wins |
T1007 | An analyst suspects a user or script ran a service enumeration command on a Linux system. Review process events to find the service-listing invocation and specify the full command that was executed. | ✗ | ✓ | o1-low Wins |
T1018 | A Windows host executed an ICMP-based network reconnaissance using a looping instruction in cmd.exe. Identify the exact command line that was used to perform the ping sweep. | ✗ | ✓ | o1-low Wins |
T1036.003 | A process is running under a familiar Windows host name but originates from a user's AppData folder rather than the System32 directory. Identify the filename used to masquerade the PowerShell binary on this Windows device. | ✗ | ✓ | o1-low Wins |
Page 1 of 8
Explore individual model performance and detailed analysis