gpt-4.1-mini vs grok-3-beta KQL Benchmark
grok-3-beta wins by 7.4 percentage points
Compared on 188 shared test questions
Overall Accuracy
gpt-4.1-mini
41.5%
78 / 188 correct
grok-3-beta
48.9%
92 / 188 correct
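The accuracy figures above follow directly from the correct/total counts. The short Python sketch below recomputes them and shows that the 7.4 gap is a difference in percentage points; the relative improvement over gpt-4.1-mini is larger. The function and variable names are illustrative and do not come from the benchmark's own tooling.

```python
def accuracy(correct: int, total: int) -> float:
    """Return accuracy as a percentage of questions answered correctly."""
    return 100.0 * correct / total


TOTAL = 188          # shared test questions
GPT_CORRECT = 78     # gpt-4.1-mini
GROK_CORRECT = 92    # grok-3-beta

gpt_acc = accuracy(GPT_CORRECT, TOTAL)
grok_acc = accuracy(GROK_CORRECT, TOTAL)

# Absolute gap, in percentage points.
point_gap = grok_acc - gpt_acc

# Relative gain: how many more questions grok-3-beta solved,
# expressed as a percentage of gpt-4.1-mini's correct count.
relative_gain = 100.0 * (GROK_CORRECT - GPT_CORRECT) / GPT_CORRECT

print(f"gpt-4.1-mini: {gpt_acc:.1f}%  grok-3-beta: {grok_acc:.1f}%")
print(f"gap: {point_gap:.1f} points, relative gain: {relative_gain:.1f}%")
```

With these counts the gap is 14 questions out of 188.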
Average Cost per Query
gpt-4.1-mini: $0.0057
grok-3-beta: $0.0642
grok-3-beta costs 1022.4% more
Average Execution Time
gpt-4.1-mini: 14.13s
grok-3-beta: 16.92s
grok-3-beta takes 19.7% longer
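The "costs more" and "takes longer" figures are relative increases over the gpt-4.1-mini baseline. The sketch below recomputes them from the rounded averages shown above; the helper name `percent_more` is hypothetical. Because the displayed averages are rounded, the recomputed cost increase comes out near, but not exactly at, the reported 1022.4%, which was presumably derived from unrounded values.

```python
def percent_more(baseline: float, other: float) -> float:
    """Return how much larger `other` is than `baseline`, in percent."""
    return 100.0 * (other - baseline) / baseline


# Rounded averages as displayed on the page.
GPT_COST, GROK_COST = 0.0057, 0.0642   # USD per query
GPT_TIME, GROK_TIME = 14.13, 16.92     # seconds per query

cost_increase = percent_more(GPT_COST, GROK_COST)
time_increase = percent_more(GPT_TIME, GROK_TIME)

print(f"cost increase: {cost_increase:.1f}%")
print(f"time increase: {time_increase:.1f}%")
```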
Question-by-Question Comparison
Detailed comparison showing where each model succeeded or failed
Showing 1 to 25 of 188 questions
Technique | Question | gpt-4.1-mini | grok-3-beta | Result |
T1018 | On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache? | ✓ | ✗ | gpt-4.1-mini Wins |
T1016.001 | An analyst notices a PowerShell process on a Windows host that appears to be checking SMB connectivity. Which PowerShell cmdlet was executed to perform this outbound port 445 test? | ✓ | ✗ | gpt-4.1-mini Wins |
T1018 | A Windows host executed an ICMP-based network reconnaissance using a looping instruction in cmd.exe. Identify the exact command line that was used to perform the ping sweep. | ✓ | ✗ | gpt-4.1-mini Wins |
T1048.003 | A Linux host briefly hosted an HTTP service under /tmp. Examine process creation logs to determine the exact python3 command that was used to start the server on port 9090. | ✓ | ✗ | gpt-4.1-mini Wins |
T1048.003 | Windows process creation logs show a PowerShell-driven file transfer to an FTP service. Which executable was leveraged to perform this exfiltration? | ✓ | ✗ | gpt-4.1-mini Wins |
T1059.004 | Which full interactive shell command, as recorded in the Linux process logs, repeatedly echoed a distinctive marker message to the terminal? | ✓ | ✗ | gpt-4.1-mini Wins |
T1059.004 | An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell. | ✓ | ✗ | gpt-4.1-mini Wins |
T1069.001 | Investigate Windows process execution logs for a PowerShell cmdlet used to list group members. Look for entries where a group name is provided after a '-Name' flag and identify which group was queried. | ✓ | ✗ | gpt-4.1-mini Wins |
T1070.003 | On a Windows device, there’s evidence that PowerShell history was wiped by deleting the history file. What was the exact command used to perform this action? | ✓ | ✗ | gpt-4.1-mini Wins |
T1078.003 | Review the Linux process creation records to find which user account management utility was used to reactivate the previously locked and expired account. | ✓ | ✗ | gpt-4.1-mini Wins |
T1082 | Using Linux process execution logs, identify the specific command that was used to filter loaded kernel modules for entries containing “vmw.” What was that full command? | ✓ | ✗ | gpt-4.1-mini Wins |
T1112 | A Windows host logs a change to the Terminal Server registry key disabling single-session per user. Which command-line utility executed this registry modification? | ✓ | ✗ | gpt-4.1-mini Wins |
T1197 | A suspicious BITS transfer was orchestrated via bitsadmin.exe on Windows, creating a job to download and then execute a payload. Investigate the process event logs to determine what custom job name was specified when the BITS job was created. | ✓ | ✗ | gpt-4.1-mini Wins |
T1217 | On a Windows system, you notice a process that recursively enumerates files named 'Bookmarks' under every user profile directory. Which Windows command-line utility was used to perform that search? | ✓ | ✗ | gpt-4.1-mini Wins |
T1218.011 | You notice rundll32.exe being used with desk.cpl,InstallScreenSaver on a Windows endpoint. Investigate your process creation logs to find which .scr file was loaded by this unusual invocation. | ✓ | ✗ | gpt-4.1-mini Wins |
T1546.003 | On a Windows endpoint, an attacker ran a PowerShell sequence to establish a WMI event subscription using CommandLineEventConsumer. Inspect the process or script execution logs to uncover which executable was set to run by this subscription. | ✓ | ✗ | gpt-4.1-mini Wins |
T1555 | On Windows, review PowerShell process events to spot where a remote .ps1 was fetched and run to pull vault credentials. Determine the name of the script file that was downloaded. | ✓ | ✗ | gpt-4.1-mini Wins |
T1557.001 | On Windows devices, hunt for PowerShell activity where a remote script is fetched and executed to perform LLMNR/NBNS spoofing. Which cmdlet kicked off the listener? | ✓ | ✗ | gpt-4.1-mini Wins |
T1560 | Windows system logs show PowerShell zipping up the contents of a user’s profile folder. Investigate process and file events to determine the exact name of the ZIP archive created. | ✓ | ✗ | gpt-4.1-mini Wins |
T1559 | Investigating a Windows device, you suspect a non-standard executable was launched to set up a named pipe for client-server messaging. Determine the name of the executable that was run. | ✓ | ✗ | gpt-4.1-mini Wins |
T1562.003 | Review Windows registry event logs for the ProcessCreationIncludeCmdLine_Enabled value being set to 0. Which PowerShell cmdlet performed this change? | ✓ | ✗ | gpt-4.1-mini Wins |
T1614.001 | On a Windows device, an attacker ran a PowerShell script to collect system settings including UI language and locale. Identify which cmdlet in the command line was used to obtain the system locale. | ✓ | ✗ | gpt-4.1-mini Wins |
T1614.001 | Using Linux process or syslog logs, identify the executable that was run to output the system's locale information. | ✓ | ✗ | gpt-4.1-mini Wins |
T1006 | Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path. | ✗ | ✓ | grok-3-beta Wins |
T1003.001 | Using Windows process event logs, investigate PowerShell activity around lsass.exe memory capture. What was the name of the script file invoked to perform the dump? | ✗ | ✓ | grok-3-beta Wins |
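Each row above is pipe-delimited: a technique ID, the question text, a ✓/✗ mark for gpt-4.1-mini, a ✓/✗ mark for grok-3-beta, and the result. A minimal Python sketch for tallying results from rows in this layout follows; the function name `tally_results` is hypothetical, and the sample uses three rows copied from this page rather than the full 188-question set.

```python
from collections import Counter


def tally_results(rows: list[str]) -> Counter:
    """Count the result label in each pipe-delimited comparison row."""
    counts: Counter = Counter()
    for row in rows:
        body = row.strip().rstrip("|").strip()
        try:
            # Split the technique ID off the front and the last three
            # cells off the back, so a "|" inside the question text
            # would not break the parse.
            technique, rest = body.split("|", 1)
            question, gpt_mark, grok_mark, result = (
                part.strip() for part in rest.rsplit("|", 3)
            )
        except ValueError:
            continue  # skip lines that are not table rows
        counts[result] += 1
    return counts


sample_rows = [
    "T1614.001 | Using Linux process or syslog logs, identify the executable that was run to output the system's locale information. | ✓ | ✗ | gpt-4.1-mini Wins |",
    "T1006 | Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path. | ✗ | ✓ | grok-3-beta Wins |",
    "T1003.001 | Using Windows process event logs, investigate PowerShell activity around lsass.exe memory capture. What was the name of the script file invoked to perform the dump? | ✗ | ✓ | grok-3-beta Wins |",
]

counts = tally_results(sample_rows)
for label, n in counts.items():
    print(f"{label}: {n}")
```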