gpt-4.1-mini vs grok-3-beta KQL Benchmark

grok-3-beta wins by 7.4%

Compared on 188 shared test questions

Overall Accuracy

gpt-4.1-mini

41.5%

78 / 188 correct

grok-3-beta

48.9%

92 / 188 correct

Average Cost per Query

gpt-4.1-mini: $0.0057
grok-3-beta: $0.0642
grok-3-beta costs 1022.4% more

Average Execution Time

gpt-4.1-mini: 14.13s
grok-3-beta: 16.92s
grok-3-beta takes 19.7% longer
Question-by-Question Analysis

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
Page 1 of 8
T1018
On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache?
gpt-4.1-mini Wins
T1016.001
An analyst notices a PowerShell process on a Windows host that appears to be checking SMB connectivity. Which PowerShell cmdlet was executed to perform this outbound port 445 test?
gpt-4.1-mini Wins
T1018
A Windows host executed an ICMP-based network reconnaissance using a looping instruction in cmd.exe. Identify the exact command line that was used to perform the ping sweep.
gpt-4.1-mini Wins
T1048.003
A Linux host briefly hosted an HTTP service under /tmp. Examine process creation logs to determine the exact python3 command that was used to start the server on port 9090.
gpt-4.1-mini Wins
T1048.003
Windows process creation logs show a PowerShell-driven file transfer to an FTP service. Which executable was leveraged to perform this exfiltration?
gpt-4.1-mini Wins
T1059.004
Which full interactive shell command, as recorded in the Linux process logs, repeatedly echoed a distinctive marker message to the terminal?
gpt-4.1-mini Wins
T1059.004
An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell.
gpt-4.1-mini Wins
T1069.001
Investigate Windows process execution logs for a PowerShell cmdlet used to list group members. Look for entries where a group name is provided after a '-Name' flag and identify which group was queried.
gpt-4.1-mini Wins
T1070.003
On a Windows device, there’s evidence that PowerShell history was wiped by deleting the history file. What was the exact command used to perform this action?
gpt-4.1-mini Wins
T1078.003
Review the Linux process creation records to find which user account management utility was used to reactivate the previously locked and expired account.
gpt-4.1-mini Wins
T1082
Using Linux process execution logs, identify the specific command that was used to filter loaded kernel modules for entries containing “vmw.” What was that full command?
gpt-4.1-mini Wins
T1112
A Windows host logs a change to the Terminal Server registry key disabling single-session per user. Which command-line utility executed this registry modification?
gpt-4.1-mini Wins
T1197
A suspicious BITS transfer was orchestrated via bitsadmin.exe on Windows, creating a job to download and then execute a payload. Investigate the process event logs to determine what custom job name was specified when the BITS job was created.
gpt-4.1-mini Wins
T1217
On a Windows system, you notice a process that recursively enumerates files named 'Bookmarks' under every user profile directory. Which Windows command-line utility was used to perform that search?
gpt-4.1-mini Wins
T1218.011
You notice rundll32.exe being used with desk.cpl,InstallScreenSaver on a Windows endpoint. Investigate your process creation logs to find which .scr file was loaded by this unusual invocation.
gpt-4.1-mini Wins
T1546.003
On a Windows endpoint, an attacker ran a PowerShell sequence to establish a WMI event subscription using CommandLineEventConsumer. Inspect the process or script execution logs to uncover which executable was set to run by this subscription.
gpt-4.1-mini Wins
T1555
On Windows, review PowerShell process events to spot where a remote .ps1 was fetched and run to pull vault credentials. Determine the name of the script file that was downloaded.
gpt-4.1-mini Wins
T1557.001
On Windows devices, hunt for PowerShell activity where a remote script is fetched and executed to perform LLMNR/NBNS spoofing. Which cmdlet kicked off the listener?
gpt-4.1-mini Wins
T1560
Windows system logs show PowerShell zipping up the contents of a user’s profile folder. Investigate process and file events to determine the exact name of the ZIP archive created.
gpt-4.1-mini Wins
T1559
Investigating a Windows device, you suspect a non-standard executable was launched to set up a named pipe for client-server messaging. Determine the name of the executable that was run.
gpt-4.1-mini Wins
T1562.003
Review Windows registry event logs for the ProcessCreationIncludeCmdLine_Enabled value being set to 0. Which PowerShell cmdlet performed this change?
gpt-4.1-mini Wins
T1614.001
On a Windows device, an attacker ran a PowerShell script to collect system settings including UI language and locale. Identify which cmdlet in the command line was used to obtain the system locale.
gpt-4.1-mini Wins
T1614.001
Using Linux process or syslog logs, identify the executable that was run to output the system's locale information.
gpt-4.1-mini Wins
T1006
Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path.
grok-3-beta Wins
T1003.001
Using Windows process event logs, investigate PowerShell activity around lsass.exe memory capture. What was the name of the script file invoked to perform the dump?
grok-3-beta Wins
Page 1 of 8

Explore individual model performance and detailed analysis