gpt-4.1 vs grok-3-mini-beta KQL Benchmark

gpt-4.1 wins by 3.2%

Compared on 188 shared test questions

Overall Accuracy

gpt-4.1

61.7%

116 / 188 correct

grok-3-mini-beta

58.5%

110 / 188 correct

Average Cost per Query

gpt-4.1: $0.0285
grok-3-mini-beta: $0.0040
gpt-4.1 costs 618.7% more

Average Execution Time

gpt-4.1: 9.95s
grok-3-mini-beta: 22.38s
grok-3-mini-beta takes 125.0% longer
Question-by-Question Analysis

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
Page 1 of 8
T1016.001
On a Linux host, a ping command was executed to test internet connectivity. Determine which IP address was used as the ping target.
gpt-4.1 Wins
T1003.001
Using Windows process event logs, investigate PowerShell activity around lsass.exe memory capture. What was the name of the script file invoked to perform the dump?
gpt-4.1 Wins
T1016
A Linux host’s Syslog shows a shell-based network discovery script ran multiple commands. One of them listed current TCP connections. Which utility was invoked?
gpt-4.1 Wins
T1018
Review Linux process execution records for any commands that list TCP metric cache entries and filter out loopback interfaces. Which utility was used?
gpt-4.1 Wins
T1027
On a Linux system, identify the script that was generated by decoding a base64 data file and then executed. What was the filename of that script?
gpt-4.1 Wins
T1036.004
A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked?
gpt-4.1 Wins
T1039
On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action?
gpt-4.1 Wins
T1053.005
Investigate Windows process events for PowerShell activity that leverages WMI to register a scheduled task via XML import. What was the name of the XML file supplied to the RegisterByXml method?
gpt-4.1 Wins
T1057
While reviewing Windows process events, you spot a PowerShell process executing a WMI enumeration cmdlet. What WMI class name did the attacker query?
gpt-4.1 Wins
T1053.005
On Windows, review recent registry changes to detect when the MSC file association was hijacked by a reg add operation. What executable file was configured as the default command under HKCU\Software\Classes\mscfile\shell\open\command?
gpt-4.1 Wins
T1070.004
While reviewing Windows process events, you observe a command that recursively deleted a folder under the temporary directory. Use the process event data to identify which process or tool executed this recursive delete.
gpt-4.1 Wins
T1070.006
On a Windows host, suspicious PowerShell activity adjusted the system clock and recorded a value. What numeric value was used to slip the system date?
gpt-4.1 Wins
T1070.006
On a Linux system, attackers may use timestamp manipulation to hide malicious changes. Investigate relevant logs to identify which file’s modification timestamp was altered by such a command.
gpt-4.1 Wins
T1112
A Windows user’s registry was altered via a command-line tool to disable the lock workstation feature by adding a DWORD entry under the current user Policies\System key. Which registry value name was modified in this operation?
gpt-4.1 Wins
T1124
In Windows process event logs, you notice both the net time and w32tm commands being executed to display the system time and timezone. Which executor name from the test configuration was responsible for launching these utilities?
gpt-4.1 Wins
T1120
Review Windows process execution logs to find any native utility that was used to enumerate connected drives. Which utility was invoked?
gpt-4.1 Wins
T1134.001
A Windows host logs show PowerShell fetching and executing a remote script to gain SeDebugPrivilege token duplication. Which Empire module was invoked?
gpt-4.1 Wins
T1124
On a Linux host, an activity was recorded where the local clock and timezone were queried. Review the available process execution logs to uncover what full command was run to fetch the system time and timezone.
gpt-4.1 Wins
T1217
On a Windows system, you notice a process that recursively enumerates files named 'Bookmarks' under every user profile directory. Which Windows command-line utility was used to perform that search?
gpt-4.1 Wins
T1542.001
Investigate Windows file creation logs to uncover any new executable added directly to the System32 directory, which may indicate a UEFI persistence implant. What was the name of the file created?
gpt-4.1 Wins
T1547.002
A Windows host shows a suspicious registry change under the LSA hive. Review recent registry events to locate any new entries under Authentication Packages and determine the name of the DLL the attacker added.
gpt-4.1 Wins
T1548.001
Investigate Linux process or syslog records to find any invocation of the 'find' utility used to scan /usr/bin for files with the setuid bit. What was the full command executed?
gpt-4.1 Wins
T1548.002
On a Windows host, sift through registry modification events targeting HKLM\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Policies\\System\\ConsentPromptBehaviorAdmin. What new value was written to disable the admin consent prompt?
gpt-4.1 Wins
T1547
A Windows host shows a process launching with install-driver switches, likely signaling malicious driver deployment. What is the name of the tool that was executed?
gpt-4.1 Wins
T1548.001
A Linux host’s Syslog contains records of an elevated shell executing a command that granted group execute rights and enabled the SetGID bit on a file. Investigate the logs and report the name of the file whose group ID bit was modified.
gpt-4.1 Wins
Page 1 of 8

Explore individual model performance and detailed analysis