gpt-4.1 vs grok-3-mini-beta KQL Benchmark
gpt-4.1 wins by 3.2%
Compared on 188 shared test questions
Overall Accuracy
gpt-4.1
61.7%
116 / 188 correct
grok-3-mini-beta
58.5%
110 / 188 correct
Average Cost per Query
gpt-4.1: $0.0285
grok-3-mini-beta: $0.0040
gpt-4.1 costs 618.7% more
Average Execution Time
gpt-4.1: 9.95s
grok-3-mini-beta: 22.38s
grok-3-mini-beta takes 125.0% longer
Question-by-Question Analysis
Question-by-Question Comparison
Detailed comparison showing where each model succeeded or failed
Showing 1 to 25 of 188 questions
Page 1 of 8
T1016.001 | On a Linux host, a ping command was executed to test internet connectivity. Determine which IP address was used as the ping target. | ✓ | ✗ | gpt-4.1 Wins |
T1003.001 | Using Windows process event logs, investigate PowerShell activity around lsass.exe memory capture. What was the name of the script file invoked to perform the dump? | ✓ | ✗ | gpt-4.1 Wins |
T1016 | A Linux host’s Syslog shows a shell-based network discovery script ran multiple commands. One of them listed current TCP connections. Which utility was invoked? | ✓ | ✗ | gpt-4.1 Wins |
T1018 | Review Linux process execution records for any commands that list TCP metric cache entries and filter out loopback interfaces. Which utility was used? | ✓ | ✗ | gpt-4.1 Wins |
T1027 | On a Linux system, identify the script that was generated by decoding a base64 data file and then executed. What was the filename of that script? | ✓ | ✗ | gpt-4.1 Wins |
T1036.004 | A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked? | ✓ | ✗ | gpt-4.1 Wins |
T1039 | On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action? | ✓ | ✗ | gpt-4.1 Wins |
T1053.005 | Investigate Windows process events for PowerShell activity that leverages WMI to register a scheduled task via XML import. What was the name of the XML file supplied to the RegisterByXml method? | ✓ | ✗ | gpt-4.1 Wins |
T1057 | While reviewing Windows process events, you spot a PowerShell process executing a WMI enumeration cmdlet. What WMI class name did the attacker query? | ✓ | ✗ | gpt-4.1 Wins |
T1053.005 | On Windows, review recent registry changes to detect when the MSC file association was hijacked by a reg add operation. What executable file was configured as the default command under HKCU\Software\Classes\mscfile\shell\open\command? | ✓ | ✗ | gpt-4.1 Wins |
T1070.004 | While reviewing Windows process events, you observe a command that recursively deleted a folder under the temporary directory. Use the process event data to identify which process or tool executed this recursive delete. | ✓ | ✗ | gpt-4.1 Wins |
T1070.006 | On a Windows host, suspicious PowerShell activity adjusted the system clock and recorded a value. What numeric value was used to slip the system date? | ✓ | ✗ | gpt-4.1 Wins |
T1070.006 | On a Linux system, attackers may use timestamp manipulation to hide malicious changes. Investigate relevant logs to identify which file’s modification timestamp was altered by such a command. | ✓ | ✗ | gpt-4.1 Wins |
T1112 | A Windows user’s registry was altered via a command-line tool to disable the lock workstation feature by adding a DWORD entry under the current user Policies\System key. Which registry value name was modified in this operation? | ✓ | ✗ | gpt-4.1 Wins |
T1124 | In Windows process event logs, you notice both the net time and w32tm commands being executed to display the system time and timezone. Which executor name from the test configuration was responsible for launching these utilities? | ✓ | ✗ | gpt-4.1 Wins |
T1120 | Review Windows process execution logs to find any native utility that was used to enumerate connected drives. Which utility was invoked? | ✓ | ✗ | gpt-4.1 Wins |
T1134.001 | A Windows host logs show PowerShell fetching and executing a remote script to gain SeDebugPrivilege token duplication. Which Empire module was invoked? | ✓ | ✗ | gpt-4.1 Wins |
T1124 | On a Linux host, an activity was recorded where the local clock and timezone were queried. Review the available process execution logs to uncover what full command was run to fetch the system time and timezone. | ✓ | ✗ | gpt-4.1 Wins |
T1217 | On a Windows system, you notice a process that recursively enumerates files named 'Bookmarks' under every user profile directory. Which Windows command-line utility was used to perform that search? | ✓ | ✗ | gpt-4.1 Wins |
T1542.001 | Investigate Windows file creation logs to uncover any new executable added directly to the System32 directory, which may indicate a UEFI persistence implant. What was the name of the file created? | ✓ | ✗ | gpt-4.1 Wins |
T1547.002 | A Windows host shows a suspicious registry change under the LSA hive. Review recent registry events to locate any new entries under Authentication Packages and determine the name of the DLL the attacker added. | ✓ | ✗ | gpt-4.1 Wins |
T1548.001 | Investigate Linux process or syslog records to find any invocation of the 'find' utility used to scan /usr/bin for files with the setuid bit. What was the full command executed? | ✓ | ✗ | gpt-4.1 Wins |
T1548.002 | On a Windows host, sift through registry modification events targeting HKLM\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Policies\\System\\ConsentPromptBehaviorAdmin. What new value was written to disable the admin consent prompt? | ✓ | ✗ | gpt-4.1 Wins |
T1547 | A Windows host shows a process launching with install-driver switches, likely signaling malicious driver deployment. What is the name of the tool that was executed? | ✓ | ✗ | gpt-4.1 Wins |
T1548.001 | A Linux host’s Syslog contains records of an elevated shell executing a command that granted group execute rights and enabled the SetGID bit on a file. Investigate the logs and report the name of the file whose group ID bit was modified. | ✓ | ✗ | gpt-4.1 Wins |
Page 1 of 8
Explore individual model performance and detailed analysis