gpt-4.1 vs gpt-4.1-finetuned KQL Benchmark

gpt-4.1 wins by 35.6 percentage points

Compared on 188 shared test questions

Overall Accuracy

gpt-4.1

61.7%

116 / 188 correct

gpt-4.1-finetuned

26.1%

49 / 188 correct
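
As a quick check on the headline figure, the sketch below recomputes both accuracies from the correct counts shown above and takes their difference. The function and variable names are illustrative and not part of the benchmark tooling; the only inputs are the counts reported on this page.

```python
# Recompute the accuracy figures from the raw counts reported above.
# Names here are illustrative; only the counts come from the benchmark page.

def accuracy_pct(correct: int, total: int) -> float:
    """Return accuracy as a percentage of total questions."""
    return 100.0 * correct / total

TOTAL_QUESTIONS = 188

base_acc = accuracy_pct(116, TOTAL_QUESTIONS)   # gpt-4.1
tuned_acc = accuracy_pct(49, TOTAL_QUESTIONS)   # gpt-4.1-finetuned

# The headline gap is an absolute difference in percentage points,
# not a relative improvement.
gap_points = base_acc - tuned_acc

print(f"gpt-4.1:           {base_acc:.1f}%")
print(f"gpt-4.1-finetuned: {tuned_acc:.1f}%")
print(f"gap:               {gap_points:.1f} percentage points")
```

Run this way, the 35.6 figure is the difference between the two accuracy percentages rather than a ratio between them.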

Average Cost per Query

gpt-4.1: $0.0285
gpt-4.1-finetuned: $0.0414
gpt-4.1-finetuned costs 45.2% more

Average Execution Time

gpt-4.1: 9.95s
gpt-4.1-finetuned: 33.24s
gpt-4.1-finetuned takes 234.2% longer
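
The "costs more" and "takes longer" percentages above appear to be relative increases over the gpt-4.1 baseline. The sketch below applies that formula to the rounded averages shown on this page; the helper name is hypothetical. Because the inputs are already rounded, the results can differ from the reported 45.2% and 234.2% by about 0.1, presumably because the dashboard computed them from unrounded averages (an assumption, not something the page states).

```python
# Relative increase of the fine-tuned model over the gpt-4.1 baseline,
# computed from the rounded averages displayed on the page.

def percent_increase(baseline: float, other: float) -> float:
    """Return how much larger `other` is than `baseline`, in percent."""
    return 100.0 * (other - baseline) / baseline

# Average cost per query in USD (values as displayed).
cost_increase = percent_increase(0.0285, 0.0414)

# Average execution time in seconds (values as displayed).
time_increase = percent_increase(9.95, 33.24)

print(f"cost increase: {cost_increase:.1f}%")
print(f"time increase: {time_increase:.1f}%")
```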

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
Page 1 of 8
T1007
An analyst suspects a user or script ran a service enumeration command on a Linux system. Review process events to find the service-listing invocation and specify the full command that was executed.
gpt-4.1 Wins
T1006
Identify the PowerShell cmdlet used on Windows to format and display the raw volume bytes after an attacker read the boot sector via a DOS device path.
gpt-4.1 Wins
T1003.001
Using Windows process event logs, investigate PowerShell activity around lsass.exe memory capture. What was the name of the script file invoked to perform the dump?
gpt-4.1 Wins
T1016
A Linux host’s Syslog shows a shell-based network discovery script ran multiple commands. One of them listed current TCP connections. Which utility was invoked?
gpt-4.1 Wins
T1018
Review Linux process execution records for any commands that list TCP metric cache entries and filter out loopback interfaces. Which utility was used?
gpt-4.1 Wins
T1018
On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache?
gpt-4.1 Wins
T1027
A Windows host shows a process launch with an extremely obfuscated command line that dynamically builds and invokes code at runtime. Which process name was used to execute this payload?
gpt-4.1 Wins
T1036.004
A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked?
gpt-4.1 Wins
T1036.003
A process is running under a familiar Windows host name but originates from a user's AppData folder rather than the System32 directory. Identify the filename used to masquerade the PowerShell binary on this Windows device.
gpt-4.1 Wins
T1018
A Windows host executed an ICMP-based network reconnaissance using a looping instruction in cmd.exe. Identify the exact command line that was used to perform the ping sweep.
gpt-4.1 Wins
T1049
A user launched a Windows command prompt and executed a built-in utility to enumerate all active network connections. Using process creation logs, identify the exact tool that produced the list of current connections.
gpt-4.1 Wins
T1039
On a Windows system, someone ran PowerShell to copy a file from a remote machine’s C$ share to the local TEMP folder. Using process event logs, what full PowerShell command was executed to perform this action?
gpt-4.1 Wins
T1049
In a Windows log analytics workspace, search for PowerShell processes that were used to enumerate network connections. Determine which PowerShell cmdlet was executed to list active TCP connections.
gpt-4.1 Wins
T1048.003
Windows process creation logs show a PowerShell-driven file transfer to an FTP service. Which executable was leveraged to perform this exfiltration?
gpt-4.1 Wins
T1053.005
You suspect malicious persistence via scheduled tasks on a Windows endpoint. Review the process execution logs to identify the built-in utility used to register tasks at logon or startup. What is the name of this utility?
gpt-4.1 Wins
T1053.005
Investigate Windows process events for PowerShell activity that leverages WMI to register a scheduled task via XML import. What was the name of the XML file supplied to the RegisterByXml method?
gpt-4.1 Wins
T1057
On a Windows device, PowerShell was used to collect a snapshot of running processes. Identify the exact cmdlet that was executed.
gpt-4.1 Wins
T1053.006
Examine the logs from the Linux system for events related to the systemd timer activation. Identify any records indicating that a new timer unit was started and enabled, and determine which timer name was used.
gpt-4.1 Wins
T1057
While reviewing Windows process events, you spot a PowerShell process executing a WMI enumeration cmdlet. What WMI class name did the attacker query?
gpt-4.1 Wins
T1053.005
On Windows, review recent registry changes to detect when the MSC file association was hijacked by a reg add operation. What executable file was configured as the default command under HKCU\Software\Classes\mscfile\shell\open\command?
gpt-4.1 Wins
T1057
A Windows endpoint recorded a command-line activity through cmd.exe that lists all running processes. Determine which built-in tool was executed to perform this action.
gpt-4.1 Wins
T1069.001
On a Linux endpoint, process events reveal a chain of group‐enumeration utilities executed by a single session. Which utility was used to query the system’s group database?
gpt-4.1 Wins
T1070.003
On a Linux system, you suspect someone erased their command history by linking the history file to /dev/null. Investigate process events and determine which utility was executed to achieve this.
gpt-4.1 Wins
T1059.007
On a Windows endpoint, wscript.exe was used to run a JScript. Identify the exact script path passed to wscript.
gpt-4.1 Wins
T1069.001
Investigate Windows process execution logs for a PowerShell cmdlet used to list group members. Look for entries where a group name is provided after a '-Name' flag and identify which group was queried.
gpt-4.1 Wins