gpt-4.1-finetuned vs gpt-5-mini-medium KQL Benchmark
gpt-5-mini-medium wins by 19.3%
Compared on 187 shared test questions
Overall Accuracy
gpt-4.1-finetuned
26.2%
49 / 187 correct
gpt-5-mini-medium
45.5%
85 / 187 correct
Average Cost per Query
gpt-4.1-finetuned: $0.0414
gpt-5-mini-medium: $0.0150
gpt-4.1-finetuned costs 175.4% more
Average Execution Time
gpt-4.1-finetuned: 33.24s
gpt-5-mini-medium: 47.16s
gpt-5-mini-medium takes 41.9% longer
Question-by-Question Analysis
Question-by-Question Comparison
Detailed comparison showing where each model succeeded or failed
Showing 1 to 25 of 187 questions
Page 1 of 8
T1021.006 | On Windows hosts, look through recent PowerShell execution records to find any elevated session where remote management was turned on. What exact command was run to enable PSRemoting? | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1027 | On a Linux system, identify the script that was generated by decoding a base64 data file and then executed. What was the filename of that script? | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1053.003 | Linux hosts may log events when new files are added to /var/spool/cron/crontabs. Query those logs for a creation or write action in that directory and determine the file name that was added. | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1059.004 | An attacker on a Linux host may try to enumerate installed shells by reading the system file that lists valid shells. Using process or syslog data, determine which command was executed to perform this enumeration. | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1070.003 | On a Windows endpoint, review process execution logs to see if any PowerShell sessions were wiped clean. Which command was executed to clear the PowerShell history? | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1078.003 | On a Linux host, review account management activity in Syslog or process event logs to pinpoint which command was executed to create a new local user. What was the name of the tool invoked? | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1082 | A Windows system shows a cmd.exe process spawn that appears to have been used for environment discovery. Review the process creation records to identify the exact command the adversary ran to enumerate environment variables. | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1090.003 | On a Linux endpoint, a command was executed to start a proxy service commonly used for onion routing. Identify the name of the service that was launched to enable this proxy functionality. | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1112 | A Windows user’s registry was altered via a command-line tool to disable the lock workstation feature by adding a DWORD entry under the current user Policies\System key. Which registry value name was modified in this operation? | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1112 | On a Windows endpoint, review the registry write events to spot when the WDigest key is altered to permit plaintext credential storage. What registry value name was changed? | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1124 | In Windows process event logs, you notice both the net time and w32tm commands being executed to display the system time and timezone. Which executor name from the test configuration was responsible for launching these utilities? | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1217 | An attacker leveraged a PowerShell command on a Windows host to enumerate browser bookmark files across all user profiles. Examine the process execution logs to determine the exact filename that was being searched for. | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1218.011 | You notice rundll32.exe being used with desk.cpl,InstallScreenSaver on a Windows endpoint. Investigate your process creation logs to find which .scr file was loaded by this unusual invocation. | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1531 | Within Windows process event logs, identify instances where the built-in net.exe utility is used to change a user account password. What was the new password argument passed in? | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1547 | A Windows host shows evidence of a driver being installed using a built-in utility. Investigate process creation events to find the INF filename that was specified in the add-driver invocation. | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1555 | A security investigator suspects that someone attempted to dump stored web credentials on a Windows system using an in-built command-line tool. Review process creation logs to determine which executable was called to list the Web Credentials vault. | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1560 | Windows system logs show PowerShell zipping up the contents of a user’s profile folder. Investigate process and file events to determine the exact name of the ZIP archive created. | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1562.003 | Within Linux process execution records, locate any bash commands where the HISTFILESIZE environment variable was exported. What value was assigned to HISTFILESIZE? | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1562.012 | A Linux system’s audit framework appears to have been reset unexpectedly. Search your process execution records to identify which exact invocation removed all auditd rules. What full command was executed? | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1614.001 | During investigation of a Linux device, you see evidence of a process that reports system locale details. Identify the tool used. | ✓ | ✗ | gpt-4.1-finetuned Wins |
T1003.001 | Using Windows process event logs, investigate PowerShell activity around lsass.exe memory capture. What was the name of the script file invoked to perform the dump? | ✗ | ✓ | gpt-5-mini-medium Wins |
T1016 | A Linux host’s Syslog shows a shell-based network discovery script ran multiple commands. One of them listed current TCP connections. Which utility was invoked? | ✗ | ✓ | gpt-5-mini-medium Wins |
T1018 | Review Linux process execution records for any commands that list TCP metric cache entries and filter out loopback interfaces. Which utility was used? | ✗ | ✓ | gpt-5-mini-medium Wins |
T1018 | A Windows host executed an ICMP-based network reconnaissance using a looping instruction in cmd.exe. Identify the exact command line that was used to perform the ping sweep. | ✗ | ✓ | gpt-5-mini-medium Wins |
T1027 | A Windows host shows a process launch with an extremely obfuscated command line that dynamically builds and invokes code at runtime. Which process name was used to execute this payload? | ✗ | ✓ | gpt-5-mini-medium Wins |
Page 1 of 8
Explore individual model performance and detailed analysis