gpt-4.1-mini vs gpt-5-nano-medium KQL Benchmark

gpt-4.1-mini wins by 17.8%

Compared on 185 shared test questions

Overall Accuracy

gpt-4.1-mini

41.6%

77 / 185 correct

gpt-5-nano-medium

23.8%

44 / 185 correct

Average Cost per Query

gpt-4.1-mini: $0.0057
gpt-5-nano-medium: $0.0069
gpt-5-nano-medium costs 21.5% more

Average Execution Time

gpt-4.1-mini: 14.13s
gpt-5-nano-medium: 65.07s
gpt-5-nano-medium takes 360.4% longer
Question-by-Question Analysis

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 185 questions
Page 1 of 8
T1003
On a Windows host, an attacker leveraged COMSVCS.DLL via rundll32.exe to dump the memory of svchost.exe into a file under the Temp directory. Review the file creation logs to determine the exact name of the dump file that was generated.
gpt-4.1-mini Wins
T1016.001
An analyst notices a PowerShell process on a Windows host that appears to be checking SMB connectivity. Which PowerShell cmdlet was executed to perform this outbound port 445 test?
gpt-4.1-mini Wins
T1018
A Windows host executed an ICMP-based network reconnaissance using a looping instruction in cmd.exe. Identify the exact command line that was used to perform the ping sweep.
gpt-4.1-mini Wins
T1027
On a Windows endpoint, look for evidence of a base64-encoded PowerShell payload execution. Which executable launched the encoded command?
gpt-4.1-mini Wins
T1036.004
A threat actor on a Windows system crafted and registered a service named almost identically to the standard time service, but redirecting execution to a custom script. Review the logging data to determine which native command-line tool was used to perform this action. What utility was invoked?
gpt-4.1-mini Wins
T1048.003
A Linux host briefly hosted an HTTP service under /tmp. Examine process creation logs to determine the exact python3 command that was used to start the server on port 9090.
gpt-4.1-mini Wins
T1048.003
Windows process creation logs show a PowerShell-driven file transfer to an FTP service. Which executable was leveraged to perform this exfiltration?
gpt-4.1-mini Wins
T1049
In a Windows log analytics workspace, search for PowerShell processes that were used to enumerate network connections. Determine which PowerShell cmdlet was executed to list active TCP connections.
gpt-4.1-mini Wins
T1053.003
Linux hosts may log events when new files are added to /var/spool/cron/crontabs. Query those logs for a creation or write action in that directory and determine the file name that was added.
gpt-4.1-mini Wins
T1059.004
Which full interactive shell command, as recorded in the Linux process logs, repeatedly echoed a distinctive marker message to the terminal?
gpt-4.1-mini Wins
T1059.004
An attacker on a Linux host may try to enumerate installed shells by reading the system file that lists valid shells. Using process or syslog data, determine which command was executed to perform this enumeration.
gpt-4.1-mini Wins
T1059.004
An analyst suspects that a restricted shell escape was executed using a common Perl package manager on Linux. Review the process execution records to determine which tool was invoked to spawn the shell.
gpt-4.1-mini Wins
T1059.004
On a Linux host, identify the process invocation that altered a user’s login shell. What was the full command used?
gpt-4.1-mini Wins
T1069.001
Investigate Windows process execution logs for a PowerShell cmdlet used to list group members. Look for entries where a group name is provided after a '-Name' flag and identify which group was queried.
gpt-4.1-mini Wins
T1070.003
On a Windows endpoint, commands are no longer being logged to PowerShell history, suggesting PSReadLine settings were altered. Using process execution logs, determine the exact command that was run to set the history save style to 'SaveNothing'.
gpt-4.1-mini Wins
T1070.003
On a Windows endpoint, review process execution logs to see if any PowerShell sessions were wiped clean. Which command was executed to clear the PowerShell history?
gpt-4.1-mini Wins
T1070.003
On a Windows device, there’s evidence that PowerShell history was wiped by deleting the history file. What was the exact command used to perform this action?
gpt-4.1-mini Wins
T1070
A suspicious actor appears to have removed the USN change journal on a Windows workstation. Investigate process start records to find out exactly which command was used to delete the journal. What was the full command line invoked?
gpt-4.1-mini Wins
T1082
Using Linux process execution logs, identify the specific command that was used to filter loaded kernel modules for entries containing “vmw.” What was that full command?
gpt-4.1-mini Wins
T1082
A user‐space process on a Linux device invoked a shell to capture and display the system’s environment variables and path. Which exact command was used to perform this discovery?
gpt-4.1-mini Wins
T1112
On a Windows host, registry events reveal that PowerShell modified a value under the WDigest provider. Identify the exact command line that performed this registry change.
gpt-4.1-mini Wins
T1112
On a Windows endpoint, review the registry write events to spot when the WDigest key is altered to permit plaintext credential storage. What registry value name was changed?
gpt-4.1-mini Wins
T1120
Review Windows process execution logs to find any native utility that was used to enumerate connected drives. Which utility was invoked?
gpt-4.1-mini Wins
T1124
Review Windows process execution logs for any PowerShell activity that retrieves the system clock. Which command was executed?
gpt-4.1-mini Wins
T1197
A suspicious BITS transfer was orchestrated via bitsadmin.exe on Windows, creating a job to download and then execute a payload. Investigate the process event logs to determine what custom job name was specified when the BITS job was created.
gpt-4.1-mini Wins
Page 1 of 8

Explore individual model performance and detailed analysis