gpt-4o vs gpt-5-mini-low KQL Benchmark

gpt-5-mini-low wins by 8.0 percentage points

Compared on 187 shared test questions

Overall Accuracy

gpt-4o

38.0%

71 / 187 correct

gpt-5-mini-low

46.0%

86 / 187 correct
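
The accuracy figures above can be reproduced from the raw counts. The following is a minimal sketch, not the benchmark's own code; the function and variable names are illustrative assumptions. It also separates the absolute gap (in percentage points) from the relative improvement, since the two are easy to confuse.

```python
def accuracy(correct: int, total: int) -> float:
    """Return the fraction of questions answered correctly."""
    return correct / total


TOTAL_QUESTIONS = 187  # shared test questions reported on the page

acc_gpt4o = accuracy(71, TOTAL_QUESTIONS)
acc_mini = accuracy(86, TOTAL_QUESTIONS)

# Absolute gap, expressed in percentage points.
gap_points = (acc_mini - acc_gpt4o) * 100

# Relative improvement of gpt-5-mini-low over gpt-4o, in percent.
relative_gain = (acc_mini / acc_gpt4o - 1) * 100

print(f"gpt-4o accuracy:         {acc_gpt4o:.1%}")
print(f"gpt-5-mini-low accuracy: {acc_mini:.1%}")
print(f"gap:                     {gap_points:.1f} percentage points")
print(f"relative improvement:    {relative_gain:.1f}%")
```

Under these counts the gap is 15 questions out of 187, which is where the 8.0 figure comes from.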

Average Cost per Query

gpt-4o: $0.0433
gpt-5-mini-low: $0.0146
gpt-4o costs 196.8% more

Average Execution Time

gpt-4o: 14.30s
gpt-5-mini-low: 54.72s
gpt-5-mini-low takes 282.6% longer
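
The "costs more" and "takes longer" figures are relative differences between the two averages. The sketch below recomputes them from the rounded values shown on the page; the helper name `percent_more` is an illustrative assumption. Because the inputs are already rounded, the results can differ from the reported percentages in the last digit, which were presumably computed from unrounded averages.

```python
def percent_more(larger: float, smaller: float) -> float:
    """Return how much bigger `larger` is than `smaller`, in percent."""
    return (larger / smaller - 1) * 100


# Average cost per query in USD, as displayed on the page (rounded).
cost_gpt4o = 0.0433
cost_mini = 0.0146

# Average execution time in seconds, as displayed on the page (rounded).
time_gpt4o = 14.30
time_mini = 54.72

cost_gap = percent_more(cost_gpt4o, cost_mini)
time_gap = percent_more(time_mini, time_gpt4o)

print(f"gpt-4o costs {cost_gap:.1f}% more per query")
print(f"gpt-5-mini-low takes {time_gap:.1f}% longer per query")
```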

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 187 questions
Page 1 of 8
T1003
On a Windows host, an attacker leveraged COMSVCS.DLL via rundll32.exe to dump the memory of svchost.exe into a file under the Temp directory. Review the file creation logs to determine the exact name of the dump file that was generated.
gpt-4o Wins
T1016.001
On a Linux host, a ping command was executed to test internet connectivity. Determine which IP address was used as the ping target.
gpt-4o Wins
T1036.004
Analyze Windows process events for any schtasks.exe commands that created a new task invoking PowerShell. What is the name of the .ps1 script specified to run?
gpt-4o Wins
T1059.004
Which full interactive shell command, as recorded in the Linux process logs, repeatedly echoed a distinctive marker message to the terminal?
gpt-4o Wins
T1069.001
Investigate Windows process execution logs for a PowerShell cmdlet used to list group members. Look for entries where a group name is provided after a '-Name' flag and identify which group was queried.
gpt-4o Wins
T1070.003
On a Windows device, there’s evidence that PowerShell history was wiped by deleting the history file. What was the exact command used to perform this action?
gpt-4o Wins
T1070.003
On a Linux endpoint, you suspect malicious clearing of the bash history by redirecting from the null device. Explore process or file events to uncover the exact shell command that performed this action.
gpt-4o Wins
T1070.004
Suspiciously, the recycle bin appears empty system-wide. Determine which command was executed on Windows to clear the system's recycle bin directory, including any switches and environment variables.
gpt-4o Wins
T1082
Review Windows process logs to find which built-in command was executed to reveal the system’s hostname.
gpt-4o Wins
T1082
A Windows system shows a cmd.exe process spawn that appears to have been used for environment discovery. Review the process creation records to identify the exact command the adversary ran to enumerate environment variables.
gpt-4o Wins
T1112
Evidence shows that the Windows Defender startup entry was tampered with via an elevated command prompt. Investigate registry events related to the Run key to discover which executable replaced the default SecurityHealth value. What is the name of the new program?
gpt-4o Wins
T1082
A user‐space process on a Linux device invoked a shell to capture and display the system’s environment variables and path. Which exact command was used to perform this discovery?
gpt-4o Wins
T1112
On a Windows host, registry events reveal that PowerShell modified a value under the WDigest provider. Identify the exact command line that performed this registry change.
gpt-4o Wins
T1112
Investigate Windows registry events to identify any newly set ProxyServer entry under the user Internet Settings hive. What proxy server address was configured?
gpt-4o Wins
T1176
A Windows host shows chrome.exe starting with a --load-extension parameter. What folder name was specified in that flag?
gpt-4o Wins
T1497.003
On a Linux host, identify any processes that used ping with a large count value to introduce a delay before launching another process. What was the command executed immediately after the ping delay?
gpt-4o Wins
T1218.011
You notice rundll32.exe being used with desk.cpl,InstallScreenSaver on a Windows endpoint. Investigate your process creation logs to find which .scr file was loaded by this unusual invocation.
gpt-4o Wins
T1222.002
On a Linux host, process execution logs show a chmod invocation with a recursive flag. Which file or folder was targeted by this recursive permission change?
gpt-4o Wins
T1546.004
On Linux systems, an attacker may gain persistence by appending instructions to the global shell profile. Investigate process or file modification events to find evidence of text being added to /etc/profile, and identify the exact command invocation that carried out this change.
gpt-4o Wins
T1547.014
Windows registry events show that a new key under the Active Setup Installed Components branch was added to launch a payload immediately via runonce.exe. Which component name was created?
gpt-4o Wins
T1547
A Windows host shows evidence of a driver being installed using a built-in utility. Investigate process creation events to find the INF filename that was specified in the add-driver invocation.
gpt-4o Wins
T1547.014
A Windows endpoint shows an Active Setup entry under Internet Explorer Core Fonts being altered with a StubPath value. Investigate the registry events and identify the payload that was set.
gpt-4o Wins
T1555.003
On a Windows system, PowerShell was used to gather multiple browser credential files into a temp folder and then archive them. What was the name of the resulting ZIP file?
gpt-4o Wins
T1552.001
A Linux system shows a 'find' command used to search within .aws directories. Which specific AWS credential filename was the attacker attempting to locate?
gpt-4o Wins
T1552.003
A Linux user’s bash history was searched for patterns like ‘pass’ and ‘ssh’, and the matching lines were redirected into a new file. Determine the name of that file.
gpt-4o Wins
