gpt-4.1-finetuned vs o3-mini-low KQL Benchmark

o3-mini-low wins by 25.5 percentage points

Compared on 188 shared test questions

Overall Accuracy

gpt-4.1-finetuned: 26.1% (49 / 188 correct)
o3-mini-low: 51.6% (97 / 188 correct)
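
The accuracy figures follow directly from the correct-answer counts, and the headline margin is the difference between the two percentages, measured in percentage points. A minimal sketch of that arithmetic in Python; the function and variable names are illustrative assumptions, not part of the benchmark tooling:

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Return accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)


TOTAL_QUESTIONS = 188  # shared test questions reported on this page

# Correct-answer counts as reported on this page.
finetuned_acc = accuracy_pct(49, TOTAL_QUESTIONS)
o3_mini_acc = accuracy_pct(97, TOTAL_QUESTIONS)

# Absolute gap between the two accuracies, in percentage points.
gap_points = round(o3_mini_acc - finetuned_acc, 1)

print(f"gpt-4.1-finetuned: {finetuned_acc}%")
print(f"o3-mini-low: {o3_mini_acc}%")
print(f"gap: {gap_points} percentage points")
```

The gap is an absolute difference, not a relative one: in relative terms, o3-mini-low answered roughly twice as many questions correctly (97 vs. 49).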

Average Cost per Query

gpt-4.1-finetuned: $0.0414
o3-mini-low: $0.0279
gpt-4.1-finetuned costs 48.5% more

Average Execution Time

gpt-4.1-finetuned: 33.24s
o3-mini-low: 36.54s
o3-mini-low takes 9.9% longer
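
The "costs more" and "takes longer" figures are relative differences against the cheaper or faster model. A rough sketch of that calculation in Python, using the rounded averages shown above; the helper name `pct_more` is a hypothetical chosen for illustration:

```python
def pct_more(value: float, baseline: float) -> float:
    """Return how much larger `value` is than `baseline`, in percent."""
    return 100 * (value - baseline) / baseline


# Average cost per query (USD), as displayed on this page.
cost_diff = pct_more(0.0414, 0.0279)

# Average execution time (seconds), as displayed on this page.
time_diff = pct_more(36.54, 33.24)

print(f"gpt-4.1-finetuned costs {cost_diff:.1f}% more")
print(f"o3-mini-low takes {time_diff:.1f}% longer")
```

Computed from the rounded averages, the cost difference comes out at about 48.4% rather than the 48.5% reported; the small discrepancy is presumably because the page works from unrounded averages, though that is an assumption. The time difference reproduces the reported 9.9%.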

Question-by-Question Comparison

Detailed comparison showing where each model succeeded or failed

Showing 1 to 25 of 188 questions
T1003
On a Windows host, an attacker leveraged COMSVCS.DLL via rundll32.exe to dump the memory of svchost.exe into a file under the Temp directory. Review the file creation logs to determine the exact name of the dump file that was generated.
gpt-4.1-finetuned Wins
T1021.006
On Windows hosts, look through recent PowerShell execution records to find any elevated session where remote management was turned on. What exact command was run to enable PSRemoting?
gpt-4.1-finetuned Wins
T1057
On a Windows host, investigate process events to find when Task Manager was launched via cmd with an unusual flag. What was the full command executed?
gpt-4.1-finetuned Wins
T1070.003
On a Linux endpoint, you suspect malicious clearing of the bash history by redirecting from the null device. Explore process or file events to uncover the exact shell command that performed this action.
gpt-4.1-finetuned Wins
T1124
In Windows process event logs, you notice both the net time and w32tm commands being executed to display the system time and timezone. Which executor name from the test configuration was responsible for launching these utilities?
gpt-4.1-finetuned Wins
T1217
An attacker is suspected of using the Windows shell to enumerate a user’s Internet Explorer bookmarks via the Favorites folder. Identify the exact command they executed to perform this listing.
gpt-4.1-finetuned Wins
T1217
An attacker leveraged a PowerShell command on a Windows host to enumerate browser bookmark files across all user profiles. Examine the process execution logs to determine the exact filename that was being searched for.
gpt-4.1-finetuned Wins
T1218.011
You notice rundll32.exe being used with desk.cpl,InstallScreenSaver on a Windows endpoint. Investigate your process creation logs to find which .scr file was loaded by this unusual invocation.
gpt-4.1-finetuned Wins
T1555
A security investigator suspects that someone attempted to dump stored web credentials on a Windows system using an in-built command-line tool. Review process creation logs to determine which executable was called to list the Web Credentials vault.
gpt-4.1-finetuned Wins
T1560
Windows system logs show PowerShell zipping up the contents of a user’s profile folder. Investigate process and file events to determine the exact name of the ZIP archive created.
gpt-4.1-finetuned Wins
T1614.001
During investigation of a Linux device, you see evidence of a process that reports system locale details. Identify the tool used.
gpt-4.1-finetuned Wins
T1614.001
On a Windows device, an attacker ran a PowerShell script to collect system settings including UI language and locale. Identify which cmdlet in the command line was used to obtain the system locale.
gpt-4.1-finetuned Wins
T1614.001
In a Windows environment, locate any occurrences where an elevated DISM utility was run to enumerate the system’s international (locale) settings. What was the exact command line used?
gpt-4.1-finetuned Wins
T1614.001
Using Linux process or syslog logs, identify the executable that was run to output the system's locale information.
gpt-4.1-finetuned Wins
T1003.008
In a Linux environment, an elevated process was used to execute a command that read /etc/shadow and redirected its output to a file. Identify what file name was employed to store these results.
o3-mini-low Wins
T1016
A Linux host’s Syslog shows a shell-based network discovery script ran multiple commands. One of them listed current TCP connections. Which utility was invoked?
o3-mini-low Wins
T1018
On a Windows endpoint, review process creation logs to uncover when a built-in utility was used to reveal ARP entries. What exact command was used to list the ARP cache?
o3-mini-low Wins
T1018
A Windows host executed an ICMP-based network reconnaissance using a looping instruction in cmd.exe. Identify the exact command line that was used to perform the ping sweep.
o3-mini-low Wins
T1027
A Windows host shows a process launch with an extremely obfuscated command line that dynamically builds and invokes code at runtime. Which process name was used to execute this payload?
o3-mini-low Wins
T1036.004
Analyze Windows process events for any schtasks.exe commands that created a new task invoking PowerShell. What is the name of the .ps1 script specified to run?
o3-mini-low Wins
T1036.003
In a Linux environment, you observe a process labeled like the cron daemon but running from an unexpected path. Investigate creation events to uncover the actual filename used by this fake cron process.
o3-mini-low Wins
T1048.003
A Linux host briefly hosted an HTTP service under /tmp. Examine process creation logs to determine the exact python3 command that was used to start the server on port 9090.
o3-mini-low Wins
T1048.003
Windows process creation logs show a PowerShell-driven file transfer to an FTP service. Which executable was leveraged to perform this exfiltration?
o3-mini-low Wins
T1049
A user launched a Windows command prompt and executed a built-in utility to enumerate all active network connections. Using process creation logs, identify the exact tool that produced the list of current connections.
o3-mini-low Wins
T1053.005
You suspect malicious persistence via scheduled tasks on a Windows endpoint. Review the process execution logs to identify the built-in utility used to register tasks at logon or startup. What is the name of this utility?
o3-mini-low Wins