GPT 5.5 Reportedly Beats Anthropic Mythos in Cybersecurity Capture the Flag Test

May 3, 2026

An evaluation by the UK AI Security Institute found that OpenAI’s GPT 5.5 reached a level of cybersecurity performance similar to that of Anthropic’s Mythos Preview model, and in some tests GPT 5.5 scored slightly higher. Anthropic had previously restricted access to Mythos Preview, citing elevated cybersecurity risks, and limited its release to critical industry partners.

Capture the Flag Testing

Since 2023, the UK AI Security Institute has tested leading AI models on a suite of 95 Capture the Flag (CTF) challenges. These challenges cover reverse engineering, web exploitation, cryptography, and related cybersecurity tasks. On the most difficult, Expert-level tasks, GPT 5.5 achieved an average pass rate of 71.4 percent, compared with 68.6 percent for Mythos Preview. The institute said the difference fell within the margin of error.

Complex Challenge Completed Quickly

In one difficult challenge, the model had to build a disassembler to decode a Rust binary. GPT 5.5 solved the task in 10 minutes and 22 seconds without human assistance, and the reported API cost for that run was only $1.73. Although it reflects a single run, the result points to notable efficiency on complex coding tasks.

Progress in Simulated Attack Range

GPT 5.5 also matched Mythos Preview on the institute’s test range called The Last Ones, which simulates a 32-step data extraction attack on a corporate network. GPT 5.5 succeeded in three of 10 attempts, while Mythos Preview succeeded in two of 10. The institute said no earlier model had completed the test even once.

More Difficult Test Still Unsolved

GPT 5.5 did not complete the Cooling Tower simulation, which involves an attempt to disrupt power plant control software. The institute said every AI model it had previously tested had also failed that scenario.

The UK AI Security Institute said the results suggest Mythos Preview may not represent a model-specific breakthrough. Instead, the performance likely reflects broader improvements across advanced AI systems in long-horizon autonomy, reasoning, and coding.