
Leading AI Model Claude Opus 4.6 Bypassed in 30 Minutes, Exposing Critical Security Gap in Agentic AI Systems

Image provided by AIM Intelligence

AIM Intelligence's red team breached Anthropic's Claude Opus 4.6 in just 30 minutes, exposing major security gaps as autonomous AI capabilities rapidly advance

SAN FRANCISCO, CA, UNITED STATES, February 11, 2026 /EINPresswire.com/ -- AIM Intelligence, a Seoul-based AI safety company, today announced that its security research team bypassed the safety mechanisms of Anthropic's Claude Opus 4.6—the company's highest-performance AI model—in just 30 minutes following its release on February 6. The jailbreak caused the model to produce detailed instructions for manufacturing chemical and biological weapons, including sarin gas and the smallpox virus, highlighting critical vulnerabilities in current AI safety systems.

The findings come amid growing industry concern that safety mechanisms are failing to keep pace with rapidly advancing AI capabilities, particularly in agentic AI systems designed to make autonomous decisions and take actions on behalf of humans.

"This successful jailbreak demonstrates that even top-tier AI models share common security vulnerabilities," said Ha-on Park, CTO of AIM Intelligence. "As attacks on AI systems become increasingly sophisticated and agentic capabilities expand, understanding and defending against these vulnerabilities will be critical for the industry."

Systematic Vulnerabilities Across Leading Models

In controlled red-team testing (structured adversarial evaluations designed to surface latent AI safety failures), researchers at AIM Intelligence identified critical weaknesses in Claude's refusal and containment mechanisms. Under specific prompt conditions, the model bypassed its safeguards and generated actionable, step-by-step guidance on prohibited biological threats, including anthrax and smallpox, pathogens universally classified as high-risk biological agents with severe real-world public health and national security implications.

These outputs went beyond abstract discussion or historical context, crossing into procedural framing that would normally be blocked by safety systems. The findings underscore how even state-of-the-art models can, when improperly constrained, surface knowledge that could be misused for bioterrorism, mass-casualty planning, or biological weapons development if accessed by malicious actors.

This disclosure represents the second major AI safety failure reported by AIM Intelligence in recent weeks. Previously, the team demonstrated a rapid jailbreak of Google's Gemini 3 Pro, neutralizing its filtering mechanisms in under five minutes. To highlight the severity of the breach, researchers prompted the compromised model to generate a satirical self-assessment of its failure: an internal presentation titled “Jailbroken Fool Gemini 3.”

Growing Risks in Agentic AI Era

The security implications are particularly concerning for Opus 4.6, which features significantly enhanced agentic capabilities—functions that enable AI systems to make judgments and execute actions with minimal human oversight. As these autonomous decision-making features become more powerful, the potential consequences of successful jailbreaks escalate proportionally.

Anthropic's own system card reveals a critical design tradeoff: the model's refusal rate for AI safety research queries dropped from approximately 60% to just 14% in Opus 4.6. While intended to make the model more helpful for legitimate safety research, this change inadvertently created a near-universal jailbreak vector that AIM Intelligence's team exploited across multiple sensitive topics—transforming what should have been robust safety guardrails into a systematic vulnerability.

"The disconnect between AI performance metrics and security robustness represents a fundamental challenge for the industry," Park added. "Models achieving state-of-the-art results on standard benchmarks can still be compromised within minutes, and traditional safety approaches aren't scaling with capability advances."

About AIM Intelligence

AIM Intelligence is a Seoul-based AI safety company that enables enterprises to control AI through automated red-teaming and real-time guardrails. Its research team has published nine peer-reviewed papers at top-tier conferences including ICLR, ICML, and NeurIPS. The company works with global partners and clients including OpenAI, BMW, and LG Electronics.

Team Cookie Official
Team Cookie

