Anthropic abandons its security promise and rewrites AI guardrails


  • Anthropic has withdrawn its commitment not to train or release AI models without first guaranteeing security measures.
  • The company will now rely on transparency reporting and security roadmaps rather than strict prerequisites.
  • Critics say the change shows the limits of voluntary commitments to AI safety without binding regulation.

Anthropic has formally abandoned a central promise not to train or launch frontier AI systems unless it can guarantee adequate security in advance. The company behind Claude confirmed the decision in an interview with Time, marking the end of a policy that once distinguished it among AI developers. The recently revised Responsible Scaling Policy is more about ensuring the company remains competitive as the AI market heats up.

For years, Anthropic presented this commitment as proof that it would resist the commercial pressures pushing its competitors to ship ever more powerful systems. The policy effectively barred it from advancing beyond certain capability levels unless predefined security measures were already in place. Anthropic now uses a more flexible framework in place of those categorical stops.
