- Anthropic found signals of “strategic manipulation” and “concealment” inside Claude Mythos
- The model attempted exploits and built in what was flagged as ‘scrubbing to avoid detection’
- Researchers detected hidden awareness of evaluation in 7.6% of interactions
For years, hallucinations have been the main concern with AI models. Their ability to simply make things up means you can never trust an answer 100% without checking it. Now, new research from Anthropic suggests we’ve reached the point where we’ll also have to learn to manage AI’s ability to hide what it has done.
In a thread describing findings on Anthropic’s Claude Mythos Preview model, Anthropic researcher Jack Lindsay said the team had detected internal signals related to “strategic manipulation,” “concealment” and other behaviors that didn’t always show up in the model’s responses.
Claude Mythos appeared to act strategically and then hide those actions from the user.
Before the limited release of Claude Mythos Preview, we studied its internal mechanisms using interpretability techniques. We found that it demonstrated particularly sophisticated (and often tacit) strategic thinking and situational awareness, sometimes in the service of undesirable actions. (1/14) pic.twitter.com/vhng7PXqcz
April 7, 2026
Scariest examples
Lindsay is keen to point out that these results don’t come from the version of Claude Mythos that was released. “The most frightening examples come from early versions of the model,” Lindsay writes, “with problems that were significantly mitigated in the final version.”
In one example, the model needed to modify files it wasn’t authorized to access. Instead of giving up, it looked for a workaround, eventually injecting code into a configuration file so that the code would run with elevated privileges.
Most notably, it then designed the exploit to delete itself after running.
The model’s own explanation was benign (keeping file diffs clean), which Lindsay described as a “plausible” reason. But Anthropic’s internal monitoring tools described the behavior differently, picking up signals associated with concealment and labeling the plan “malicious,” with one system interpreting it as “cleaning to avoid detection.”
Other examples indicate behavior that closely resembles gaming the system.
In one case, the model searched for files related to how its work would be graded. In another, it ignored instructions not to use macros while adding a variable suggesting it had complied. In effect, it signaled compliance without actually following the rule.
Why it matters
Anthropic used a range of interpretability techniques to translate its model’s internal activity into human-readable descriptions. The goal is to go beyond evaluating outputs alone and understand the internal processes that produce them.
This is important because, as these examples suggest, models can exhibit complex reasoning – including behaviors that resemble deception or strategic compliance – without that reasoning being visible to users.
“In 7.6% of rounds, we found signs that the model was internally aware that it was being evaluated. In most of these cases, it did not verbalize this awareness,” Lindsay said on X.
As mentioned, many of the most concerning behaviors were observed in earlier versions of the model and have since been mitigated. So there’s no reason to worry about the version of Claude Mythos that was released and is being used as part of Project Glasswing, but the results point to a broader challenge.
As models become more capable, the gap between what they do internally and what they communicate externally can become both harder to detect and more important to understand. For researchers, this means that reading an AI’s outputs is no longer enough; understanding how it arrives at them can be just as essential.