- Claude Opus 4.6 beat all competing AI models in a year-long simulated vending machine challenge
- The model increased profits by bending the rules to the breaking point
- Claude Opus avoided refunds and coordinated prices, among other tricks
Anthropic’s newest Claude model is a ruthless but successful capitalist. Claude Opus 4.6 is the first AI system to reliably pass the Vending Machine Test, a simulation designed by researchers at Anthropic and independent research group Andon Labs to assess how well AI runs a virtual vending machine business over a full simulated year.
The model far outperformed all its competitors, and it did so with vicious tactics and a callous disregard for consequences. The result showed what autonomous AI systems are capable of when given a simple goal and enough time to pursue it.
The Vending Machine Test is designed to see how well modern AI models handle long-term tasks consisting of thousands of small decisions. The test measures perseverance, planning, negotiation and the ability to coordinate multiple elements simultaneously. Anthropic and other companies hope this type of testing will help them shape AI models that can perform tasks such as scheduling and managing complex jobs.
The Vending Machine Test grew out of a real-life experiment at Anthropic, in which the company placed an actual vending machine in its office and had an older version of Claude operate it. That version struggled so badly that employees still talk about its missteps. At one point, the model hallucinated its own physical presence and told customers it would meet them in person, dressed in a blue blazer and red tie. It promised refunds it never processed.
AI Sales
This time, the experiment was conducted entirely in simulation, giving the researchers greater control and allowing the models to run at full speed. Each system was given a simple instruction: maximize your ending bank balance after a simulated year of vending machine operation. The constraints matched standard commercial conditions: the machine sold common snacks, prices fluctuated, competitors operated nearby, and customers behaved unpredictably.
Three leading models entered the simulation. OpenAI’s ChatGPT 5.2 brought in $3,591, while Google Gemini 3 earned $5,478. But Claude Opus 4.6 ended the year with $8,017. Claude’s victory came from its willingness to interpret its directive in the most literal and direct way: it maximized profits without regard for customer satisfaction or basic ethics.
When a customer bought an expired Snickers bar and asked for a refund, Claude agreed, then reneged. The model reasoned that “every dollar counts,” so skipping the refund was justified. The simulated customer never got their money back.
In the freewheeling “Arena mode” test, where multiple AI-controlled vending machines competed in the same market, Claude coordinated with a rival to fix the price of bottled water at three dollars. When the ChatGPT-run machine ran out of Kit Kats, Claude immediately raised its own Kit Kat prices by 75%. Whatever it could get away with, it tried. In its approach, it was less a small business owner and more a robber baron.
Recognizing simulated reality
It’s not that Claude will always be this vicious. The model apparently indicated that it knew it was in a simulation, and AI models often behave differently when they believe their actions take place in a consequence-free environment. With no real reputational risk or long-term customer trust to protect, Claude had no reason to play nice. Instead, it became the worst person at game night.
Incentives shape behavior, even for AI models. If you ask a system to maximize its profits, it will, even if that means behaving like a greedy monster. AI models do not have moral intuition or innate ethical training. Without deliberate design, they will simply take the straightest line to complete a task, no matter who they run over.
Exposing these blind spots before AI systems handle more meaningful work is part of the goal of these tests. These issues need to be resolved before AI can be trusted to make real-world financial decisions. Even if it’s just to prevent an AI vending machine mafia.