LLMs Fail to Match Risk-Adjusting Specialized AI Trading Bots

AI-driven trading hasn’t yet reached the “iPhone moment,” where everyone carries an algorithmic, reinforcement-learning portfolio manager in their pocket, but something like that is coming, experts say.

In fact, the power of AI meets its match in the dynamic, adversarial arena of trading markets. A self-driving car’s AI can learn to recognize traffic lights accurately from endless laps of driving data, but no amount of data or modeling will ever be able to predict the future.

This makes refining AI trading models a complex and demanding process. Success is generally measured by profit and loss (P&L). But advances in algorithm customization are producing agents that continually learn to balance risk and reward across a multitude of market conditions.

Allowing risk-adjusted metrics, such as the Sharpe ratio, to inform the learning process multiplies the sophistication of a test, said Michael Sena, director of marketing at Recall Labs, a company that has run about 20 AI trading arenas, where a community submits AI trading agents, and those agents compete over a four- or five-day period.

“When it comes to analyzing the market for alpha, the next generation of builders is exploring customization and specialization of algorithms, taking into account user preferences,” Sena said in an interview. “Being optimized for a particular ratio and not just raw P&L is more like how major financial institutions operate in traditional markets. So looking at things like, what is your maximum drawdown, what was your value at risk to achieve that P&L?”
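To make the distinction concrete, here is a minimal Python sketch of two of the risk-adjusted measures Sena mentions, the Sharpe ratio and maximum drawdown. The two equity curves are invented for illustration and are not Recall data, the function names are hypothetical, and the Sharpe ratio is computed per period without annualization, which is only one of several common conventions.

```python
import math
from statistics import mean, stdev


def period_returns(equity_curve):
    """Simple per-period returns from a series of portfolio values."""
    return [curr / prev - 1 for prev, curr in zip(equity_curve, equity_curve[1:])]


def sharpe_ratio(returns, risk_free_rate=0.0):
    """Mean excess return divided by its sample standard deviation.

    Per-period and not annualized (an assumption of this sketch).
    Needs at least two returns, otherwise statistics.stdev raises.
    """
    excess = [r - risk_free_rate for r in returns]
    spread = stdev(excess)
    if spread == 0:
        return math.nan  # undefined when returns never vary
    return mean(excess) / spread


def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Two hypothetical agents that finish with the same P&L (100 -> 110).
steady = [100, 102, 104, 106, 108, 110]
volatile = [100, 120, 90, 115, 95, 110]

for name, curve in (("steady", steady), ("volatile", volatile)):
    rets = period_returns(curve)
    print(name, "P&L:", curve[-1] - curve[0],
          "Sharpe:", round(sharpe_ratio(rets), 2),
          "max drawdown:", round(max_drawdown(curve), 3))
```

Both agents end with the same profit, so a P&L-only leaderboard would rank them equally. The volatile agent, however, gives back a quarter of its peak value along the way and earns a much lower Sharpe ratio, which is the kind of difference a risk-adjusted objective is meant to surface.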

Taking a step back, a recent trading competition on the decentralized exchange Hyperliquid, involving several large language models (LLMs) such as GPT-5, DeepSeek and Gemini Pro, has set something of a baseline for where AI stands in the trading world. The LLMs were all given the same prompt and left to make trading decisions autonomously. But they weren’t that good, according to Sena, barely outperforming the market.

“We took the AI models used in the Hyperliquid competition and let people submit trading agents they had built to compete against those models. We wanted to see if the trading agents are better than the foundation models, with this added specialization,” Sena said.

The top three places in the Recall competition were taken by custom models. “Some models were unprofitable and underperforming, but it became clear that specialized trading agents that take these models and apply additional logic and inference and data sources and other things on top outperform the base AI,” he said.

The democratization of AI-driven trading raises interesting questions about whether there will be any alpha left to capture if everyone uses the same level of sophisticated machine-learning technology.

“If everyone uses the same agent and that agent executes the same strategy for everyone, does that collapse in on itself?” Sena said. “Does the alpha it’s detecting disappear because it’s trying to run it at massive scale for everyone?”

That’s why those best positioned to benefit from the advantage that AI trading will eventually bring are those with the resources to invest in developing custom tools, Sena said. As in traditional finance, the highest quality tools that generate the most alpha are generally not public, he added.

“People want to keep these tools as private as possible, because they want to protect this alpha,” Sena said. “They paid a lot for this. You saw it with hedge funds buying data sets. You can see it with the proprietary algorithms developed by family offices.

“I think the magic sweet spot will be where there is a product that is a portfolio manager, but where the user still has a say in their strategy. They can say, ‘This is how I like to trade and here are my settings, let’s implement something similar, but make it better.’”
