- Alibaba’s ZeroSearch can generate training material for its AI
- Cost savings of up to 88% are possible
- The technology requires additional GPUs
Alibaba’s Tongyi Lab has found a way to train AI search models without using real search engines, which it says can cut search training costs by up to 88% compared with commercial APIs like Google’s.
In a paper titled “Incentivizing the Search Capability of LLMs without Searching”, Alibaba explains how the method uses simulated documents generated by AI to mimic real search engine outputs.
Interestingly, Alibaba’s researchers also note that using simulated documents can actually improve the quality of training, because “the quality of documents returned by search engines is often unpredictable” and risks introducing noise into the training process.
Alibaba will train AI search models on AI-generated documents
“The main difference between a real search engine and a simulation LLM lies in the textual style of the returned content,” wrote the researchers. ZeroSearch can also gradually degrade the quality of the documents in order to simulate increasingly difficult retrieval scenarios.
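To make the idea of gradually degrading document quality concrete, here is a minimal sketch. It assumes a simple linear schedule and uses hypothetical names (`noise_probability`, `pick_document_kind`, the `start` and `end` values); the article does not describe the schedule Alibaba actually uses, so this should be read as an illustration only.

```python
import random


def noise_probability(step: int, total_steps: int,
                      start: float = 0.0, end: float = 0.5) -> float:
    """Chance of serving a low-quality document at a given training step.

    Assumption: the chance rises linearly from `start` to `end`.
    These values are illustrative, not taken from the paper.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


def pick_document_kind(step: int, total_steps: int, rng: random.Random) -> str:
    """Decide whether the simulated search returns a useful or a noisy document."""
    if rng.random() < noise_probability(step, total_steps):
        return "noisy"
    return "useful"


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for step in (0, 50, 100):
        kinds = [pick_document_kind(step, 100, rng) for _ in range(1000)]
        print(step, kinds.count("noisy") / len(kinds))
```

Early in training the model mostly sees useful documents; later, a growing share of noisy ones forces it to cope with harder retrieval conditions.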
Of course, the main advantage of this technology is the significant cost saving. Training with ZeroSearch’s 14B model costs around $70.80 per 64,000 requests, compared with around $586.70 via Google’s API. Costs are lower still for the 7B and 3B models, at $35.40 and $17.70 per 64,000 requests respectively, and yet all three ZeroSearch models and the Google API method take the same amount of time.
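The savings behind the headline figure can be checked with a few lines of arithmetic. The sketch below uses only the dollar amounts quoted above; the function and variable names are illustrative.

```python
# Costs in US dollars per 64,000 requests, as quoted in the article.
GOOGLE_API_COST = 586.70
ZEROSEARCH_COSTS = {"14B": 70.80, "7B": 35.40, "3B": 17.70}


def savings_percent(baseline: float, cost: float) -> float:
    """Return how much cheaper `cost` is than `baseline`, as a percentage."""
    return (1 - cost / baseline) * 100


if __name__ == "__main__":
    for model, cost in ZEROSEARCH_COSTS.items():
        saved = savings_percent(GOOGLE_API_COST, cost)
        print(f"{model}: {saved:.1f}% cheaper than the Google API")
```

For the 14B model this works out to roughly 88%, which matches the headline saving; the smaller models come out cheaper still.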
However, Alibaba has acknowledged that its ZeroSearch method needs one, two or four A100 GPUs, whereas the Google API method requires no GPUs at all. That could have a negative impact on sustainability, for example through higher energy consumption and emissions.
“Our approach has certain limitations. Deploying the simulated search LLM requires access to GPU servers. While more cost-effective than commercial API usage, this introduces additional infrastructure costs,” the researchers concluded.
Still, challenging the reliance on expensive, closed platforms like the Google Search API and cutting costs could help further democratize AI development.