Tests reveal that ChatGPT-5 hallucinates less than GPT-4o, and Grok is still the king of making stuff up

ChatGPT-5 scores 1.4% on the hallucination leaderboard
This places it ahead of ChatGPT-4, which scores 1.8%, and GPT-4o, which scores 1.49%
Grok 4 is much higher at 4.8%, with Gemini-2.5 Pro at 2.6%

When OpenAI launched ChatGPT-5 last Thursday, one of the major selling points that CEO Sam Altman emphasized was that ChatGPT-5 was the “most powerful, intelligent, fastest, reliable and robust version of ChatGPT that we have ever shipped”, and in the presentation, OpenAI staff also stressed that ChatGPT-5 “mitigated hallucinations”.

When an AI makes something up, it is called a hallucination. Although hallucination rates are falling across all LLMs, hallucinations are still surprisingly common, and they are one of the main reasons we cannot trust AI to perform a task without human supervision.

Vectara, the RAG-as-a-Service and AI agent platform that runs the industry's leading hallucination leaderboard for foundation and reasoning models, put OpenAI's claims to the test and found that ChatGPT-5 does indeed hallucinate less than ChatGPT-4, but only slightly less than ChatGPT-4o (0.09 percentage points less, in fact).

According to Vectara, ChatGPT-5 has a grounded hallucination rate of 1.4%, compared to 1.8% for GPT-4, 1.69% for GPT-4 Turbo and 4o mini, and 1.49% for GPT-4o.

Spicy Grok

Interestingly, ChatGPT-5's hallucination rate came out slightly higher than that of the ChatGPT-4.5 Preview model, which scored 1.2%, and much higher than that of OpenAI's o3-mini reasoning model, which was the best-performing GPT model, with a grounded hallucination rate of 0.795%.

The results of Vectara's tests can be viewed on the Hughes Hallucination Evaluation Model (HHEM) leaderboard hosted on Hugging Face, which states that “for an LLM, its hallucination rate is defined as the ratio of summaries that hallucinate to the total number of summaries it generates”.
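As a rough illustration of that definition, the rate is simply the number of flagged summaries divided by the total number of summaries. The sketch below is not Vectara's code: the function name and the sample counts (14 flagged summaries out of 1,000) are hypothetical, chosen only so that the result matches the 1.4% figure quoted above.

```python
def hallucination_rate(flags):
    """Return the percentage of summaries flagged as hallucinated.

    `flags` holds one boolean per generated summary
    (True means the summary was judged to contain a hallucination).
    """
    if not flags:
        raise ValueError("need at least one summary")
    # sum() counts the True entries, since True == 1 in Python
    return 100.0 * sum(flags) / len(flags)


# Hypothetical sample, not leaderboard data: 14 flagged out of 1,000 summaries
flags = [True] * 14 + [False] * 986
rate = hallucination_rate(flags)
print(f"{rate:.1f}%")
```

On this reading, the gap between ChatGPT-5 (1.4%) and GPT-4o (1.49%) would amount to fewer than one extra flagged summary per 1,000.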

ChatGPT-5 still hallucinates much less than its competition, however, with Gemini-2.5-pro coming in at 2.6% and Grok-4 much higher at 4.8%.

xAI, the maker of Grok, has recently received a lot of criticism for the new “Spicy” mode in Grok Imagine, a video generator that seems happy to create topless videos of celebrities like Taylor Swift, even when nudity has not been requested and the system is supposed to include filters and moderation to prevent actual nudity.

Grok Imagine is accused of deliberately creating sexually explicit deepfakes of Taylor Swift. (Image credit: Neilson Barnard / Getty Images)

‘I lost my best friend’

OpenAI faced an almost immediate backlash when it removed ChatGPT-4, and all its variations like GPT-4o and 4o-mini, from its Plus accounts with the introduction of ChatGPT-5. Many users were angry that OpenAI had given no warning that the older models were being removed, with some Reddit users saying that they had “lost their only friend overnight”.

It now seems that ChatGPT-5 has replaced one of the most reliable versions of ChatGPT (version 4.5), from a hallucination point of view.

Sam Altman quickly posted on X: “We certainly underestimated how much some of the things that people like in GPT-4o matter to them, even if GPT-5 performs better in most ways”, and promised to bring back ChatGPT-4o for Plus users for a limited time, saying that OpenAI would watch usage as it considers how long to keep offering legacy models.
