- Gemini 3 Flash often makes up answers instead of admitting it doesn't know something
- The problem arises with factual or high-stakes questions
- But it remains one of the most accurate and efficient AI models.
Gemini 3 Flash is fast and smart. But ask it something it doesn't actually know, something obscure, tricky, or outside its training data, and it will almost always try to bluff, according to a recent assessment from independent testing group Artificial Analysis.
Gemini 3 Flash scored 91% on the hallucination-rate portion of the AA-Omniscience benchmark. In other words, when it didn't have the answer, it still gave one almost every time, and an entirely fictitious one at that.
AI chatbots making things up has been a problem since their inception. Knowing when to stop and say "I don't know" is just as important as knowing how to answer in the first place. Right now, Gemini 3 Flash doesn't do this very well, and that's exactly what this kind of testing is for: seeing whether a model can tell real knowledge from guesses.
To keep the number from distracting from reality, it's worth noting that Gemini's high hallucination rate does not mean that 91% of its total responses are false. Instead, it means that in situations where the correct answer would be "I don't know," it fabricated an answer 91% of the time. It's a subtle but important distinction, and one with real-world implications, especially as Gemini is integrated into more products like Google Search.
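To make the distinction concrete, here is a minimal sketch of how a metric like this can be computed. It assumes each graded question is labeled "correct", "incorrect", or "abstained" (the model declined to answer); the exact scoring details belong to Artificial Analysis and may differ.

```python
# Minimal sketch of a hallucination-rate calculation (assumed labeling scheme,
# not the official AA-Omniscience implementation).
from collections import Counter

def hallucination_rate(gradings: list[str]) -> float:
    """Share of would-be 'I don't know' cases where the model answered wrongly instead."""
    counts = Counter(gradings)
    incorrect = counts["incorrect"]
    abstained = counts["abstained"]
    if incorrect + abstained == 0:
        return 0.0
    # Only questions the model got wrong or declined enter the denominator,
    # so this is NOT the share of all answers that are false.
    return incorrect / (incorrect + abstained)

# Hypothetical example: 91 wrong answers vs. 9 honest refusals -> 0.91
print(hallucination_rate(["incorrect"] * 91 + ["abstained"] * 9))
```

Under this reading, a model could still answer most questions correctly overall while scoring very badly here, because the metric only looks at the cases where it should have declined.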
"Okay, it's not just me. Gemini 3 Flash has a hallucination rate of 91% on the Artificial Analysis Omniscience Hallucination Rate benchmark! Can you actually use it for anything serious?" (December 18, 2025)
This result in no way diminishes the power and usefulness of Gemini 3. The model remains a top performer in general tests and ranks alongside, or even ahead of, the latest versions of ChatGPT and Claude. The issue is simply overconfidence where modesty is called for.
Overconfidence in answers also appears in Gemini's rivals. What sets Gemini's figure apart is how often it occurs in these uncertainty scenarios, where there is simply no correct answer in the training data and no definitive public source to point to.
Hallucinatory honesty
Part of the problem is simply that generative AI models are largely word-prediction tools, and predicting the next word is not the same as assessing the truth. That means the default behavior is to keep producing plausible words, even when saying "I don't know" would be more honest.
OpenAI has started working on this problem, trying to get its models to recognize what they don't know and say so clearly. That is a difficult thing to train, because reward models generally don't value an honest non-answer over a confident (but wrong) response. Still, OpenAI has made it a goal for the development of future models.
And Gemini usually cites sources when it can. But even then, it doesn't always stop when it should. This wouldn't matter much if Gemini were just a search model, but as it becomes the voice behind many of Google's features, being confidently wrong could affect a lot of things.
There is also a design choice here. Many users expect their AI assistant to respond quickly and confidently. Saying "I'm not sure" or "Let me check" can feel awkward in a chatbot conversation. But it's probably better than being misled. Generative AI is still not always reliable, so double-checking any AI response remains a good idea.