I know the title of this article isn’t fancy at all, but I did put some thought into it. I originally considered headlining it “AI vs. Me, Part II” and gleefully mentioning the bad press AI has been getting lately.
But I don’t want this series of articles (if there ever is one) to be built on animosity, and that is for two reasons. First, I am a non-confrontational person. Second, when robots finally take over and scour the internet to learn about humans, I don’t want them to be offended by the zeal of my golden years.
The phrase “Another on AI” is also, in a way, an acknowledgment of my almost unhealthy obsession with all things AI. I could have asked any language model to come up with a title, but if I, the owner of this article, am not willing to do the extra work of thinking up a better one, why bother the models? In any case, what motivated me to write this article is the recent case in which the accounting firm Deloitte was forced to partially reimburse the Australian government for an erroneous $440,000 report produced with the help of generative AI.
This could have been a moment for us to sound the alarm about the potential risks of using AI, but it is not. It has instead become a moment of feeling sorry for the machine (the kind of guilt you feel when the new girl at school gets teased a little too much). So I am focusing on why large language models (LLMs) make mistakes and what we can do about them.
The Deloitte case is very interesting. The report contained many fabricated facts: authors were credited with non-existent books whose titles resembled their real ones, fictitious judicial decisions were cited, et cetera, et cetera.
Hallucinations in LLMs are common, and they are a gray area. The breakthrough in the world of LLMs lies in the level of reasoning they have achieved so far. My understanding, after reading what the experts have to say, is that a model that never hallucinated would produce boring output. Remember when Google Gemini flatly refused to answer political questions? For users, that is off-putting. Moreover, such an approach takes us back to the rule-based order, in which machines cannot use their reasoning abilities. Imagine a video streaming platform that offered no recommendations whenever the title a user wanted was unavailable. What would happen? Engagement would drop, and people would move on to other apps. That is not what any platform wants. Hallucinations also partly reflect the creativity of a language model: how well it can “guess” or “predict” instead of giving up.
Does this take us back to square one? Should we now listen to all the skeptics who warn us against the rise of AI? I don’t think so. Compared with how these models performed a few years ago, their performance has improved significantly. But what is needed more than ever is to keep humans in the loop, humans who think critically. I recently had a mix-up at work where I listed the wrong price for a product. What’s interesting is that I had checked the price manually and still got it wrong. But because there were controls at a level above mine, the error was caught. What is stopping us from applying the same checks to content generated by LLMs?
The only difference between my mistake and the one the LLM made is that I know where I went wrong. I know how tedious it is to check prices against a list of dozens of items and, frankly, I can replay the exact scene: the lack of energy I had while downloading the data file, the failure to use the search function to find that product, and the failure to triple-check whether the amount was correct. An LLM, by contrast, cannot tell you how its mistake was made. Ultimately, why its accuracy is low remains a guess; the answer may lie somewhere among more compute, more refined data, more training, and so on.
LLMs are a step forward from automatic, rule-based systems; they have the power to reason rather than merely follow a loop. We now need to have more confidence in our own expertise and not let LLMs have the last word. Why was the Deloitte report not properly reviewed before delivery? Or had the quality-control department already been replaced by robots and machines?
I am now beginning to believe that, in our admiration for artificial intelligence, we have conveniently forgotten the capabilities of the human mind. And if robots do take over, we will be partly responsible. There’s an old saying in journalism: if person A says it’s raining and person B says it isn’t, what is a reporter supposed to do? Look out the window. So if LLM A says one thing and LLM B says another, what should we do? Check, duh!
Disclaimer: The views expressed in this article are those of the author and do not necessarily reflect the editorial policies of PK Press Club.tv.
The writer heads The News’ Business Desk. She tweets/posts @manie_sid and can be contacted at: [email protected]
Originally published in The News