I tried the most realistic AI voice companion ever created: if ChatGPT or Gemini gets this good, reality is in trouble

I’ve spent a lot of time talking to AI. I’ve tested every voice assistant, every chatbot, and every “next-generation” conversational AI that tech companies like to pitch to the media. But I have never encountered anything like Sesame. This AI companion isn’t just good; it is uncannily accurate at imitating the way people speak, precisely because it imitates their imperfections too.

Let’s start with what really sets it apart. Unlike the AI voices we have known from ChatGPT and Gemini, or going back to the early days of Siri and Alexa, Sesame is designed to sound like a human, flaws included, not like a perfect customer service agent. The AI’s speech is fluid, expressive, and unpredictably human. It laughs briefly when it says something mildly funny, hesitates before answering a question, and even seems to change its “mind” mid-sentence, stopping and starting over with a new one. Not only can I interrupt it, it can also interrupt me, and it will even apologize for doing so.

(Image credit: Sesame)

The secret sauce is Sesame’s Conversational Speech Model (CSM), which blends text and audio in a single process, meaning it doesn’t just generate a sentence and then “read it aloud.” Instead, it produces speech in a way that mirrors how humans actually talk, with pauses, ums, shifts in tone, and all. The voice options in ChatGPT and Gemini, while impressive, still work in a structured way, generating text and then converting it to speech. Sesame, by contrast, speaks as if it were thinking, which makes its responses sound incredibly natural.
