I tried the most realistic voice companion ever created, and if ChatGPT or Gemini gets this good, reality is in trouble

I have spent a lot of time talking to AI. I have tested every voice assistant, every chatbot, and every “next-generation” conversational AI that tech companies like to hype to the media. But I have never encountered anything like Sesame. This AI companion is not just good; it is eerily accurate at imitating the way people speak, precisely because it imitates the imperfections too.

Let’s start with what it actually is. Unlike the AI voices we know from ChatGPT and Gemini, or going back to the early days of Siri and Alexa, Sesame is designed to sound like a human, flaws included, not like a perfect customer service agent. The AI’s speech is fluid, expressive, and unpredictably human. It chuckles briefly when it says something slightly funny, hesitates before answering a question, and even seems to change its “mind” mid-sentence, stopping and starting over. Not only can I interrupt it, it can also interrupt me, and it will even apologize for doing so.

(Image credit: Sesame)

The secret sauce is Sesame’s Conversational Speech Model (CSM), which blends text and audio in a single process, meaning it does not simply generate a sentence and then “read it aloud.” Instead, it produces speech in a way that mirrors how humans actually talk, with pauses, ums, shifts in tone, and all. The voice options in ChatGPT and Gemini, though impressive, still work in a structured way, generating text and then converting it to speech. Sesame, by contrast, speaks as though it is thinking, which makes its responses sound remarkably natural.
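
To make that distinction concrete, here is a toy Python sketch. It is not Sesame’s code and says nothing about how CSM is actually built; the word list, the `fake_audio_frame` helper, and both pipeline functions are made up purely for illustration. It only contrasts the order of events: a two-stage pipeline finishes the text before any audio exists, while a single-pass generator produces text and audio side by side.

```python
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[str, str]  # (kind, payload), where kind is "text" or "audio"

# Hypothetical reply, broken into words; a real model works on tokens.
WORDS = ["well", "um", "that", "sounds", "stressful"]


def fake_audio_frame(word: str) -> str:
    """Stand-in for an audio chunk (a real system would emit waveform or codec data)."""
    return f"<audio:{word}>"


def cascaded(words: List[str]) -> List[Event]:
    """Two-stage pipeline: write the whole sentence, then 'read it aloud'."""
    events: List[Event] = [("text", w) for w in words]        # stage 1: all the text
    events += [("audio", fake_audio_frame(w)) for w in words]  # stage 2: all the speech
    return events


def interleaved(words: List[str]) -> Iterator[Event]:
    """Single pass: every step emits a piece of text and its audio together."""
    for w in words:
        yield ("text", w)
        yield ("audio", fake_audio_frame(w))


def first_audio_index(events: Iterable[Event]) -> int:
    """Position of the first audio event in a stream, or -1 if there is none."""
    for i, (kind, _) in enumerate(events):
        if kind == "audio":
            return i
    return -1


if __name__ == "__main__":
    print("cascaded:   ", first_audio_index(cascaded(WORDS)))
    print("interleaved:", first_audio_index(interleaved(WORDS)))
```

In the cascaded version no sound can exist until the last word has been written, which leaves no room to hesitate or change course mid-sentence. In the interleaved version speech starts almost immediately and each new step can still go somewhere different, which is the behavior described above.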

I decided to put Sesame through a series of tests to see how humanlike it could be. The results were impressive, if a little unsettling.

Talk to me

I wanted to see if Sesame could handle a real, anxiety-inducing problem, so I gave it an imaginary but realistic scenario: “I have this great opportunity at work, but I’m nervous I’m going to mess it up.”

Instead of immediately launching into a generic pep talk, Sesame paused. Just a beat, but enough to make me feel that it was actually considering what I had said. Then, in a voice full of artificial sympathy, it replied: “That’s completely normal. Big opportunities can feel overwhelming. What’s making you the most nervous?”

It assumed nothing, and it checked in before offering bland advice. Sesame then suggested, “Have you practiced in front of a friend or recorded yourself? Sometimes hearing yourself can help you adjust your delivery.” The voice is not quite flawless, and the hesitations and other quirks felt a little calculated, but overall it was an impressive effort.

Interruptions

For this one, I wanted to see how well Sesame handled a truly natural, messy conversation, the kind where people talk over each other, change subjects abruptly, and circle back to things they mentioned ten minutes ago.

I started with: “Tell me a story about a detective who is also a magician.”

Sesame immediately jumped in with a classic noir setup: “Okay. Here’s one. Picture this: a smoky jazz club. Dimly lit. The smell of bourbon and regret hanging in the air. This detective, call him Jack Harrow. He has a fedora at a rakish angle, a cigarette dangling from a holder, and a stare that could melt steel. But Jack isn’t just any gumshoe. You see, he has a secret. He’s got the touch. A lady walks in, all nerves.”

I interrupted at that point to ask: “Wait, what’s his favorite trick?”

Maya, the Sesame voice I was using, replied: “Mm. Good question. Jack’s signature. It’s a sleight of hand with more to it than meets the eye. He calls it the Witness on the Run. He can make a person seem to disappear right before your eyes. Just for a moment.”

I jumped back in to tell Maya to return to the story, to which it said: “Right. Where was I? Oh, yes. The lady slips into the booth across from Jack, all worry lines and trembling hands. She whispers something about a stolen ruby, a jealous husband, and a shadowy figure lurking in the fog.”

The AI picked the story back up seamlessly, weaving in that detail as though I had not just derailed everything. This is the part that got me: most AI assistants would stumble or start over, but Sesame’s conversational AI model handled it like a person telling a story at a party.

Party planning

Speaking of parties, for the final test, I wanted to see if Sesame could help me plan something useful. I told it: “I want to plan a surprise for my best friend’s birthday, but I don’t know what to do.”

Instead of rattling off a list of lifeless ideas, Sesame asked: “What kind of things does your friend love?” I told it science fiction films and adventurous food, and after a thoughtful pause, it suggested: “How about a ‘sci-fi and snacks’ night? You could book a themed escape room, then hit a futuristic-style restaurant, perhaps one with molecular gastronomy?”

When I said I had a budget of $100, Sesame adjusted. “You could do a home movie night with their favorite sci-fi films and make themed snacks, like ‘Galactic Popcorn’ or ‘Alien Sliders.’ I can find recipes for you if you’d like.”

It was not just spitting out generic recommendations. It was collaborating with me. And that is what made it feel different.

What is real?

Sesame did not just pass these tests; I would say it nailed them. The pauses, the hesitation, the filler words, the way it would change its mind mid-sentence: it all felt eerily real. I would start talking, expecting the usual AI smoothness, then hear a sudden “uh, actually, no, wait, let me rephrase that…” and immediately forget that I was not talking to a real person.

If an AI’s speech is this realistic, would we even know we were talking to an AI? With Sesame, there are telltale audio glitches that give the game away, but ChatGPT’s Advanced Voice Mode and Google Gemini’s own voice options are good enough to mostly avoid those problems. Combine their vocal polish with Sesame’s speech patterns, and it could become genuinely hard to tell when you are talking to an AI, at least in short conversations.

Sesame is still niche, but this technology will not stay niche forever. The cliché today is that young people never make phone calls, but if they start, they may have to work out whether the person on the other end is real before anything else.
