We all know the story of the first YouTube video, a grainy 19-second clip of co-founder Jawed Karim at the zoo, pointing out the elephants behind him. That video was a pivotal moment in the digital space, and in some respects it is a reflection, or at least a reverse mirror image, of today, as we digest the arrival of Veo 3.
Part of Google Gemini, Veo 3 was unveiled at Google I/O 2025 and is the first generative video platform that can, with a single prompt, generate a video with synchronized dialogue, sound effects, and background noise. Most of these 8-second clips arrive less than 5 minutes after you enter the prompt.
I have been playing with Veo 3 for a few days, and for my latest challenge, I tried going back to the start of social video and that YouTube clip, “Me at the Zoo.” More specifically, I wondered whether Veo 3 could recreate it.
As I've written, the key to a good Veo 3 result is the prompt. Without detail or structure, Veo 3 tends to make the choices for you, and you generally do not end up with what you want. For this experiment, I wondered how I could describe all the details I wanted to pull from this short video and deliver them to Veo 3 in the form of a prompt. So, of course, I turned to another AI.
Google Gemini 2.5 Pro is currently not able to analyze a URL, but Google's AI Mode, the brand-new form of search that is quickly spreading across the United States, is.
Here is the prompt I entered in Google’s AI Mode:
Google's AI Mode came back almost instantly with a detailed description, which I copied and dropped into the Gemini Veo 3 prompt field.
I did a little editing, mainly deleting phrases like “the video appears …” and the final analysis at the end, but otherwise I left most of it as it was and added this at the top of the prompt:
“Let’s make a video based on these details. The output should be in a 4:3 ratio and look like it was shot on 8mm videotape.”
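For anyone who wants to script that cleanup step instead of doing it by hand, here is a minimal Python sketch of the idea. It is only an illustration of what I did manually, not anything Gemini or Veo 3 provides: the function name `build_prompt`, the `META_PREFIXES` list, and the exact wording of the style line are all my own assumptions, and the result is simply text you would paste into the prompt field.

```python
import re

# Style line modeled on the instruction quoted above (wording is an assumption).
STYLE_PREAMBLE = (
    "Let's make a video based on these details. "
    "The output should be in a 4:3 ratio and look like it was shot on 8mm videotape."
)

# Hypothetical markers of analysis-style sentences that describe the source
# video rather than the scene we want generated.
META_PREFIXES = ("the video appears", "the video seems", "in summary", "overall,")


def build_prompt(description: str, preamble: str = STYLE_PREAMBLE) -> str:
    """Drop meta-commentary sentences and put the style line on top."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", description.strip())
    kept = [
        s for s in sentences
        if s and not s.lower().startswith(META_PREFIXES)
    ]
    return preamble + "\n\n" + " ".join(kept)


if __name__ == "__main__":
    sample = (
        "The video appears to be shot at a zoo. "
        "A young man stands in front of elephants. "
        "He speaks to the camera. "
        "Overall, the clip is casual."
    )
    print(build_prompt(sample))
```

The sentence splitting is deliberately naive, so a description full of abbreviations or quoted dialogue would still need a manual pass before it goes into the prompt field.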
It took Veo 3 a while to generate the video (I think the service is getting hammered right now), and, because it only creates 8-second chunks at a time, the result was incomplete, cutting off the dialogue mid-sentence.
Still, the result is impressive. I wouldn’t say the main character looks like Karim. To be fair, the prompt does not describe, for example, Karim’s haircut, the shape of his face, or his deep-set eyes. Google’s description of his outfit was also insufficient. I’m sure it would have done a better job if I had fed it a screenshot of the original video.
Note to self: You can never offer enough detail in a generative prompt.
8 seconds at a time
The zoo in the Veo 3 video is nicer than the one Karim visited, and the elephants are much farther away, although they are in motion back there.
Veo 3 got the film quality right, giving it a nice 2005 look, but not the 4:3 aspect ratio. It also added archaic and useless labels at the top, which fortunately disappear quickly. I now realize that I should have removed the “title” bit from my prompt.
The audio is particularly good. The dialogue synchronizes well with my main character and, if you listen carefully, you will also hear the background noises.
The biggest problem is that it was only half of the brief YouTube video. I wanted a complete recreation, so I decided to go back in with a much shorter prompt:
Continue with the same video and add him looking at the elephants and then looking at the camera as he says this dialogue:
“Fronts and it’s cool.” “And that’s about everything there is to say.”
Veo 3 stuck with the setting and the main character but lost part of the plot, dropping the old-fashioned grainy video look of the first generated clip. This means that when I present them together (as I do above), we lose considerable continuity. It’s like a time jump for the film crew, where they suddenly got a much better camera.
I am also a little frustrated that all my Veo 3 videos have nonsensical captions. I must remember to ask Veo 3 to remove them, hide them, or put them outside the video frame.
I think about how difficult it was for Karim to film, edit, and upload that first short video and how I just made the same clip without needing people, lighting, microphones, cameras, or elephants. I did not have to transfer footage from a tape or even an iPhone. I just generated it with an algorithm. We have truly gone through the looking glass, my friends.
I learned one other thing through this project. As a Google AI Pro member, I get two Veo 3 video generations per day. That means I can do this again tomorrow. Let me know in the comments what you would like me to create.