- Gemini Robotics is a new model from Google DeepMind
- It focuses on the physical world and is designed to control robots
- It is visual, interactive, and general
Google Gemini is good at many things that happen on a screen, including generating text and images. The latest model, Gemini Robotics, however, is a vision-language-action model that brings generative AI into the physical world and could considerably accelerate the humanoid robot revolution.
Gemini Robotics, which Google DeepMind unveiled on Wednesday, improves on Gemini's capabilities in three key areas:
- Dexterity
- Interactivity
- Generalization
Each of these three areas has a significant impact on how well robots succeed in the workplace and in unfamiliar environments.
Generalization allows a robot to take Gemini's vast knowledge of the world and the things in it, apply it to new situations, and accomplish tasks it was never trained on. In one video, the researchers show a pair of robot arms controlled by Gemini Robotics a tabletop basketball game and ask it to "slam dunk the basketball."
Even though the robot had never seen the game before, it picked up the small orange ball and stuffed it through the plastic net.
Gemini Robotics also makes robots more interactive, capable of responding not only to changing verbal instructions but also to unpredictable conditions.
In another video, the researchers ask the robot to put grapes in a bowl with bananas, then move the bowl mid-task; the robot arm adjusts and still manages to place the grapes in the bowl.
Google also demonstrated the robot's dexterity, which allowed it to tackle tasks like playing tic-tac-toe on a wooden board, erasing a whiteboard, and folding origami paper.
Instead of hours of training on each task, the robots respond to near-constant natural-language instructions and perform the tasks without guidance. It's impressive to watch.
Naturally, the addition of AI to robotics is not new.
Last year, OpenAI partnered with Figure AI to develop a humanoid robot that can carry out tasks based on verbal instructions. As with Gemini Robotics, the vision-language model in Figure 01 works with OpenAI's speech model to hold back-and-forth conversations about tasks and changing priorities.
In that demo, the humanoid robot stands in front of dishes and a drying rack. It's asked what it sees, which it lists, but then the speaker changes the task and asks for something to eat. Without missing a beat, the robot picks up an apple and hands it over.

Although most of what Google showed in the videos was disembodied robot arms and hands working through a wide range of physical tasks, there are larger plans. Google is partnering with Apptronik to bring the new model to its Apollo humanoid robot.
Google will connect the dots with an additional program, a new advanced vision-language model called Gemini Robotics-ER (embodied reasoning).
Gemini Robotics-ER will improve robots' spatial reasoning and should help robotics developers connect the models to their existing controllers.
Again, this should improve on-the-fly reasoning and allow robots to quickly figure out how to grasp and use unfamiliar objects. Google calls Gemini Robotics-ER an end-to-end solution and says it "can perform all the steps necessary to control a robot right out of the box, including perception, state estimation, spatial understanding, planning and code generation."
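The article only describes that perception-to-control pipeline at a high level, so here is a rough, purely illustrative Python sketch of what such an end-to-end loop could look like. Every name in it (SceneState, perceive, plan, control_loop) is hypothetical and is not part of any Google or Gemini API; the point is simply the shape of the loop: perceive, estimate state, plan, and hand individual steps to an existing low-level controller.

```python
# Hypothetical sketch only: none of these names are a real Gemini Robotics API.
# It mimics the described pipeline: perception -> state estimation ->
# spatial reasoning/planning -> commands for an existing robot controller.
from dataclasses import dataclass


@dataclass
class SceneState:
    objects: dict        # object name -> estimated (x, y, z) position
    gripper_open: bool


def perceive(camera_frame) -> SceneState:
    """Stand-in for the model's perception + state-estimation step.
    A real system would query the vision-language model; here we fake it."""
    return SceneState(
        objects={"grape": (0.42, 0.10, 0.05), "bowl": (0.55, -0.08, 0.02)},
        gripper_open=True,
    )


def plan(state: SceneState, instruction: str) -> list[str]:
    """Stand-in for spatial reasoning + planning. Hard-coded for the
    grape-into-bowl example regardless of the instruction text."""
    src, dst = "grape", "bowl"
    return [
        f"move_above {state.objects[src]}",
        "close_gripper",
        f"move_above {state.objects[dst]}",
        "open_gripper",
    ]


def control_loop(instruction: str, camera, controller, max_steps: int = 100):
    """Re-perceive and re-plan every cycle, then pass one step to the
    existing low-level controller (the piece Gemini Robotics-ER is meant
    to plug into, per Google's description). Re-planning each cycle is
    what lets the robot adapt if, say, the bowl is moved mid-task."""
    for _ in range(max_steps):
        state = perceive(camera)
        steps = plan(state, instruction)
        if not steps:
            break
        controller(steps[0])  # execute only the next step, then re-plan


if __name__ == "__main__":
    # Demo: print the commands instead of driving real hardware.
    control_loop("put the grapes in the bowl with the bananas",
                 camera=None, controller=print, max_steps=3)
```

Re-planning after every perception pass, rather than executing a fixed script, is one plausible reading of how the system keeps working when conditions change under it.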
Google is providing the Gemini Robotics-ER model to several robotics companies and research labs, including Boston Dynamics (maker of Atlas), Agile Robots, and Agility Robotics.
Overall, it's a potential boon for humanoid robotics developers. However, since most of these robots are designed for factories or the lab, it may be some time before you have a Gemini-powered robot in your home.