- OpenAI officially launched its first AI agent: Operator
- It works in a web browser to perform tasks for you and is now available as a limited research preview.
- Operator can make a dinner reservation, fill out a form, and perform other web tasks
OpenAI is always looking for the next big thing to add to ChatGPT, and after months of rumors, including a report earlier this week that a launch was coming, the tech giant’s first AI agent is here. Operator is designed to do web tasks for you, all with the press of a button.
Essentially, Operator is a Computer-Using Agent (CUA) that uses GPT-4o’s vision capabilities to browse and search the web. This means that it can understand the context of what to search for and, because it is multimodal, it understands what it sees on screen as it goes. It is now available as a research preview for ChatGPT Pro subscribers in the United States.
Operator is described as “an agent that can use its own browser to perform tasks for you.” OpenAI has released a demo showing Operator browsing the web the way we humans do. You can ask Operator to book a dinner reservation for you, fill out a very long form, order groceries from a service, or even book a flight. As shown in the demo, it can use OpenTable to find and book a restaurant reservation, and it will even walk you through its steps.
Operator is a “research preview,” so please be aware that it is still in its early stages, and OpenAI imposes certain limits on it. We haven’t had a chance to try it yet, but it certainly looks impressive. This is OpenAI’s first entry into the world of AI agents, which is likely to be the theme of the year in the field of artificial intelligence.
OpenAI writes in a blog post announcing Operator that it “is one of our first agents, which are AIs that can work for you independently: you give it a task and it will carry it out.” This suggests not only that more agents are in the works, which Altman confirmed during the live demo, but also that they are all built around the notion of doing things for you, an important step in the quest to make AI more useful by giving us back time.
Operator is powered by the new Computer-Using Agent (CUA) model, which combines GPT-4o’s vision capabilities with advanced reasoning. All of this comes together to allow Operator to understand and use the elements of a browser: the search bar, various buttons and the content on the screen.
OpenAI explains that Operator “can ‘see’ (via screenshots) and ‘interact’ (using all mouse and keyboard actions) with a browser”, allowing it to use a browser much as a person would to accomplish a task. That is impressive, especially if it works with a high success rate, and according to the blog post, it can self-correct.
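To make the “see, then interact” idea concrete, here is a minimal Python sketch of that kind of loop. It is not OpenAI’s implementation: `FakeBrowser`, `Action`, `choose_action` and `run_agent` are all hypothetical names invented for this illustration, the “screenshot” is a plain dictionary rather than pixels, and a hand-written rule stands in for the model.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # which field or button the action applies to
    text: str = ""     # text to enter for a "type" action


@dataclass
class FakeBrowser:
    """Toy stand-in for a real browser: a form with two required fields."""
    fields: dict = field(default_factory=dict)
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would receive pixels; here the observation is a dict.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            # The form only submits once both required fields are filled.
            self.submitted = all(self.fields.get(k) for k in ("name", "party_size"))


def choose_action(observation: dict, task: dict) -> Action:
    """Hypothetical policy standing in for the model: pick the next step."""
    if observation["submitted"]:
        return Action("done")
    for key, value in task.items():
        # Any field that is empty or wrong gets (re)typed: a crude self-correction.
        if observation["fields"].get(key) != value:
            return Action("type", target=key, text=value)
    return Action("click", target="submit")


def run_agent(browser: FakeBrowser, task: dict, max_steps: int = 10) -> list:
    """Observe the page, pick an action, apply it, and repeat until done."""
    log = []
    for _ in range(max_steps):
        observation = browser.screenshot()
        action = choose_action(observation, task)
        log.append(action)
        if action.kind == "done":
            break
        browser.apply(action)
    return log


if __name__ == "__main__":
    browser = FakeBrowser(fields={"name": "Bob"})  # starts with a wrong value
    for step in run_agent(browser, {"name": "Ada", "party_size": "2"}):
        print(step)
```

Because the loop takes a fresh observation before every action, a wrong value already on the page (the pre-filled “Bob” above) is simply overwritten on the next step, which is one simple way to picture the self-correction the blog post mentions.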
However, as with most new AI tools and skills, it will likely take some time for Operator to become genuinely useful in the real world. That will also require OpenAI to open it up to more people, although as a research preview it’s still an impressive demo.
For now, if you’re in the US and subscribed to ChatGPT Pro, you can try it on the OpenAI website. OpenAI CEO Sam Altman said it would eventually come to other countries and be added to the ChatGPT Plus subscription. As we saw with some of the 12 Days of OpenAI announcements, Europe will probably take a little longer.