- Hugging Face has launched an AI tool that browses the web on your behalf
- The Open Computer Agent uses a real web browser to perform tasks like getting directions or booking tickets
- The agent and its open-source demo can see what is on the screen, click buttons, fill out forms, and move step by step through tasks like a human
Hugging Face has presented its own take on the growing number of semi-independent AI agents that can run online errands for people. The new, free (if limited) Open Computer Agent is like having a personal assistant living in your web browser.
Part of the company's “smolagents” initiative, the Open Computer Agent can interact with websites and applications the way you would, controlling an invisible mouse and keyboard to carry out requests. The AI can open a browser, type into forms, click buttons, and so on. Ask it to find directions, and it will go to Google Maps, enter the origin and destination, and show you the route like a dedicated digital chauffeur.
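For the curious, the see-locate-act cycle described above can be sketched in a few lines of Python. This is a toy illustration only, not Hugging Face's code and not the smolagents API: `FakeScreen`, `Element`, `locate` and `run_directions_task` are hypothetical names invented here, the "screen" is simulated, and the plan is hard-coded, whereas a real agent would ask a vision model what to do at each step.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A clickable or typeable thing on the simulated screen (hypothetical)."""
    label: str
    x: int
    y: int
    text: str = ""


@dataclass
class FakeScreen:
    """Stand-in for a real browser window; nothing here touches a real browser."""
    elements: list
    log: list = field(default_factory=list)

    def locate(self, label):
        # Stand-in for visual grounding: turn a description into coordinates.
        for el in self.elements:
            if el.label == label:
                return el.x, el.y
        raise LookupError(f"nothing on screen matches {label!r}")

    def _element_at(self, x, y):
        for el in self.elements:
            if (el.x, el.y) == (x, y):
                return el
        raise LookupError(f"nothing at ({x}, {y})")

    def click(self, x, y):
        self.log.append(("click", self._element_at(x, y).label))

    def type_text(self, x, y, text):
        el = self._element_at(x, y)
        el.text = text
        self.log.append(("type", el.label, text))


def run_directions_task(screen, origin, destination):
    # Assumption: a fixed plan. A real agent would choose each step
    # after looking at a fresh screenshot.
    plan = [
        ("type", "origin", origin),
        ("type", "destination", destination),
        ("click", "Directions", None),
    ]
    for action, label, text in plan:
        x, y = screen.locate(label)      # find the element by coordinates
        if action == "type":
            screen.type_text(x, y, text)  # "invisible keyboard"
        else:
            screen.click(x, y)            # "invisible mouse"
    return screen.log


if __name__ == "__main__":
    screen = FakeScreen([
        Element("origin", 100, 40),
        Element("destination", 100, 80),
        Element("Directions", 100, 120),
    ])
    for step in run_directions_task(screen, "Paris", "Lyon"):
        print(step)
```

The point of the sketch is the shape of the loop: every action goes through coordinates on the screen rather than through a website's underlying code, which is why such an agent can, in principle, operate any page a person can.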
You can try it yourself with the live demo. Fair warning: its popularity is causing some delays and errors due to a backlog.
We're launching computer use in smolagents! 🥳 -> As vision models become more capable, they become able to power complex agentic workflows. Especially the Qwen-VL models, which support built-in grounding, i.e. the ability to locate any element in an image by its coordinates, thus… pic.twitter.com/mi8muwzkis May 6, 2025
AI agent
The Open Computer Agent is a different take on an idea that has produced similar tools such as OpenAI's Operator, Browser Use, Proxy 1.0, and Opera's Browser Operator. Like these tools, Hugging Face's agentic AI is about being an active participant instead of a passive source of information.
Like Browser Use, the Open Computer Agent is open source, which means anyone can see how it works and build on top of it, or at least tweak it for niche uses. The agent is the beginning of something more flexible, not a finished product with a million legal disclaimers. It also means the demo is exactly that, a demonstration, not a polished package. It may get things wrong and need you to step in for logins and CAPTCHA tests.
Booking tickets, checking store hours, doing research, looking up directions, and clicking through menus are all things many people would like to be able to do with a single natural-language prompt. It’s one thing to ask ChatGPT how to find cheap flights. It’s another to watch a tool go to a travel website, scroll through the listings, and try to click “Book now.”
It may be imperfect and far from flashy, but the Open Computer Agent represents an approach to AI that could become as common as the now-ubiquitous image generators.