Microsoft Unveils New AI Agent to Rival OpenAI's GPT-4o

For years, artificial intelligence has been a strategic priority for major tech companies, and Microsoft is no exception. The company has invested heavily in research, infrastructure, and specialised teams to advance AI far beyond the era of simple text-based models.

Today’s systems are expected to understand context, interact with interfaces, automate tasks, and even execute multi-step workflows—capabilities that go far beyond generating text.

OpenAI has captured global attention with powerful models such as GPT-4o, pushing the boundaries of what AI can do. But Microsoft is determined not to fall behind. As reported by Windows Central, the company has unveiled a new AI agent designed to surpass GPT-4o in certain areas, with one key difference: it runs directly on local hardware rather than relying on the cloud.

Microsoft Introduces “Fara-7B”: A Local AI Agent for Complex PC Tasks

OpenAI’s GPT models have become the benchmark in the AI world, setting a high bar for performance and versatility. Microsoft, however, is steadily carving its own path. In addition to its close collaboration with OpenAI, the company has developed internal AI systems and recently formed a dedicated research group called MAI Superintelligence, whose mission is to push the next frontier of advanced AI.

The latest result of that work is Fara-7B—a new AI agent built specifically for executing complex computer tasks on behalf of the user, and most importantly, doing so entirely on-device.

According to early information shared by VentureBeat, Fara-7B is a vision- and action-driven model with about 7 billion parameters, designed to run efficiently on smaller systems. By operating locally, it offers:

Higher privacy, since no data has to be sent to the cloud
Lower latency, enabling faster response times
Safer automation, especially for workflows involving sensitive or confidential information

This positions Fara-7B as an agent that can control interfaces, perform actions, read screens, and navigate applications without compromising user data.

A Local AI Agent With Advanced Interaction Capabilities

Though still in the experimental stage, Fara-7B demonstrates abilities typically seen only in large cloud-based models:

1. Runs directly on a user’s PC

The model is optimised for consumer-level hardware, making it a practical solution for businesses and individuals who need local AI power for sensitive tasks.

2. Interacts with user interfaces like a human

Fara-7B can understand webpages and applications through:

Screenshots
Pixel-level visual data
Predicted coordinate mapping

This means it can determine where to click, scroll, or type—mimicking human interaction without needing access to internal browser code.
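The screenshot-in, action-out cycle described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Microsoft's implementation: the Action type, the run_agent loop, and the scripted_policy stub are hypothetical names, and the stub returns fixed coordinates where a real model would predict them from the image.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Action:
    """One UI step (hypothetical type, not part of any Fara-7B API)."""
    kind: str                  # "click", "scroll", "type", or "done"
    x: Optional[int] = None    # predicted pixel coordinates, if relevant
    y: Optional[int] = None
    text: Optional[str] = None


Screenshot = bytes
Policy = Callable[[str, Screenshot, List[Action]], Action]


def run_agent(task: str,
              capture: Callable[[], Screenshot],
              policy: Policy,
              execute: Callable[[Action], None],
              max_steps: int = 10) -> List[Action]:
    """Repeat: capture the screen, ask the policy for an action, perform it."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()
        action = policy(task, screenshot, history)
        history.append(action)
        if action.kind == "done":
            break
        execute(action)
    return history


def scripted_policy(task: str, screenshot: Screenshot,
                    history: List[Action]) -> Action:
    """Stand-in for the model: replays a fixed script instead of reading pixels."""
    script = [
        Action("click", x=412, y=188),       # assumed location of a search box
        Action("type", text="laptop stand"),
        Action("done"),
    ]
    return script[min(len(history), len(script) - 1)]


executed: List[Action] = []
trace = run_agent("search for a laptop stand",
                  capture=lambda: b"",        # placeholder for a real screenshot
                  policy=scripted_policy,
                  execute=executed.append)
```

The loop only ever sees a screenshot and the action history, which is the point of the approach: nothing in it depends on the page's HTML or on browser internals.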

3. Works even with complex or non-standard webpages

Unlike many web automation agents that rely on structured HTML or accessibility APIs, Fara-7B extracts meaning from raw visual input. This allows it to work with:

Heavily scripted sites
Poorly structured code
Proprietary interfaces

Its independence from underlying code makes it far more adaptable than traditional automation tools.

[Image: Microsoft's new AI agent, Fara-7B, which is designed to work locally]

Outperforming Cloud Models Like GPT-4o

One of the most striking findings is that Fara-7B has outperformed GPT-4o in specialised web agent benchmark tests. In a recent evaluation:

Fara-7B scored 73.5%
GPT-4o scored 65.1%

This suggests that while GPT-4o dominates many language and reasoning benchmarks, Microsoft’s new model is highly optimised for real-world action tasks, where vision, UI understanding, and step-by-step execution matter more than linguistic depth.
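For a sense of scale, the gap between the two reported scores can be worked out directly. The short Python snippet below uses only the two figures quoted above; the variable names are illustrative.

```python
fara_score = 73.5    # Fara-7B, as reported
gpt4o_score = 65.1   # GPT-4o, as reported

# Absolute gap in percentage points, and the gain relative to GPT-4o's score.
gap_points = fara_score - gpt4o_score
relative_gain = gap_points / gpt4o_score * 100

print(f"Gap: {gap_points:.1f} percentage points")
print(f"Relative improvement: {relative_gain:.1f}%")
```

That is a lead of 8.4 percentage points, or roughly 12.9% relative to GPT-4o's score on this benchmark.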

Microsoft also noted that Fara-7B is trained to identify “critical actions” to reduce the risk of agents making dangerous or irreversible changes during automation. This safety layer is essential for scenarios like handling financial data, accessing internal tools, or performing system-level operations.
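One way to picture such a safety layer is as a checkpoint that pauses before a risky step and asks the user to approve it. The Python sketch below is a simplified illustration under stated assumptions, not a description of how Fara-7B works internally: a hand-written keyword check stands in for the model's learned judgement, and the names CRITICAL_KEYWORDS, is_critical, and run_with_confirmation are hypothetical.

```python
from typing import Callable, List

# Assumed examples of wording that signals an irreversible step.
CRITICAL_KEYWORDS = ("pay", "purchase", "delete", "send", "submit")


def is_critical(action_description: str) -> bool:
    """Crude stand-in for the model deciding that a step is critical."""
    text = action_description.lower()
    return any(word in text for word in CRITICAL_KEYWORDS)


def run_with_confirmation(actions: List[str],
                          confirm: Callable[[str], bool]) -> List[str]:
    """Perform actions in order, stopping at a critical one unless approved."""
    performed: List[str] = []
    for description in actions:
        if is_critical(description) and not confirm(description):
            break  # the user declined, so stop before the risky step
        performed.append(description)
    return performed


plan = ["open invoice page", "fill in amount", "submit payment"]

declined = run_with_confirmation(plan, confirm=lambda step: False)
approved = run_with_confirmation(plan, confirm=lambda step: True)
```

With approval withheld, the agent in this sketch completes the two harmless steps and stops before submitting the payment; with approval given, it carries out all three.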