Technical Tuesday: what Screen Agent’s #1 OSWorld ranking means for UI automation in the real world

January 13, 2026

Authored by:

Cosmin Voicu

UiPath Screen Agent (powered by Claude Opus 4.5) was recently ranked #1 on the OSWorld-Verified benchmark, an independent evaluation conducted by the OSWorld research group.

For organizations automating critical business processes, this ranking matters because OSWorld measures how reliably an AI agent can interact with real software interfaces: interpreting on-screen content, handling UI changes, and recovering from interruptions. A top ranking indicates more precise and reliable agent behavior, which gives teams more confidence when moving agentic UI automation into production.

Agentic UI automation is a type of automation in which AI agents interpret natural language intent and autonomously execute given tasks through user interfaces. These agents interact with applications the same way a person would: navigating screens, clicking buttons, entering data, opening documents, and adapting when the interface changes.

This OSWorld result places UiPath at the forefront of agentic UI automation and, more importantly, expands what our customers can achieve with enterprise-grade computer use capabilities.

[Screenshot: UiPath Screen Agent ranked number one on OSWorld-Verified, December 2025]

In this blog post, we’ll take a closer look at the technology behind this achievement and how you can start using it to benefit from agentic UI automation that is reliable, scalable, and ready for real-world enterprise scenarios.

Pushing the boundaries of UI automation with agentic innovation

First, let’s look at where Screen Agent sits in our technology stack. Screen Agent is the agentic framework that powers UiPath ScreenPlay, an activity in UiPath Studio that turns natural language instructions into “self-driving” UI automations. You describe your task in natural language, and ScreenPlay autonomously executes it across interfaces.

Under the hood, Screen Agent is the “brain” behind that self-driving automation. It sees what’s on the screen, reasons about intent, plans actions, and adapts to unexpected changes as it runs. Screen Agent blends UiPath technology with industry-leading large language models (LLMs), offering the flexibility to choose between Screen Agent variants powered by different underlying models from Google, OpenAI, or Anthropic.

Back in September 2025, Screen Agent (powered by OpenAI GPT-5) debuted at #2 on the same OSWorld-Verified benchmark. Since then, we’ve continued to invest in advancing our agentic UI automation technology.

In the most recent update, Screen Agent (powered by Claude Opus 4.5) achieved a score of 67.1%, ranking #1 overall. It outperformed both general-purpose and specialized computer use models, as well as other agentic frameworks evaluated in the benchmark.

Notably, this result was achieved using agentic UI automation alone, without relying on additional code-based actions. This underscores the real-world readiness of our technology to support complex, unattended UI-based scenarios at enterprise scale.

A layered approach to reliable UI automation

At a technical level, Screen Agent uses a layered architecture that separates reasoning from execution, enabling precise and reliable interaction with user interfaces:

Planner: interprets user intent and translates it into high-level action sequences, continuously monitoring changes in the computer environment
Targeter: composed of a grounder and AI Computer Vision, it resolves pixel-accurate coordinates before executing actions on the interface

Because planning and targeting are handled separately, Screen Agent can re-resolve element positions on the current screen before each action, which keeps its interactions precise and consistent even when layouts shift or unexpected events occur.
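To make the planner/targeter split concrete, here is a minimal, purely illustrative sketch. It is not Screen Agent's implementation or API: every name in it (`Action`, `ResolvedAction`, `plan`, `target`, `run`) is hypothetical, the "planner" simply walks a fixed list of steps, and the "targeter" looks elements up in a dictionary standing in for a grounder plus computer vision. What it does show is the control flow the two layers imply: the screen is re-observed before every step, and coordinates are resolved only at execution time.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-in for a screenshot: element description -> (x, y) centre.
Screen = dict[str, tuple[int, int]]


@dataclass(frozen=True)
class Action:
    """High-level step emitted by the planner (hypothetical shape)."""
    kind: str       # e.g. "click" or "type"
    target: str     # natural-language description of the UI element
    text: str = ""  # payload for "type" actions


@dataclass(frozen=True)
class ResolvedAction:
    """An action with concrete coordinates attached by the targeter."""
    action: Action
    x: int
    y: int


def plan(steps: list[Action], done: int) -> Optional[Action]:
    """Toy planner: return the next pending step, or None when finished."""
    return steps[done] if done < len(steps) else None


def target(action: Action, screen: Screen) -> Optional[ResolvedAction]:
    """Toy targeter: resolve the described element on the *current* screen."""
    coords = screen.get(action.target)
    if coords is None:
        return None  # element not visible yet (popup, slow load, ...)
    return ResolvedAction(action, coords[0], coords[1])


def run(
    steps: list[Action],
    observe: Callable[[], Screen],
    max_retries: int = 3,
) -> list[ResolvedAction]:
    """Observe, plan, target, act; retry a step when its element is missing."""
    executed: list[ResolvedAction] = []
    retries = 0
    while (action := plan(steps, len(executed))) is not None:
        screen = observe()  # fresh observation before every step
        resolved = target(action, screen)
        if resolved is None:
            retries += 1
            if retries > max_retries:
                raise RuntimeError(f"could not locate {action.target!r}")
            continue  # look again instead of acting on stale coordinates
        retries = 0
        executed.append(resolved)  # a real agent would click or type here
    return executed
```

Because the targeter runs against a new observation each time, a button that moves between steps is clicked at its new position rather than at coordinates captured earlier.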

The OSWorld result demonstrates Screen Agent’s strength as a broadly capable computer-use agent, but Screen Agent is designed and optimized specifically for unattended UI automation. We are also continuously evaluating its capabilities using our own UI-CUBE benchmark, which focuses on enterprise UI scenarios.

What you can automate now that wasn’t possible before

ScreenPlay extends Screen Agent technology by also incorporating our AI-powered DOM extraction engine, which provides additional headroom in real-world deployments. By combining advanced screen understanding with DOM extraction, ScreenPlay delivers the accuracy, resilience, and precision required for enterprise UI automation.

With this unique combination of agentic AI and UiPath automation technology, ScreenPlay expands the scope of what organizations can automate, including scenarios that were previously out of reach.

For example, consider a use case where you need to extract data from multiple websites to perform SOC 3 compliance checks for new vendors. Each vendor publishes its SOC 3 report in a different location on its website, uses a different document format, and often embeds the information deep inside long PDF files. Automating this process with a traditional approach would require building and maintaining a dedicated RPA automation with custom logic for each vendor, which makes it difficult to scale.

With ScreenPlay, you can describe this process once in natural language and reuse it across vendors by simply changing the company name in an input variable. ScreenPlay dynamically searches each vendor’s website, locates the correct SOC 3 document, opens it, and extracts the required compliance information. It adapts its actions based on the structure of each site and document, whether that means navigating different pages, using application-specific tools, or scanning long reports when standard search is insufficient.
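The "describe once, change one input variable" pattern can be sketched in a few lines of Python. This is only an illustration of the idea, not ScreenPlay's configuration format: the template wording, the `vendor_name` variable, the `build_task` helper, and the extracted fields are all assumptions made for the example.

```python
from string import Template

# Hypothetical task description, written once and reused for every vendor.
TASK_TEMPLATE = Template(
    "Go to the official website of $vendor_name. "
    "Find the company's most recent SOC 3 report, open it, "
    "and extract the auditor name and the report period."
)


def build_task(vendor_name: str) -> str:
    """Fill the single input variable into the shared task description."""
    name = vendor_name.strip()
    if not name:
        raise ValueError("vendor_name must not be empty")
    return TASK_TEMPLATE.substitute(vendor_name=name)


# Same instructions, different vendors (placeholder company names).
tasks = [build_task(vendor) for vendor in ["Contoso", "Fabrikam"]]
```

The point of the design is that per-vendor differences (page layout, document format, report length) are left to the agent at run time, so the only thing that varies between runs is the input value.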

Watch it in action:

[Video: UiPath ScreenPlay automates compliance checks in minutes (demo)]

Moving agentic UI automation into production

The #1 ranking on OSWorld-Verified represents a significant technical milestone, but its real significance lies in what it enables: enterprise-ready agentic UI automation that customers can deploy today.

With Screen Agent powering ScreenPlay, UiPath is bringing state-of-the-art computer use agents out of the lab and into production, helping organizations automate more complex processes with greater resilience and less manual effort.

Learn more about UiPath ScreenPlay and how to start using ScreenPlay today.

This blog post was co-authored by Bogdan Sultana.

Topics:

Agentic, Technical Tuesday, AI Computer Vision, Artificial Intelligence (AI), UI Automation

Cosmin Voicu

Principal Product Manager, UiPath