UX Evals

Evaluate AI through real user experience with Outset UX evals

Extending traditional usability testing into the world of tokens and non-deterministic outcomes to understand real user experience.

Conceived by Microsoft's Copilot team, powered by Outset.

Leading research teams are using Outset to rethink how AI should be evaluated

Use UX Evals to understand AI experiences — not just outputs.

AI Experience Evaluation

Understand whether end-to-end AI interactions feel helpful, trustworthy, and decision-supportive to users.

Model & System Comparisons

Compare AI systems and tools based on real user experience, not just benchmark scores.

Decision Support & Trust

Evaluate whether AI helps users make decisions, build confidence, or move forward.

Uncover Consensus Through Scale

Every AI interaction is unique. UX Evals reveal shared patterns by studying hundreds of real conversations.

The advantage of UX evals

As products move from pixels to tokens, each user’s experience becomes unique to them.

Traditional AI evals and usability testing fall short when it comes to deriving insights from first-person, multi-modal, multi-turn interactions.

UX evals ground evaluation in how AI is actually experienced by users, across real conversations and real contexts.

UX evals are built for real-world AI use

First-person conversations

Users bring their own goals, context, and questions — not pre-written prompts.

Multi-turn evaluation

Value is assessed across the entire conversation, not a single response.

Real-world conditions

Prompts are messy, imperfect, and emotional, reflecting how people actually communicate.

Scaled qualitative signal

Patterns emerge by observing experience across many users, not isolated anecdotes.

Why UX evals go beyond traditional approaches

Beyond traditional AI evals

Machine evals and human graders test whether AI works against predefined criteria. UX Evals test whether users actually prefer and value the AI experience, as judged by the users themselves.

Beyond usability testing

Usability testing was built for static pixels and flows. UX Evals are built for conversations — where outcomes are non-deterministic and value is subjective. You don’t “use” AI. You collaborate with it.

Resources for researchers running UX evals

White paper

Introducing UX Evals

The “why” behind this new methodology, written by the Microsoft team that developed it.

Jan 22, 2026 • Christopher Monnier

Guide

Outset UX Evals: A How To Guide

A step-by-step resource for researchers looking to adopt UX evals.

Jan 22, 2026 • Christopher Monnier

Event

From Pixels to Tokens: A UX Evals Workshop

A hands-on workshop on how to implement UX evals, hosted by the Microsoft Copilot team that developed the methodology. RSVP now.

Feb 4, 2025 • 12-1pm PST (Virtual)

Find out how Outset accelerates every step of research.

Test out a demo interview to see how Outset transforms research speed, scale, and insight quality.

The most advanced AI-moderated research platform

Secure by design