Interaction Design for AI Systems
Designing intuitive interfaces for unpredictable AI behavior.
Every interaction design principle I learned over the past decade assumed something that is no longer true: that the system behaves deterministically. Click this button, get that result. Every time. Same input, same output.
AI systems broke this contract. Ask ChatGPT the same question twice and you’ll get different answers. Ask Claude to review your code and it might catch a critical bug or miss it entirely depending on factors nobody can fully explain. GitHub Copilot will autocomplete your function brilliantly one moment and suggest something nonsensical the next. Perplexity will cite sources that perfectly answer your question or hallucinate references that don’t exist.
This is a fundamentally new interaction design challenge. We’re designing interfaces for systems whose output is probabilistic, whose capabilities are hard to communicate, whose errors are indistinguishable from their successes in form, and whose behavior varies in ways that even their creators can’t fully predict. The entire field of interaction design is being rebuilt around this reality, and most of what we’ve figured out so far is incomplete.
I want to walk through what we know, what we’re still getting wrong, and what the emerging patterns look like as of early 2026.
The core tension: power vs. predictability
Traditional software trades power for predictability. A calculator gives you exact answers. A spreadsheet applies formulas deterministically. A database returns the same query results consistently. Users build mental models of these systems because the systems behave reliably.
AI systems trade predictability for power. A language model can draft a legal brief, debug Python code, explain quantum mechanics to a five-year-old, and write a sonnet. No deterministic system can do all of these things. But the cost of that power is that the user never knows, with certainty, whether the output is correct.
This creates a trust calibration problem that doesn’t exist with deterministic software. With a calculator, trust is binary: either the calculator works or it’s broken. With an AI assistant, trust is a spectrum: the system works for some tasks, fails at others, and the boundary between success and failure is fuzzy, context-dependent, and constantly shifting as models improve.
The interaction design challenge is to help users navigate this spectrum. How do you communicate what the system can and can’t do? How do you make errors visible without undermining confidence in the system’s capabilities? How do you help users develop accurate intuitions about when to trust the output and when to verify it?
These are the questions every AI product team is wrestling with. Nobody has fully answered them yet. But some patterns are emerging.
The autonomy spectrum
The most useful framework I’ve encountered for thinking about AI interaction design is the autonomy spectrum. It maps the range of relationships between a human user and an AI system:
| Level | Description | Example | User role |
|---|---|---|---|
| Tool | AI executes specific, bounded tasks on explicit command | Grammarly checking spelling | Direct operator |
| Assistant | AI suggests; human decides and acts | Copilot suggesting code completions | Reviewer and approver |
| Collaborator | AI and human work together with shared initiative | ChatGPT Canvas for iterative writing | Co-creator |
| Delegate | Human sets goals; AI plans and executes with check-ins | AI agent booking travel based on preferences | Goal-setter and supervisor |
| Autonomous agent | AI operates independently within guardrails | Automated customer service resolution | Monitor and exception handler |
The interaction design patterns are completely different at each level. A tool needs clear inputs and predictable outputs. An assistant needs good suggestion presentation and easy accept/reject mechanics. A collaborator needs shared state and turn-taking protocols. A delegate needs goal specification, progress visibility, and intervention mechanisms. An autonomous agent needs monitoring dashboards, alert systems, and kill switches.
Most AI products in 2026 operate somewhere between “assistant” and “collaborator.” The agentic AI push is moving products toward “delegate” and “autonomous agent,” but the interaction design patterns for those levels are still immature. We know how to design for tools and assistants. We’re still figuring out how to design for delegation.
The delegation design problem
The shift from “I tell the AI what to do step by step” to “I tell the AI what I want and it figures out how” is a fundamental change in interaction design. It’s the difference between driving a car (direct control) and telling a taxi driver where you want to go (delegation with oversight).
Designing for delegation requires solving several problems simultaneously:
1. Intent capture: How does the user express what they want at the right level of specificity?
2. Plan visibility: How does the user understand what the AI is about to do before it does it?
3. Progress monitoring: How does the user track what the AI is doing during execution?
4. Intervention: How does the user stop, redirect, or correct the AI mid-task?
5. Result verification: How does the user assess the quality of the output?
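These five problems can be read as an interaction contract between user and agent. Here is a minimal sketch of that contract in Python; all names (`DelegatedTask`, `PlanStep`, and so on) are hypothetical, not any product's actual API:

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable

class StepStatus(Enum):
    PENDING = auto()
    RUNNING = auto()
    DONE = auto()
    CANCELLED = auto()

@dataclass
class PlanStep:
    description: str                      # plan visibility: shown before execution
    status: StepStatus = StepStatus.PENDING

@dataclass
class DelegatedTask:
    goal: str                             # intent capture: the user's stated goal
    plan: list[PlanStep] = field(default_factory=list)
    cancelled: bool = False

    def approve_plan(self) -> list[str]:
        """Plan visibility: surface every step for user review before running."""
        return [s.description for s in self.plan]

    def run(self,
            execute: Callable[[PlanStep], None],
            on_progress: Callable[[PlanStep], None]) -> None:
        """Execute the plan, reporting progress and honoring cancellation."""
        for step in self.plan:
            if self.cancelled:            # intervention: user stopped the task
                step.status = StepStatus.CANCELLED
                continue
            step.status = StepStatus.RUNNING
            execute(step)
            step.status = StepStatus.DONE
            on_progress(step)             # progress monitoring hook

    def cancel(self) -> None:
        self.cancelled = True
```

The point of the sketch is that intervention has to be checked between steps, not bolted on afterward: a user who cancels mid-task should see the remaining steps marked cancelled, not silently executed. Result verification is the one problem the contract can't encode; it lives in the UI that presents the finished artifact.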
Current products handle these with varying degrees of sophistication. Claude’s artifacts feature shows the AI’s work product alongside the conversation, making result verification easier. ChatGPT’s Canvas provides a collaborative editing space where the user can see and modify the AI’s output in real time. GitHub Copilot Workspace lets developers review and modify an AI-generated implementation plan before code is written. Each of these is a partial solution to the delegation design problem.
Conversational UX: harder than it looks
The conversational interface (a text input, a stream of messages) has become the default interaction pattern for AI products. It’s intuitive, familiar, and maps naturally to the turn-taking structure of human conversation. It’s also, for many use cases, the wrong interface.
Where conversation works
Conversational UI works well for:
- Open-ended exploration: “Help me think through my pricing strategy”
- Question answering: “What’s the difference between TCP and UDP?”
- Iterative refinement: “That’s close, but make it more concise and add examples”
- Contextual assistance: Chat sidebars in IDEs, document editors, design tools
In these cases, the conversational structure matches the cognitive structure of the task. The user doesn’t know exactly what they want. They need to think through it, and the back-and-forth of conversation supports that thinking.
Where conversation fails
Conversational UI fails for:
- Structured input: Configuring complex preferences, setting up workflows, specifying precise parameters
- Visual tasks: Comparing layouts, adjusting designs, working with spatial relationships
- Repeated operations: Tasks the user does frequently and wants to accomplish quickly
- Multi-step workflows: Complex processes with dependencies, branching logic, and state management
For these tasks, a conversational interface adds friction where traditional UI patterns (forms, direct manipulation, drag-and-drop, visual editors) would be more efficient. The temptation to make everything conversational because “AI is conversational” leads to products where users type long natural language descriptions of things they could accomplish with three clicks.
The best AI products in 2026 are hybrid: conversational when conversation adds value, direct manipulation when it doesn’t. Notion AI is a good example. You can ask it to summarize a page (conversational), or you can click a button to translate a block (direct). The AI capabilities are surfaced through whichever interface is most appropriate for the specific task.
The blank prompt problem
The single biggest UX challenge in conversational AI is the blank prompt. A user opens ChatGPT, Claude, or any chat-based AI product, and sees an empty text field. The implicit question is: “What do you want?”
This is paradoxically harder to answer than it should be. The AI can do thousands of things. The user needs to figure out which of those things they want right now, and then express it in natural language at the right level of specificity. For experienced users, this is fine. For new users, it’s paralyzing.
Products are addressing this in different ways:
| Product | Approach to blank prompt | Design mechanism |
|---|---|---|
| ChatGPT | Suggested prompts, GPT store for specific use cases | Reduces option paralysis by showing examples |
| Claude | Project-based workflows with custom instructions | Narrows scope to a specific context |
| Perplexity | Search-first interface with follow-up conversation | Anchors interaction around a specific query |
| Copilot (VS Code) | Inline suggestions triggered by code context | Eliminates the blank prompt entirely; AI initiates |
| Notion AI | Contextual AI actions attached to content blocks | AI actions are specific and visible |
The most effective pattern I’ve seen is contextual triggering: the AI appears where the user is already working, with capabilities relevant to the current task, rather than requiring the user to context-switch to a separate chat window and describe what they need from scratch.
Designing for uncertainty
The hardest design problem in AI products is communicating uncertainty. Traditional software is certain: the file is saved or it isn’t. The search returned 47 results or 0. The transaction succeeded or failed.
AI systems live in the space between certainty and uncertainty. The model is 85% confident in its answer, but the user sees no indication of this. The response looks exactly the same whether the model is highly confident or barely guessing.
Confidence indicators
Some products have experimented with explicit confidence indicators: numerical scores, color-coded certainty bands, or verbal hedges (“I’m not sure about this, but…”). The research is mixed on their effectiveness.
| Approach | Pros | Cons |
|---|---|---|
| Numerical confidence (“85% confident”) | Precise, machine-readable | Users don’t know how to calibrate; 85% confident in what? |
| Verbal hedging (“I think…”, “I’m not sure…”) | Natural, intuitive | Can undermine trust even when the answer is correct |
| Source attribution (citations, links) | Lets user verify independently | Only works for factual claims, not creative or analytical output |
| Visual indicators (confidence bars, color coding) | Scannable, non-intrusive | Users learn to ignore them if they don’t match experience |
| Showing alternatives (“Here are three possible approaches…”) | Acknowledges uncertainty constructively | Increases cognitive load; user must evaluate options |
In practice, the most effective approach I’ve seen is not a single confidence indicator but a combination: verbal hedging for uncertain claims, source attribution for factual claims, and explicit alternatives when the AI recognizes genuine ambiguity. Claude does this relatively well, often saying “there are a few ways to approach this” when the task is genuinely open-ended rather than presenting a single answer with false confidence.
Error presentation
When an AI system makes a mistake, the user needs to recognize it as a mistake. This is the fundamental design challenge: AI errors don’t look like errors. They look like correct outputs.
A traditional software error has a distinct visual presentation: red text, error icons, modal dialogs. The system itself knows something went wrong and communicates that to the user. AI systems often don’t know they’re wrong. A hallucinated citation looks exactly like a real citation. An incorrect code suggestion compiles just fine. A flawed analysis reads confidently.
The design patterns for this are still evolving, but the approaches that seem most promising:
Make verification easy, not mandatory. Don’t force users to verify every output (they won’t). Instead, make it trivially easy to verify when they choose to. Perplexity’s inline citations let users click to see the source. Copilot’s code suggestions can be diffed against the original. Claude’s artifacts can be copied into a real environment and tested.
Design for graceful failure. When the AI gets something wrong, the cost of the error should be low. Auto-saved drafts. Undo mechanisms. Version history. Suggestion mode rather than direct editing. These aren’t novel interaction patterns, but they’re critical when the system’s output is unreliable.
Encourage skepticism without discouraging use. This is the tightrope. If users trust everything the AI says, they’ll eventually get burned by a hallucination or an error. If users distrust everything the AI says, they’ll manually verify every output and the AI provides no time savings. The goal is calibrated trust: users trust the AI for tasks where it’s reliable and verify for tasks where it’s not.
The best current approach to building calibrated trust: let users learn through experience in low-stakes situations. Copilot builds calibrated trust because developers see its suggestions in context, can immediately test them, and learn over time which types of suggestions are reliable and which need scrutiny.
The uncanny valley of AI assistance
Masahiro Mori’s uncanny valley, originally about humanoid robots, has a direct analogue in AI assistance. There’s a comfort zone where AI behavior is clearly tool-like (predictable, bounded, mechanical). There’s another comfort zone where AI behavior is clearly human-like (thoughtful, contextual, nuanced). In between is a valley of discomfort: AI that’s almost human-like but not quite, that has human-like capabilities in some dimensions and inhuman limitations in others.
Current AI assistants live in this valley. They write fluent prose but can’t reliably count the letters in a word. They generate sophisticated code but sometimes produce functions that call themselves infinitely. They demonstrate apparent understanding of complex topics but have no persistent memory of previous conversations (in most implementations).
These inconsistencies create a specific kind of disorientation. The user’s mental model oscillates between “this is a tool” and “this is a collaborator,” and the mismatch between expectation and reality is jarring.
Design strategies for the uncanny valley
Strategy 1: Be explicit about what the system is.
Don’t pretend the AI is human. Don’t pretend it’s a simple tool either. Be honest about what it is: a probabilistic text generation system that is remarkably capable in many domains and unreliable in specific, sometimes surprising ways. Products that try to maintain the illusion of human-level intelligence set users up for disappointment when the illusion breaks.
Strategy 2: Set expectations through onboarding.
First interactions shape mental models. If a user’s first experience with an AI assistant is a task where the AI excels, they’ll form an overly optimistic model. If it’s a task where the AI fails, they’ll form an overly pessimistic model. Deliberate onboarding that shows both strengths and limitations helps users build accurate expectations.
Strategy 3: Fail clearly, not gradually.
The worst kind of AI failure is the gradual degradation: the output starts strong and slowly becomes less accurate, with no clear signal of the transition. This trains users to trust the first part of a response and not verify the rest, which is exactly backward (AI models often become less reliable in longer outputs as they drift further from the grounding context).
Design patterns that help: paragraph-level confidence signals, clear section breaks in long outputs, and explicit markers when the system is speculating versus stating established facts.
Progressive disclosure of AI capabilities
Traditional progressive disclosure shows simple features first and reveals advanced features on demand. AI products need a different kind of progressive disclosure: revealing what the AI can do gradually as users develop the skills and mental models to use those capabilities effectively.
The challenge is that AI capabilities are vast and non-obvious. A user who discovers ChatGPT for writing emails might not realize it can also debug code, analyze spreadsheets, generate images, and simulate conversations. A user who uses Copilot for code completion might not realize it can explain legacy codebases, generate unit tests, or refactor entire modules.
The emerging patterns for progressive disclosure in AI:
Level 1: Guided tasks
Surface specific, well-defined capabilities through UI elements: buttons, menu items, contextual suggestions. “Summarize this document.” “Translate this paragraph.” “Explain this code.” These are safe entry points because the task is clear, the scope is bounded, and the user can easily evaluate the output.
Level 2: Templates and prompts
Provide pre-built prompts and templates for common use cases. “Draft a product requirements document using this template.” “Analyze this data using these specific dimensions.” Templates reduce the blank prompt problem and show users what’s possible without requiring them to invent the use case.
Level 3: Custom workflows
Let users build their own AI workflows by combining capabilities. Claude’s Projects feature, ChatGPT’s Custom GPTs, and similar features let power users create tailored AI experiences for specific recurring tasks. This level requires users who already have a sophisticated mental model of what the AI can do.
Level 4: Autonomous delegation
Give users the ability to delegate complex, multi-step tasks. “Research these five competitors and produce a comparison matrix with pricing, features, and market positioning.” This level requires the highest trust and the most sophisticated interaction design, because the user needs to specify intent clearly, monitor progress, and evaluate a complex output.
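The four levels above suggest a gating model: each capability is registered at a disclosure level, and the interface only surfaces capabilities at or below the user's current level. A minimal sketch, with a hypothetical capability registry:

```python
from enum import IntEnum

class DisclosureLevel(IntEnum):
    GUIDED = 1       # buttons and contextual suggestions
    TEMPLATES = 2    # pre-built prompts for common use cases
    WORKFLOWS = 3    # user-composed recurring workflows
    DELEGATION = 4   # multi-step autonomous tasks

# Hypothetical registry: each capability gated at the level where it's safe
# to surface, given the mental model the user has likely built by then.
CAPABILITIES = {
    "summarize_document": DisclosureLevel.GUIDED,
    "prd_template": DisclosureLevel.TEMPLATES,
    "custom_workflow": DisclosureLevel.WORKFLOWS,
    "research_competitors": DisclosureLevel.DELEGATION,
}

def visible_capabilities(user_level: DisclosureLevel) -> list[str]:
    """Surface only the capabilities at or below the user's current level."""
    return sorted(name for name, lvl in CAPABILITIES.items()
                  if lvl <= user_level)
```

How a user advances between levels is the real design question; successful completions at one level are the obvious signal, but the sketch deliberately leaves that policy out.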
Specific products, specific lessons
Let me look at specific AI products and what their interaction design choices reveal about the state of the art.
ChatGPT: the conversational default
ChatGPT established the conversational interface as the default for AI interaction. Its design choices reflect a philosophy of maximum flexibility: one text input, infinite possibilities. The Canvas feature (launched in 2024) was a significant evolution, adding a collaborative editing space alongside the conversation for writing and code tasks.
What works: The conversational interface is genuinely intuitive for exploration and iteration. Canvas solved the “I need to work on a specific artifact, not have a conversation about it” problem. Custom GPTs provide a way to narrow the scope and reduce the blank prompt problem.
What doesn’t: The conversation metaphor breaks down for complex tasks that need structured input, state management, or multi-step workflows. Long conversations lose coherence as the context window fills. The model doesn’t effectively communicate when it’s less confident or operating at the edge of its capabilities.
Claude: the artifact-first approach
Claude’s key design innovation is the artifact system: when the AI produces a substantial piece of work (code, documents, diagrams), it appears in a separate panel rather than inline in the conversation. This separates the conversation (process, discussion, iteration) from the output (deliverable, artifact, result).
What works: The artifact separation mirrors how humans actually work: you discuss what to build in one space and build it in another. Projects with system prompts let users create persistent context for recurring tasks. The model is relatively good at expressing uncertainty and acknowledging limitations.
What doesn’t: The artifact system can be confusing for simple tasks where the overhead of a separate panel isn’t needed. The system prompt / projects abstraction requires sophisticated understanding of how to configure AI behavior.
GitHub Copilot: the embedded assistant
Copilot’s design philosophy is fundamentally different from chat-based AI: the AI is embedded directly in the user’s workspace, triggered by context rather than explicit requests. Code suggestions appear inline. Chat is available but secondary to the inline experience.
What works: Contextual triggering eliminates the blank prompt problem entirely. The suggestion appears where the user is already working, in the format they need, without a mode switch. The accept/reject interaction (Tab to accept, keep typing to reject) is brilliantly low-friction.
What doesn’t: The inline suggestion model works for code completion but struggles with larger tasks (refactoring, architecture decisions, debugging) that require conversation and context-sharing. The disconnect between inline Copilot (fast, contextual, limited) and Copilot Chat (slow, conversational, powerful) creates a fragmented experience.
Perplexity: the search-first model
Perplexity treats AI interaction as an extension of search rather than conversation. The primary input is a query, not a prompt. The primary output is a structured answer with citations, not a conversational response. Follow-up questions refine the search rather than continuing a conversation.
What works: Source attribution is built into every response, making verification trivially easy. The structured answer format (headings, bullet points, citations) is more scannable than conversational prose. The search metaphor sets appropriate expectations about the type of output users will receive.
What doesn’t: The search-first model limits creative and analytical use cases where the user needs the AI to generate rather than retrieve. The interface can feel constrained for open-ended exploration.
Design principles for AI interaction (as of February 2026)
Based on everything I’ve observed, used, and built, here are the design principles I think are most important for AI interaction right now. These will evolve as the field matures.
1. Match the interface to the task, not the technology
Don’t default to conversation because you’re building with a language model. Ask: what’s the most efficient interface for this specific task? Sometimes it’s conversation. Sometimes it’s a form. Sometimes it’s direct manipulation. Sometimes it’s a button that says “Summarize” and doesn’t require the user to type anything.
2. Make the AI’s confidence visible without making the user’s life harder
Users need to know when to trust the output and when to verify. But they won’t read confidence scores for every paragraph. Find subtle, contextual ways to signal uncertainty. Hedging language. Source links. Visual cues. The goal is calibrated trust, not documented uncertainty.
3. Design for the error, not just the happy path
AI products ship with demos that show the best-case output. Users experience the full distribution of outputs, including the bad ones. The product’s quality is determined not by how well the AI performs when it’s right but by how gracefully the product handles it when the AI is wrong. Undo, version history, suggestion mode, easy verification, clear error recovery.
4. Progressive disclosure of capabilities, not features
Don’t show users everything the AI can do. Show them the next thing they need. Let capabilities emerge through use rather than through documentation or onboarding tours. The AI that surfaces the right capability at the right moment is more powerful than the AI that lists all capabilities upfront.
5. Preserve user agency
As AI systems become more autonomous, the temptation is to remove the user from the loop. Resist this. Users need to feel in control, even when they’re delegating. Show what the AI is about to do before it does it. Let users intervene at any point. Make it clear that the human is the decision-maker and the AI is the tool, even when the tool is very capable.
6. Design for the relationship, not the transaction
Unlike traditional software (where each interaction is independent), AI interactions build a relationship. Users develop expectations, calibrate trust, and learn the system’s strengths and limitations over time. Design for this cumulative experience. Consistency of behavior matters more than any single impressive output.
What we haven’t figured out yet
I want to be honest about the gaps. There are fundamental interaction design problems in AI that nobody has solved convincingly:
Multi-modal coherence. As AI systems combine text, image, audio, and video generation, how do we design interfaces that handle multiple modalities without becoming overwhelming? Current approaches are mostly “separate tabs for separate modalities,” which is functional but inelegant.
Long-term context. How do we design for AI systems that remember previous interactions across sessions? Memory creates powerful personalization but also raises privacy concerns and creates the expectation of coherent long-term behavior that current systems can’t reliably deliver.
Multi-agent coordination. As agentic AI systems become more common, users will need to coordinate multiple AI agents working on different aspects of a task. The interaction design for this barely exists. How do you monitor, direct, and intervene in a team of AI agents?
Appropriate anthropomorphism. How human should AI interfaces feel? Too robotic and users don’t engage. Too human and users develop misplaced trust and emotional attachment. The right level of anthropomorphism likely depends on the use case, the user population, and the stakes of the task. We don’t have good frameworks for making this decision yet.
Accessibility. Most AI interaction research focuses on text-based conversational interfaces. How do screen readers handle streaming AI responses? How do users with motor impairments interact with suggestion-accept patterns? How do users with cognitive differences calibrate trust in AI output? These questions are critically important and significantly under-researched.
The next chapter
We’re in the early innings of AI interaction design. The patterns we’re establishing now will shape how billions of people interact with AI systems for the next decade. Some of what we’re doing will look prescient in retrospect. Much of it will look quaint.
The one thing I’m confident about: the winning AI products won’t be the ones with the best models. They’ll be the ones with the best interaction design. The model is the engine. The interaction design is how users experience that engine. And as models converge in capability (which they are, rapidly), the differentiation will come from the interface, the experience, and the trust that the design builds over time.
The technical challenge of building AI is enormous. The design challenge of making AI usable, trustworthy, and genuinely useful to real humans in their actual lives is at least as hard. And we’re just getting started.