Apr 7, 2025
This guide summarizes what I've learned in the past year working on early AI experiments at GitHub.
Just as with designing any technical product, understanding how things work under the hood matters. This is for designers new to the space who want to learn a bit more about the challenges and constraints. It's not a deep dive, but hopefully a solid starting point.
Think of an AI system as a little team responsible for output quality: the model, the context, and the prompts. Improving one usually helps the others. For instance, good prompts often involve context specification, and effective context-gathering requires thoughtful prompt design.
Most model providers offer limited control over core model behaviors, so the biggest leverage for anyone building with models lies in context and prompts. That's where design plays a huge role—both in backend API structures and user-facing interfaces.
Large Language Models compress massive amounts of internet data into two main parts[1]:
Once trained, the model predicts the next word based on probabilities shaped by context. But its inner workings are very tough to decipher. It's essentially a black box[2] with no clear dials or knobs. This unpredictability is a major design constraint, forcing us to account for multiple scenarios rather than a single path.
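To make "predicting the next word from probabilities" concrete, here's a toy sketch. The distribution below is invented for illustration; in a real model these probabilities come from a softmax over tens of thousands of tokens and shift with every new word of context.

```python
import random

# Hypothetical next-token distribution for the prompt "The sky is ..."
next_token_probs = {
    "blue": 0.62,
    "cloudy": 0.21,
    "falling": 0.12,
    "banana": 0.05,  # low-probability tokens still get picked occasionally
}

def sample_next_token(probs, seed=None):
    """Sample one token according to its probability weight."""
    rng = random.Random(seed)
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

# The same prompt can yield different continuations on different runs,
# which is exactly the unpredictability designers have to plan around.
print(sample_next_token(next_token_probs))
```

Because the output is sampled rather than looked up, "design for multiple scenarios" is a structural requirement, not a nice-to-have.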
Professional tasks come with loads of context requirements. LLMs have huge amounts of knowledge but limited ways to handle context dynamically. They have a context window that caps how much text you can include in each request, plus limited long-term memory. People often address this with bigger context windows, embeddings[3], and more precise context selection.
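"More precise context selection" often reduces to a budgeting problem: rank candidate chunks and keep the best ones that fit the window. A minimal sketch, assuming relevance scores are already computed (real systems would rank by something like embedding similarity, and count tokens with a real tokenizer rather than whitespace):

```python
def fit_to_window(chunks, max_tokens, count_tokens=lambda s: len(s.split())):
    """Greedily keep the highest-scoring chunks that fit the token budget.

    `chunks` is a list of (relevance_score, text) pairs. Token counting
    here is a whitespace approximation, not a real tokenizer.
    """
    selected, used = [], 0
    for _, text in sorted(chunks, key=lambda c: -c[0]):
        cost = count_tokens(text)
        if used + cost <= max_tokens:
            selected.append(text)
            used += cost
    return selected

# Hypothetical chunks with hand-set relevance scores, for illustration.
chunks = [
    (0.9, "def process(order): ..."),
    (0.7, "Orders are validated before billing."),
    (0.2, "Changelog from 2019."),  # likely dropped under a tight budget
]
print(fit_to_window(chunks, max_tokens=8))
```

The design question hiding in this loop is which context gets cut when the budget runs out, and whether users can see or influence that choice.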
Collecting context is harder than it sounds. Tasks we take for granted—like opening a file or searching the web—are non-trivial for LLMs because they weren't trained to see or act. That's why many systems rely on "helpers"—tools, MCP servers, orchestrators, etc. to handle these basic functions. Andrej Karpathy compared the future of LLM systems to a kernel in an emerging OS, using memory and computational tools to solve problems.
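What a "helper" looks like in practice is often just a structured description the model can choose to invoke. Here's one common shape (an OpenAI-style function-calling schema; field names vary by provider), describing a hypothetical `read_file` tool the surrounding system would actually implement:

```python
# Hypothetical tool definition: the model never opens files itself; it
# emits a request matching this schema, and the system runs the helper.
read_file_tool = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": (
            "Read a file from the repository so its contents "
            "can be used as context."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Repository-relative file path.",
                },
            },
            "required": ["path"],
        },
    },
}
```

Notice how much of this is design work: the name and description are effectively UI copy aimed at the model, shaping when and how it reaches for the tool.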
Even with structured tools, LLMs can skip steps, hallucinate, or make things up as they go. It's a challenge for both engineers and designers. This means designers must push upstream to define the most effective API structure for LLMs, especially if the underlying system is optimized for static UIs. Are we feeding data to the model efficiently? Which tools do we provide for LLMs? How do we help the model understand when and how to use them?
Steering model behavior via prompts is actually much simpler than you might think[4]. Just be specific, define roles, clarify objectives, and provide examples. Like you're talking to a human with zero context. The style of prompting varies: creative tasks might mean avoiding rigid instructions, while more consistent outputs might require a surplus of examples[5][6].
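The advice above can be sketched as a message list: set a role, state the objective, and show a few examples. The shape mirrors common chat APIs, but the exact schema depends on the provider, and the release-notes task here is a made-up example:

```python
def build_messages(task_input):
    return [
        # Define the role and clarify the objective.
        {"role": "system", "content": (
            "You are a release-notes writer. Summarize a commit message "
            "in one sentence of plain English, no jargon."
        )},
        # A few-shot example nudges the model toward consistent output.
        {"role": "user", "content": "fix: guard against null ptr in parser"},
        {"role": "assistant",
         "content": "Fixed a crash when the parser received empty input."},
        # The actual input goes last.
        {"role": "user", "content": task_input},
    ]

messages = build_messages("feat: add retry with backoff to uploader")
```

For a more creative task you might drop the example and loosen the system message; for stricter consistency you would add more examples instead.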
Each model behaves differently. One might fixate on tools over context, another might be great at brainstorming but miss empathy. I like to think of models as quirky child geniuses, and their personalities help me understand their quirks. The best way to improve is to experiment and keep trying.
Prompt-related challenges boil down to two things: increasing the consistency of model behaviors while working with a ton of variables (e.g. context, tools, etc.) and crafting an input-output flow that's easy to use and understand for your users.
Designing with LLMs introduces a unique set of challenges that shift how we think about systems, interfaces, and data:
Create higher value abstractions. The inherent unpredictability of LLMs drives product design toward domain-specific, evaluation-based setups with more predictable behaviors.
Data design matters as much as UI design. Integrate with existing APIs and figure out how to handle long-term, "soaking up" knowledge. It's not just about pretty interfaces, it's also about how data flows and persists.
Manifest the input-to-reasoning-to-output process in a way that provides predictability, transparency, and efficiency. This could mean carefully weaving AI into existing tools in subtle ways—or radically overhauling how we request and process information at its core.
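The "evaluation-based setups" mentioned above usually start as something very small: run each prompt or configuration against a fixed set of cases and score the outputs. A minimal sketch, where `fake_model` stands in for a real API call:

```python
def evaluate(call_model, cases):
    """Return the fraction of (prompt, check) cases the model passes."""
    passed = sum(1 for prompt, check in cases if check(call_model(prompt)))
    return passed / len(cases)

# Toy stand-in for a model, just to show the shape of the loop.
fake_model = lambda p: "Paris" if "capital of France" in p else "unsure"

cases = [
    ("What is the capital of France?", lambda out: "Paris" in out),
    ("What is the capital of Atlantis?", lambda out: "unsure" in out.lower()),
]

print(evaluate(fake_model, cases))
```

Scores like this are how teams turn an unpredictable black box into something with measurable, more predictable behavior for a specific domain.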
***