Apr 7, 2025
This guide summarizes what I've learned in the past year working on early AI experiments at GitHub.
Just as with designing any technical product, understanding how things work under the hood matters. This is for designers new to the space who want to learn a bit more about the challenges and constraints. It's not a deep dive, but hopefully a solid starting point.
Think of an AI system as a little team responsible for output quality: the model, the context, and the prompts. Improving one usually helps the others. For instance, good prompts often involve context specification, and effective context-gathering requires thoughtful prompt design.
Most model providers offer limited control over core model behaviors, so the biggest leverage for anyone building with models lies in context and prompts. That's where design plays a huge role—both in backend API structures and user-facing interfaces.
Large Language Models compress massive amounts of internet data into two main parts[1]:
Once trained, the model predicts the next word based on probabilities shaped by context. But its inner workings are very tough to decipher. It's essentially a black box[2] with no clear dials or knobs. This unpredictability is a major design constraint, forcing us to account for multiple scenarios rather than a single path.
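To make "predicting the next word from probabilities" concrete, here's a toy sketch. The distribution below is invented for illustration; in a real model these probabilities come from a softmax over tens of thousands of tokens and shift with every new word of context.

```python
import random

# Hypothetical next-token distribution for the prompt "The sky is ..."
next_token_probs = {
    "blue": 0.62,
    "cloudy": 0.21,
    "falling": 0.12,
    "banana": 0.05,  # low-probability tokens still get picked occasionally
}

def sample_next_token(probs, seed=None):
    """Sample one token according to its probability weight."""
    rng = random.Random(seed)
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

# The same prompt can yield different continuations on different runs,
# which is exactly the unpredictability designers have to plan around.
print(sample_next_token(next_token_probs))
```

Because the output is sampled rather than looked up, "design for multiple scenarios" is a structural requirement, not a nice-to-have.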
Professional tasks come with loads of context requirements. LLMs have huge amounts of knowledge but limited ways to handle context dynamically. They have a context window that caps how much text you can include in each request, plus limited long-term memory. People often address this with bigger context windows, embeddings[3], and more precise context selection.
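"More precise context selection" often reduces to a budgeting problem: rank candidate chunks and keep the best ones that fit the window. A minimal sketch, assuming relevance scores are already computed (real systems would rank by something like embedding similarity, and count tokens with a real tokenizer rather than whitespace):

```python
def fit_to_window(chunks, max_tokens, count_tokens=lambda s: len(s.split())):
    """Greedily keep the highest-scoring chunks that fit the token budget.

    `chunks` is a list of (relevance_score, text) pairs. Token counting
    here is a whitespace approximation, not a real tokenizer.
    """
    selected, used = [], 0
    for _, text in sorted(chunks, key=lambda c: -c[0]):
        cost = count_tokens(text)
        if used + cost <= max_tokens:
            selected.append(text)
            used += cost
    return selected

# Hypothetical chunks with hand-set relevance scores, for illustration.
chunks = [
    (0.9, "def process(order): ..."),
    (0.7, "Orders are validated before billing."),
    (0.2, "Changelog from 2019."),  # likely dropped under a tight budget
]
print(fit_to_window(chunks, max_tokens=8))
```

The design question hiding in this loop is which context gets cut when the budget runs out, and whether users can see or influence that choice.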
Collecting context is harder than it sounds. Tasks we take for granted—like opening a file or searching the web—are non-trivial for LLMs because they weren't trained to see or act. That's why many systems rely on "helpers"—tools, MCP servers, orchestrators, etc. to handle these basic functions. Andrej Karpathy compared the future of LLM systems to a kernel in an emerging OS, using memory and computational tools to solve problems.
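What a "helper" looks like in practice is often just a structured description the model can choose to invoke. Here's one common shape (an OpenAI-style function-calling schema; field names vary by provider), describing a hypothetical `read_file` tool the surrounding system would actually implement:

```python
# Hypothetical tool definition: the model never opens files itself; it
# emits a request matching this schema, and the system runs the helper.
read_file_tool = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": (
            "Read a file from the repository so its contents "
            "can be used as context."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Repository-relative file path.",
                },
            },
            "required": ["path"],
        },
    },
}
```

Notice how much of this is design work: the name and description are effectively UI copy aimed at the model, shaping when and how it reaches for the tool.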
Even with structured tools, LLMs can skip steps, hallucinate, or make things up as they go. It's a challenge for both engineers and designers. This means designers must push upstream to define the most effective API structure for LLMs, especially if the underlying system is optimized for static UIs. Are we feeding data to the model efficiently? Which tools do we provide for LLMs? How do we help the model understand when and how to use them?
Steering model behavior via prompts is actually much simpler than you might think[4]. Just be specific, define roles, clarify objectives, and provide examples. Like you're talking to a human with zero context. The style of prompting varies: creative tasks might mean avoiding rigid instructions, while more consistent outputs might require a surplus of examples[5][6].
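The advice above can be sketched as a message list: set a role, state the objective, and show a few examples. The shape mirrors common chat APIs, but the exact schema depends on the provider, and the release-notes task here is a made-up example:

```python
def build_messages(task_input):
    return [
        # Define the role and clarify the objective.
        {"role": "system", "content": (
            "You are a release-notes writer. Summarize a commit message "
            "in one sentence of plain English, no jargon."
        )},
        # A few-shot example nudges the model toward consistent output.
        {"role": "user", "content": "fix: guard against null ptr in parser"},
        {"role": "assistant",
         "content": "Fixed a crash when the parser received empty input."},
        # The actual input goes last.
        {"role": "user", "content": task_input},
    ]

messages = build_messages("feat: add retry with backoff to uploader")
```

For a more creative task you might drop the example and loosen the system message; for stricter consistency you would add more examples instead.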
Each model behaves differently. One might fixate on tools over context, another might be great at brainstorming but miss empathy. I like to think of models as quirky child geniuses, and their personalities help me understand their quirks. The best way to improve is to experiment and keep trying.
Prompt-related challenges boil down to two things: increasing the consistency of model behaviors while working with a ton of variables (e.g. context, tools, etc.) and crafting an input-output flow that's easy to use and understand for your users.
Designing with LLMs introduces a unique set of challenges that shift how we think about systems, interfaces, and data:
Create higher value abstractions. The inherent unpredictability of LLMs drives product design toward domain-specific, evaluation-based setups with more predictable behaviors.
Data design matters as much as UI design. Integrate with existing APIs and figure out how to handle long-term, "soaking up" knowledge. It's not just about pretty interfaces, it's also about how data flows and persists.
Manifest the input-to-reasoning-to-output process in a way that provides predictability, transparency, and efficiency. This could mean carefully weaving AI into existing tools in subtle ways—or radically overhauling how we request and process information at its core.
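The "evaluation-based setups" mentioned above usually start as something very small: run each prompt or configuration against a fixed set of cases and score the outputs. A minimal sketch, where `fake_model` stands in for a real API call:

```python
def evaluate(call_model, cases):
    """Return the fraction of (prompt, check) cases the model passes."""
    passed = sum(1 for prompt, check in cases if check(call_model(prompt)))
    return passed / len(cases)

# Toy stand-in for a model, just to show the shape of the loop.
fake_model = lambda p: "Paris" if "capital of France" in p else "unsure"

cases = [
    ("What is the capital of France?", lambda out: "Paris" in out),
    ("What is the capital of Atlantis?", lambda out: "unsure" in out.lower()),
]

print(evaluate(fake_model, cases))
```

Scores like this are how teams turn an unpredictable black box into something with measurable, more predictable behavior for a specific domain.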
***