AI Agent Architecture

From single-prompt interactions to multi-agent orchestration systems. How to design AI agent architectures that handle real complexity - with patterns, trade-offs, and lessons from building SPINE across 230+ projects.

By Fredrik Brattén

The Problem with Single Agents

Most AI agent tutorials show a single agent with a few tools. That works for demos. In production, you hit walls fast: context windows overflow, tools conflict, costs spiral, and there is no way to track what happened or why.

The fundamental issue is not the agent - it is the architecture around the agent. How do you classify tasks before committing resources? How do you assemble the right context without overwhelming the model? How do you verify outputs before they reach users? How do you maintain continuity across sessions that span days?

These are systems design problems, not prompt engineering problems. They require the same kind of layered thinking that makes distributed systems reliable: clear boundaries, composable components, graceful failure modes, and observable state at every level.

A Six-Layer Architecture

SPINE implements six layers that together form a complete agent orchestration system. Each layer has a single responsibility and can be used independently.

01

Task Classification

Before an agent acts, classify the work. A simple lookup should not trigger the same machinery as a multi-source research task. SPINE uses three tiers: Tier 1 (direct execution), Tier 2 (MCP-assisted), Tier 3 (full orchestration with subagents).

Match complexity of tooling to complexity of task.
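A minimal sketch of what tier classification could look like. The thresholds and `Task` fields here are illustrative, not SPINE's actual rules:

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    DIRECT = 1        # Tier 1: direct execution
    MCP_ASSISTED = 2  # Tier 2: MCP-assisted
    ORCHESTRATED = 3  # Tier 3: full orchestration with subagents

@dataclass
class Task:
    description: str
    sources_needed: int  # external sources the task must consult
    multi_step: bool     # requires a planned sequence of actions

def classify(task: Task) -> Tier:
    """Route a task to the simplest tier that can handle it."""
    if task.multi_step and task.sources_needed > 1:
        return Tier.ORCHESTRATED
    if task.sources_needed >= 1:
        return Tier.MCP_ASSISTED
    return Tier.DIRECT

# A simple lookup stays in Tier 1; multi-source research escalates.
classify(Task("look up a constant", 0, False))    # Tier.DIRECT
classify(Task("multi-source research", 3, True))  # Tier.ORCHESTRATED
```

The point is that classification runs before any machinery spins up, so a cheap lookup never pays orchestration overhead.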

02

Context Assembly

Agents need the right information at the right time. Context stacks compose project config, session memory, discovered resources, and scenario-specific instructions into a single coherent prompt. Without structured context, agents hallucinate or repeat work.

Context is not just data - it is the operating system for agent behavior.
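One way to sketch a context stack: ordered layers composed into a single prompt under a budget, dropping the lowest-priority layers first. The layer names follow the article; the character budget and composition rule are assumptions for illustration:

```python
def assemble_context(layers: list[tuple[str, str]], budget_chars: int = 4000) -> str:
    """Compose ordered context layers into one prompt, dropping the
    lowest-priority layers first when the budget is exceeded."""
    parts, used = [], 0
    for name, content in layers:  # ordered highest priority first
        block = f"## {name}\n{content}\n"
        if used + len(block) > budget_chars:
            break  # later (lower-priority) layers are dropped
        parts.append(block)
        used += len(block)
    return "".join(parts)

prompt = assemble_context([
    ("project_config", "repo: spine, style: terse"),
    ("session_memory", "yesterday we fixed the router"),
    ("discovered_resources", "tools: search, store"),
    ("scenario_instructions", "produce a review checklist"),
])
```

A real implementation would count tokens rather than characters, but the priority ordering is the important part: config outranks memory, memory outranks scenario flavor.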

03

Tool Discovery

MCP (Model Context Protocol) lets agents discover and call tools at runtime. Instead of hardcoding API calls, agents query available servers, inspect tool schemas, and choose the right tool for the task. The current ecosystem runs 31 servers, each exposing 5-40 specialized tools.

Agents should find tools, not be told about them.

04

Execution & Routing

The orchestrator routes tasks to the right executor: placeholder (dry-run), subagent (Claude Code), or content pipeline (fal.ai, Gemini). Each executor handles its own retry logic, timeout management, and artifact collection.

Separate the decision of what to do from how to do it.
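That separation can be sketched as a routing table: the orchestrator picks an executor by name, and each executor function owns its own behavior. The executor names follow the article; the bodies are placeholders:

```python
from typing import Callable

def run_placeholder(task: str) -> str:   # dry-run executor
    return f"[dry-run] would execute: {task}"

def run_subagent(task: str) -> str:      # would delegate to a subagent
    return f"[subagent] delegated: {task}"

def run_pipeline(task: str) -> str:      # would call the content pipeline
    return f"[pipeline] generating for: {task}"

# The orchestrator decides *what* runs; each executor owns *how*
# (its own retries, timeouts, and artifact collection).
EXECUTORS: dict[str, Callable[[str], str]] = {
    "placeholder": run_placeholder,
    "subagent": run_subagent,
    "content_pipeline": run_pipeline,
}

def route(task: str, executor: str) -> str:
    return EXECUTORS[executor](task)
```

Because retry and timeout logic lives inside each executor, swapping one out never changes the routing decision above it.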

05

Memory & Continuity

Sessions end, but knowledge should persist. Ephemeral session memory with configurable decay curves surfaces recent findings without permanent storage overhead. Cross-session handover documents preserve critical context across conversation boundaries.

Good memory is not remembering everything - it is surfacing the right thing at the right time.
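A sketch of decay-based surfacing, assuming exponential decay with a configurable half-life; the formula and defaults are illustrative, not SPINE's:

```python
import math

def surface(memories: list[tuple[str, float, float]], now: float,
            half_life: float = 3600.0, top_k: int = 2) -> list[str]:
    """Rank session memories by relevance weighted by time decay.

    memories: (text, relevance 0..1, timestamp_seconds).
    Score halves every `half_life` seconds, so stale findings
    sink without ever needing to be deleted."""
    def score(m):
        text, relevance, ts = m
        age = now - ts
        return relevance * math.exp(-math.log(2) * age / half_life)
    return [text for text, *_ in sorted(memories, key=score, reverse=True)[:top_k]]
```

Nothing is stored permanently here: the ranking decays naturally, which is the "ephemeral with configurable decay curves" behavior described above.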

06

Review & Verification

Every output gets reviewed. The content pipeline uses five dimensions: entity consistency, prompt fidelity, technical quality, style coherence, and dialectic critique. Multimodal LLM review (Gemini) adds visual verification for generated media.

Generation without review is just noise.
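A minimal gate over the five dimensions might look like this; the 0.7 threshold and the rule that un-scored dimensions block release are assumptions:

```python
DIMENSIONS = [
    "entity_consistency",
    "prompt_fidelity",
    "technical_quality",
    "style_coherence",
    "dialectic_critique",
]

def review(scores: dict[str, float], threshold: float = 0.7) -> dict:
    """Gate an output on per-dimension scores (0..1)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    failing = [d for d, s in scores.items() if s < threshold]
    return {
        "passed": not missing and not failing,  # every dimension scored and above threshold
        "missing": missing,   # un-scored dimensions block release
        "failing": failing,
    }
```

Treating a missing score as a failure (rather than a pass) is the conservative choice: an output that was never reviewed on a dimension has not earned that dimension.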

Patterns That Work

Recurring architectural patterns distilled from building and operating multi-agent systems daily.

Tiered Execution

Not every task needs the full agent stack. Classify tasks by complexity and route to the simplest executor that can handle them.

Why it matters: Prevents over-engineering simple tasks while ensuring complex tasks get proper orchestration.

Context Stacks

Layer project config, session state, discovered resources, and scenario instructions into composable context objects.

Why it matters: Gives agents exactly the information they need without overwhelming their context window.

Scenario-Driven Workflows

Define reusable workflow templates (YAML) that specify which tools to use, what context to load, and what verification to run.

Why it matters: Makes complex multi-step workflows reproducible and auditable.
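A scenario template of this shape, shown as a Python dict rather than YAML to keep the example self-contained (SPINE's actual schema is not shown here), plus a validator that makes missing keys auditable:

```python
# Hypothetical scenario template; key names mirror the description
# above (tools to use, context to load, verification to run).
SCENARIO = {
    "name": "research_brief",
    "tools": ["search_code", "store_finding"],
    "context": ["project_config", "session_memory"],
    "verification": ["entity_consistency", "prompt_fidelity"],
}

REQUIRED_KEYS = {"name", "tools", "context", "verification"}

def validate(scenario: dict) -> list[str]:
    """Return the template keys a scenario is missing, sorted for stable audits."""
    return sorted(REQUIRED_KEYS - scenario.keys())
```

The same dict round-trips to YAML unchanged, which is what makes the workflows reproducible: the template is data, not code.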

Progressive Review

For multi-output workflows (e.g., video clips), each subsequent output is reviewed against the full sequence so far.

Why it matters: Maintains narrative coherence across a series of generated outputs.
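Progressive review can be sketched as a loop where each output is checked against everything generated before it. The word-overlap coherence check is a toy stand-in for a real multimodal review:

```python
def coherent(clip: str, previous: list[str]) -> bool:
    """Toy coherence check: shares at least one word with a prior clip."""
    if not previous:
        return True  # the first output has nothing to cohere with
    words = set(clip.split())
    return any(words & set(p.split()) for p in previous)

def progressive_review(clips: list[str], check) -> list[bool]:
    """Review each output against the full sequence generated so far."""
    return [check(clip, clips[:i]) for i, clip in enumerate(clips)]
```

The structural point is the `clips[:i]` slice: the reviewer's context grows with the sequence, so drift from the established narrative is caught at the clip where it appears.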

Graceful Degradation

When a tool or service is unavailable, the system records it as a blindspot rather than failing. Missing entity memory becomes an entity_consistency gap, not a crash.

Why it matters: Production systems must handle partial failures without stopping entirely.

Session Handover

Before context runs low, agents generate structured handover documents with accomplishments, pending tasks, key decisions, and Minna Memory pointers.

Why it matters: Enables multi-session projects where no single conversation can hold all context.
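A handover document and its trigger might be sketched like this; the 80% threshold and any field names beyond the article's list are assumptions:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Handover:
    """Structured handover document; sections mirror the list above."""
    accomplishments: list[str] = field(default_factory=list)
    pending_tasks: list[str] = field(default_factory=list)
    key_decisions: list[str] = field(default_factory=list)
    memory_pointers: list[str] = field(default_factory=list)  # Minna Memory keys

def should_handover(tokens_used: int, budget: int, threshold: float = 0.8) -> bool:
    """Trigger a handover *before* context runs low; 80% is an assumed cutoff."""
    return tokens_used >= budget * threshold

doc = Handover(
    accomplishments=["router refactor merged"],
    pending_tasks=["add retry tests"],
    memory_pointers=["findings/router-timeout"],  # hypothetical key format
)
```

Because the document is plain structured data, the next session can load it as context layer zero and resume without replaying the old conversation.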

MCP as the Foundation

The Model Context Protocol (MCP) is what makes this architecture practical. Instead of building monolithic agent applications, each capability lives in its own MCP server with its own dependencies, its own tests, and its own lifecycle.

An agent that needs to search code calls Intelligence Engine. An agent that needs to store a finding calls Minna Memory. An agent that needs to generate video calls the Content Pipeline. None of these tools know about each other - they communicate through the orchestrator via standardized JSON-RPC.

This is not theoretical. The 31 MCP servers in the current ecosystem handle everything from research workflows to music production to project auditing. Each server was built because a real task needed it, not because an architecture diagram said it should exist.
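The wire format is plain JSON-RPC 2.0. A sketch of framing an MCP-style tools/call request; the method name and params shape follow MCP's tools/call, while the `server` argument is orchestrator routing metadata, not part of the protocol:

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing JSON-RPC request ids

def tool_call(server: str, tool: str, arguments: dict) -> str:
    """Frame a tools/call request as JSON-RPC 2.0. The orchestrator
    uses `server` to pick a transport, then sends the payload."""
    payload = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(payload)
```

This is why the servers never need to know about each other: every capability is reachable through the same request shape, addressed by tool name.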

Honest Trade-offs

Multi-agent systems are not always the right answer. A single well-prompted agent with a few tools will outperform a complex orchestration system for straightforward tasks. The overhead of task classification, context assembly, and review only pays off when tasks are genuinely complex or need to span multiple sessions.

Latency is real. Each MCP tool call adds network and process overhead. Stdio transport (subprocess per call) adds roughly 20 seconds per tool invocation. For interactive use cases, this means careful attention to which calls are necessary and which can be batched or cached.
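Caching is the simplest lever: memoize idempotent tool calls so repeated invocations skip the expensive transport entirely. A sketch using Python's stdlib cache, with a counter standing in for the subprocess spawn:

```python
import functools

CALLS = {"count": 0}  # tracks how many real invocations happen

@functools.lru_cache(maxsize=256)
def cached_call(tool: str, arg: str) -> str:
    """Only called once per unique (tool, arg); repeats hit the cache."""
    CALLS["count"] += 1  # stands in for an expensive subprocess spawn
    return f"{tool}({arg}) -> result"

cached_call("search", "router bug")
cached_call("search", "router bug")  # served from cache, no second spawn
```

This only works for calls whose results are stable within a session; anything stateful (memory writes, generation) must stay uncached, which is where batching takes over.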

Observability requires investment. Without logging at every layer, debugging a failed multi-agent workflow is like debugging a distributed system with no traces. Build observability from the start, not as an afterthought.