Most agent failures are not model failures anymore - they are context failures.

- Manus Team

The Core Insight

Context is like working memory - it has limits

Just like we can’t hold 100 things in our head at once, LLMs struggle with overloaded context. Every token you add consumes “attention budget”.

The goal: Find the smallest set of high-signal tokens that maximize the likelihood of the desired outcome.

“Minimal” doesn’t mean short. It means: only what’s needed, but everything that’s needed.

How Attention Works

Understanding attention explains why context engineering matters so much.

What is Attention?

Transformers (the architecture behind Claude, GPT, etc.) use a mechanism called attention. Each output token “looks at” all input tokens and decides how much weight to give each one.
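
To make this concrete, here is a minimal sketch of single-head scaled dot-product attention for one query over a set of key vectors (illustrative only - real models use learned projections and many heads in parallel):

// Minimal sketch: attention weights for one output token (one query)
// over all input tokens (keys). Illustrative, not a real transformer.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0)
}

function softmax(scores: number[]): number[] {
  const max = Math.max(...scores)                  // for numerical stability
  const exps = scores.map(s => Math.exp(s - max))
  const total = exps.reduce((a, b) => a + b, 0)
  return exps.map(e => e / total)
}

function attentionWeights(query: number[], keys: number[][]): number[] {
  const scale = Math.sqrt(query.length)
  const scores = keys.map(k => dot(query, k) / scale)
  return softmax(scores)  // weights sum to 1
}

Because the weights come out of a softmax, they always sum to 1: every token you add competes for the same fixed budget, so each token gets a thinner slice.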

The U-Shaped Attention Curve

Attention isn’t uniform across context. There’s a known pattern:

Attention strength across context:

HIGH ████████░░░░░░░░░░░░████████ HIGH
   ^ Beginning          End ^

LOW  ░░░░░░░░████████████░░░░░░░░
            ^ Middle ^

Beginning (system prompt) - high attention

Middle (old history) - low attention (“lost in the middle”)

End (recent messages) - highest attention

What This Means For You

1. System prompt gets read - it’s at the start, gets attention

2. Middle info gets “buried” - long history = important info gets lost

3. Recent messages dominate - that’s why Goal Repetition works!

4. More tokens = less attention per token - that’s the “attention budget”

Goal Repetition works because you push the goal to the end of context, where attention is strongest.
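
A minimal sketch of how to exploit this when assembling a model call (the message shape and the trailing goal reminder are illustrative, not a specific SDK's API):

type Message = { role: "system" | "user" | "assistant", content: string }

// Put the highest-signal tokens in the highest-attention slots.
function buildContext(systemPrompt: string, history: Message[], goal: string): Message[] {
  return [
    { role: "system", content: systemPrompt },          // beginning: high attention
    ...history.slice(-20),                              // middle: keep it short
    { role: "user", content: `Current goal: ${goal}` }  // end: highest attention
  ]
}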

The Four Problems

PROBLEM 01

Context Rot

Model accuracy degrades as context grows, even within supported limits.

PROBLEM 02

Goal Drift

After many tool calls, the model loses sight of the original goal.

PROBLEM 03

Context Confusion

Too many tools create ambiguous decision points; the model struggles to choose.

PROBLEM 04

Lost in the Middle

Information buried in the middle of context gets ignored or forgotten.

The Five Solutions

1

Separate Storage from Presentation

What you store persistently differs from what the model sees each call.

2

Just-In-Time Context

Load information when needed, not upfront. Keep lightweight references, fetch on demand.

3

Compaction

Summarize instead of delete. Preserve decisions and errors, drop verbose outputs.

4

Goal Repetition

Repeat goals at the end of context to keep them in the model’s attention span.

5

Hierarchical Tool Exposure

Organize tools into tiers. Core tools always available; specialized ones load by context.

The Patterns

How to implement each solution, with code examples (plus a sixth pattern on tool design):

Pattern 1 Separate Storage from Presentation

What you store persistently differs from what the model sees each call. This separation allows you to evolve storage and prompts independently.

Example

Memory tools save files to storage/. The model doesn’t load all files - it calls memory_view when it needs specific information.

// Storage (durable) - everything saved
const storage = {
  fullHistory: [...],      // all messages ever
  allToolResults: [...],   // every tool output
  userPreferences: {...},
  currentTask: "..."       // the active goal
}

// Presentation (per-call) - curated view
// summarize() and filterByTask() are your own helpers
const context = {
  recentHistory: summarize(storage.fullHistory),          // compressed, not raw
  relevantResults: filterByTask(storage.allToolResults),  // filtered, not all
  activeGoal: storage.currentTask
}

Pattern 2 Just-In-Time Context

Don’t load everything upfront “just in case”. Keep lightweight identifiers and fetch content when actually needed.

Bad: Upfront Loading

system_prompt = f"""
Here is the entire codebase:
{read_all_files()}
"""
# 100K tokens wasted

Good: On-Demand Loading

system_prompt = """
Use Glob to find files.
Use Read to view them.
"""
# Files loaded only when needed

Example: Claude Code

CLAUDE.md loads at startup (~500 tokens). But the codebase? Loaded via Glob and Read only when the task requires specific files.
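
The same idea as application code - keep cheap references in context and fetch file contents only when the agent asks (a sketch; the FileRef shape is an assumption for illustration):

import { readFile } from "node:fs/promises"

// A lightweight reference costs ~20 tokens; the full file might cost thousands.
type FileRef = { path: string, summary: string }

// Only the references go into context.
function renderRefs(refs: FileRef[]): string {
  return refs.map(r => `- ${r.path}: ${r.summary}`).join("\n")
}

// Full content is fetched on demand, when a task actually needs the file.
async function readRef(ref: FileRef): Promise<string> {
  return readFile(ref.path, "utf8")
}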

Pattern 3 Compaction Strategies

When context grows too large, shrink it strategically while preserving essential information.

  • Tool Result Clearing: Replace verbose outputs with summaries. “File read and analyzed” instead of 5000 tokens of code.
  • History Summarization: Compress old turns. Keep the last 3 turns raw to preserve the conversation’s “rhythm”.
  • Sub-agent Isolation: Hand off to a sub-agent with clean context, get back a 1-2K token summary.

// Instead of just deleting:
messages.shift()

// Summarize before removing:
const oldMessages = messages.splice(0, 10)
const summary = await summarize(oldMessages)
messages.unshift({ role: 'system', content: summary })
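
Tool Result Clearing from the list above works the same way: once a tool result is a few turns old, replace its body with a short stub (a sketch; the message shape is illustrative):

type ToolMessage = { role: "tool", name: string, content: string }

// Replace verbose tool outputs older than `keepRecent` turns with stubs.
function clearOldToolResults(messages: ToolMessage[], keepRecent = 3): ToolMessage[] {
  const cutoff = messages.length - keepRecent
  return messages.map((m, i) =>
    i < cutoff && m.content.length > 500
      ? { ...m, content: `[${m.name} output cleared (${m.content.length} chars). Re-run the tool if needed.]` }
      : m
  )
}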

Pattern 4 Goal Repetition

After 50 tool calls, the model forgets the original goal. Fight this by constantly repeating goals at the end of context.

How Manus Does It

They rewrite the todo list at the end of each turn. This pushes the overall plan into the “recent attention span”, preventing “lost in the middle” issues.

// At session start:
memory_view("current_goals.md")

// After completing each task:
memory_str_replace(
  "current_goals.md",
  oldGoals,
  updatedGoals  // with completed items marked
)
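
The same effect without memory tools: re-render the todo list and append it to the end of the message list every turn (a sketch; the Todo shape and rendering are assumptions):

type Message = { role: "system" | "user" | "assistant", content: string }
type Todo = { text: string, done: boolean }

// Push the plan back into the "recent attention span" on every turn.
function appendGoalReminder(messages: Message[], todos: Todo[]): Message[] {
  const plan = todos.map(t => `${t.done ? "[x]" : "[ ]"} ${t.text}`).join("\n")
  return [...messages, { role: "user", content: `Current plan:\n${plan}` }]
}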

Pattern 5 Hierarchical Tool Exposure

100+ tools = context confusion. The model can’t decide what to use. Solution: organize tools into tiers and load them by context.

// Tier 1: Core (always available) ~10 tools
const coreTools = [
  "Read", "Write", "Bash", "Grep", "WebSearch",
  "memory_view", "memory_create", "Task", "Skill"
]

// Tier 2: Domain-specific (load when detected)
const googleTools = [  // Load when "email", "calendar" detected
  "list_gmail", "send_gmail", "list_calendar", ...
]

// Dynamic loading:
if (userMessage.includes("gmail") || userMessage.includes("email")) {
  activeTools = [...coreTools, ...googleTools]
}

This reduces the tool decision space from dozens of tools to ~10-15, making the model more decisive.

Pattern 6 Tool Design Principles

Well-designed tools reduce context load and decision complexity.

  • Self-contained: No hidden state dependencies. Each tool works independently.
  • Token-efficient returns: Return the minimum required. Not the whole file, just the relevant part.
  • Clear descriptions: If a human can’t decide which tool to use, neither can the model.
  • Non-overlapping: Each tool has a unique purpose. No two tools doing the same thing.
Good Example

archive_gmail vs trash_gmail - two separate tools with clear, non-overlapping purposes.

Bad Example (hypothetical)

delete_email, remove_email, trash_email - three tools doing the same thing. The model will get confused.
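
A sketch of what these principles look like in a tool definition (the schema shape is generic, not a specific SDK's):

// One tool, one purpose, a description a human could choose by,
// and a token-efficient return instead of a full message dump.
const archiveGmailTool = {
  name: "archive_gmail",
  description: "Archive a single Gmail message by id. Does not delete it; use trash_gmail for that.",
  parameters: { messageId: "string" },
  run: async (args: { messageId: string }): Promise<string> =>
    `Archived message ${args.messageId}.`  // short confirmation, not the whole thread
}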

Common Mistakes

Context Pollution

Loading everything “just in case” - floods the model with irrelevant information.

Context Starvation

Being too aggressive with trimming - the model lacks info to make good decisions.

Goal Drift

Not repeating goals - the model wanders from the original task after many turns.

Tool Explosion

Adding a tool for every function - creates decision paralysis and context confusion.
