Most agent failures are not model failures anymore - they are context failures.
- Manus Team
The Core Insight
Context is like working memory - it has limits
Just like we can’t hold 100 things in our head at once, LLMs struggle with overloaded context. Every token you add consumes “attention budget”.
The goal: Find the smallest set of high-signal tokens that maximize the likelihood of the desired outcome.
“Minimal” doesn’t mean short. It means: only what’s needed, but everything that’s needed.
How Attention Works
Understanding attention explains why context engineering matters so much.
What is Attention?
Transformers (the architecture behind Claude, GPT, etc.) use a mechanism called attention. Each output token “looks at” all input tokens and decides how much weight to give each one.
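To make “weight” concrete, here is a toy sketch of scaled dot-product attention for a single query - real models add learned projections, multiple heads, and positional encodings on top of this:

// Toy scaled dot-product attention for one query vector.
function attentionWeights(query, keys) {
  const scale = Math.sqrt(query.length)
  // Similarity score between the query and each input token's key
  const scores = keys.map(k =>
    k.reduce((sum, v, i) => sum + v * query[i], 0) / scale
  )
  // Softmax: weights are positive and always sum to 1
  const max = Math.max(...scores)
  const exps = scores.map(s => Math.exp(s - max))
  const total = exps.reduce((a, b) => a + b, 0)
  return exps.map(e => e / total)
}

// 4 input tokens: weights average 0.25 each. 4,000 tokens: 0.00025.

Because the weights always sum to 1, every token you add dilutes the attention available to all the others - that is the “attention budget” in concrete terms.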
The U-Shaped Attention Curve
Attention isn’t uniform across context. There’s a known pattern:
Attention strength across context:

HIGH ████████░░░░░░░░░░░░████████
LOW  ░░░░░░░░████████████░░░░░░░░
     ^ Beginning ^ Middle   ^ End

- Beginning (system prompt) - high attention
- Middle (old history) - low attention (“lost in the middle”)
- End (recent messages) - highest attention
What This Means For You
1. System prompt gets read - it sits at the start, where attention is high
2. Middle info gets “buried” - in a long history, important details in the middle get lost
3. Recent messages dominate - that’s why Goal Repetition works!
4. More tokens = less attention per token - that’s the “attention budget”
Goal Repetition works because you push the goal to the end of context, where attention is strongest.
The Four Problems
PROBLEM 01
Context Rot
Model accuracy degrades as context grows, even within supported limits.
PROBLEM 02
Goal Drift
After many tool calls, the model loses sight of the original goal.
PROBLEM 03
Context Confusion
Too many tools create ambiguous decision points; the model struggles to choose.
PROBLEM 04
Lost in the Middle
Information buried in the middle of context gets ignored or forgotten.
The Five Solutions
Separate Storage from Presentation
What you store persistently differs from what the model sees each call.
Just-In-Time Context
Load information when needed, not upfront. Keep lightweight references, fetch on demand.
Compaction
Summarize instead of delete. Preserve decisions and errors, drop verbose outputs.
Goal Repetition
Repeat goals at the end of context to keep them in the model’s attention span.
Hierarchical Tool Exposure
Organize tools into tiers. Core tools always available; specialized ones load by context.
The Patterns
How to implement each solution, with code examples:
Pattern 1 Separate Storage from Presentation
What you store persistently differs from what the model sees each call. This separation allows you to evolve storage and prompts independently.
Example
Memory tools save files to storage/. The model doesn’t load all files - it calls memory_view when it needs specific information.
// Storage (durable) - everything saved
const storage = {
  fullHistory: [...],      // all messages ever
  allToolResults: [...],   // every tool output
  userPreferences: {...}
}

// Presentation (per-call) - curated view
const context = {
  recentHistory: summarize(storage.fullHistory),
  relevantResults: filterByTask(storage.allToolResults),
  activeGoal: storage.currentTask
}

Pattern 2 Just-In-Time Context
Don’t load everything upfront “just in case”. Keep lightweight identifiers and fetch content when actually needed.
system_prompt = f"""
Here is the entire codebase:
{read_all_files()}
"""
# 100K tokens wastedsystem_prompt = """
Use Glob to find files.
Use Read to view them.
"""
# Files loaded only when neededExample: Claude Code
CLAUDE.md loads at startup (~500 tokens). But the codebase? Loaded via Glob and Read only when the task requires specific files.
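A minimal sketch of the same reference-then-fetch idea in Node - the refs list and the string-match rule are illustrative assumptions, not Claude Code’s actual mechanism:

import { readFile } from "node:fs/promises"

// The context carries cheap references, not file contents
const refs = ["src/agent.js", "src/tools.js", "docs/setup.md"]

// A file's token cost is paid only when the task actually names it
async function buildContext(task) {
  const parts = []
  for (const path of refs) {
    if (task.includes(path)) {
      parts.push(await readFile(path, "utf8"))
    }
  }
  return parts
}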
Pattern 3 Compaction Strategies
When context grows too large, shrink it strategically while preserving essential information.
- Tool Result Clearing: Replace verbose outputs with summaries. “File read and analyzed” instead of 5000 tokens of code.
- History Summarization: Compress old turns. Keep the last 3 raw to preserve “rhythm”.
- Sub-agent Isolation: Hand off to a sub-agent with clean context, get back a 1-2K token summary.
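Here is a minimal sketch of the first strategy, tool result clearing, assuming messages are objects with role and content fields (the shape is illustrative):

// Replace older, verbose tool outputs with one-line summaries,
// keeping the most recent results intact
function clearToolResults(messages, keepLast = 3) {
  const toolIdx = messages
    .map((m, i) => (m.role === "tool" ? i : -1))
    .filter(i => i !== -1)
  const recent = toolIdx.slice(-keepLast)
  return messages.map((m, i) => {
    if (m.role !== "tool" || recent.includes(i)) return m
    return { ...m, content: "[cleared] tool ran successfully" }
  })
}

History summarization takes the same shape - summarize first, then drop: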
// Instead of just deleting:
messages.shift()

// Summarize before removing:
const oldMessages = messages.splice(0, 10)
const summary = await summarize(oldMessages)
messages.unshift({ role: 'system', content: summary })

Pattern 4 Goal Repetition
After 50 tool calls, the model forgets the original goal. Fight this by constantly repeating goals at the end of context.
How Manus Does It
They rewrite the todo list at the end of each turn. This pushes the overall plan into the “recent attention span”, preventing “lost in the middle” issues.
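A minimal sketch of that end-of-turn move, assuming a messages array and a markdown todo string (the goal-reminder tag is an illustrative marker, not a real API field):

// At the end of every turn, re-append the plan so it lands in the
// high-attention region at the end of context
function pushGoalToEnd(messages, todoMarkdown) {
  // Drop the previous reminder so the plan appears only once
  const pruned = messages.filter(m => m.tag !== "goal-reminder")
  pruned.push({
    role: "user",
    tag: "goal-reminder",
    content: `Current plan:\n${todoMarkdown}`
  })
  return pruned
}

Persisting the plan itself is a separate concern - memory tools can handle that: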
// At session start:
memory_view("current_goals.md")

// After completing each task:
memory_str_replace(
  "current_goals.md",
  oldGoals,
  updatedGoals  // with completed items marked
)

Pattern 5 Hierarchical Tool Exposure
100+ tools = context confusion. The model can’t decide what to use. Solution: organize tools into tiers and load them by context.
// Tier 1: Core (always available) ~10 tools
const coreTools = [
  "Read", "Write", "Bash", "Grep", "WebSearch",
  "memory_view", "memory_create", "Task", "Skill"
]

// Tier 2: Domain-specific (load when detected)
const googleTools = [ // Load when "email", "calendar" detected
  "list_gmail", "send_gmail", "list_calendar", // ...
]

// Dynamic loading:
let activeTools = [...coreTools]
if (userMessage.includes("gmail") || userMessage.includes("email")) {
  activeTools = [...coreTools, ...googleTools]
}

This reduces the tool decision space from 30 to ~10-15, making the model more decisive.
Pattern 6 Tool Design Principles
Well-designed tools reduce context load and decision complexity.
- Self-contained: No hidden state dependencies. Each tool works independently.
- Token-efficient returns: Return the minimum required. Not the whole file, just the relevant part (see the sketch after this list).
- Clear descriptions: If a human can’t decide which tool to use, neither can the model.
- Non-overlapping: Each tool has a unique purpose. No two tools doing the same thing.
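For token-efficient returns, the sketch promised above: a grep-like tool that returns matching lines with locations instead of whole files. The function name and the 20-line cap are illustrative choices, not a real tool’s API:

import { readFile } from "node:fs/promises"

// Return only the lines that match, with enough location info
// for a targeted follow-up Read - never the entire file
async function grepFile(path, pattern) {
  const lines = (await readFile(path, "utf8")).split("\n")
  return lines
    .map((text, i) => ({ line: i + 1, text }))
    .filter(({ text }) => text.includes(pattern))
    .slice(0, 20) // cap output so one call can't flood the context
    .map(({ line, text }) => `${path}:${line}: ${text.trim()}`)
    .join("\n")
}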
Good Example
archive_gmail vs trash_gmail - two separate tools with clear, non-overlapping purposes.
Bad Example (hypothetical)
delete_email, remove_email, trash_email - three tools doing the same thing. The model will get confused.
Common Mistakes
Context Pollution
Loading everything “just in case” - floods the model with irrelevant information.
Context Starvation
Being too aggressive with trimming - the model lacks info to make good decisions.
Goal Drift
Not repeating goals - the model wanders from the original task after many turns.
Tool Explosion
Adding a tool for every function - creates decision paralysis and context confusion.