
Why AI Agents Break on Long Tasks - And the Fix That Changes Everything

Understanding unbounded context growth and the state externalization architecture that solves it

ai-agents architecture context-management llm

Based on InfiAgent research (January 2026) and insights from building production voice agent systems


The Problem No One Talks About

Anyone who’s built an AI agent knows the story: the system works great on short tasks, but the moment you give it something longer, it starts “forgetting,” gets confused, and produces weird results.

This isn’t a bug. It’s an architectural problem called Unbounded Context Growth.

What Actually Happens

When an agent runs a task, it accumulates history:

  • Every action it performed
  • Every response from tools
  • Every intermediate decision
  • Every error and correction

The LLM’s context window fills up. Then two bad things happen:

1. Error Accumulation - Small mistakes compound. An agent that forgot a small detail at step 3 makes a wrong decision at step 10, which causes failure at step 20.

2. Memory Loss - The LLM loses important information from the beginning of the conversation: it is either truncated to make room for newer information or buried under it.

Steps 1-10:  Works great
Steps 11-30: Starts getting confused
Steps 31+:   Accumulated errors, irrelevant results

Why This Is an “Architectural Problem”

When I call this an “architectural problem that requires an architectural solution,” I mean it isn’t something you’ll fix with a better prompt or a stronger model.

It’s like traffic jams: you won’t solve them with a faster car. You need to change the roads.

Similarly, unbounded context growth is a problem with how the system is built. Even a frontier model with a million-token context window will face the same issue - just later.
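To put rough numbers on “just later,” here is a back-of-the-envelope sketch. The token counts (500 for the system prompt, 800 per step) are made-up assumptions for illustration, not measurements from any real agent, and the function names are hypothetical.

```python
def prompt_tokens_unbounded(step: int, system_tokens: int = 500,
                            tokens_per_step: int = 800) -> int:
    """Prompt size when every earlier step's record stays in the prompt."""
    return system_tokens + step * tokens_per_step


def first_overflow_step(window: int, system_tokens: int = 500,
                        tokens_per_step: int = 800) -> int:
    """First step whose prompt no longer fits in a window of `window` tokens."""
    step = 0
    while prompt_tokens_unbounded(step, system_tokens, tokens_per_step) <= window:
        step += 1
    return step


if __name__ == "__main__":
    for window in (128_000, 1_000_000):
        print(window, "tokens -> overflows at step", first_overflow_step(window))
```

With these assumed numbers, a 128,000-token window overflows at step 160 and a 1,000,000-token window at step 1,250. A bigger window moves the wall; it doesn’t remove it.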

Common Solutions - And Why They Don’t Work

Context Compression - Compress history into a shorter summary. The problem: You lose critical information. What seems unimportant at step 5 might be critical at step 50.

RAG (Retrieval-Augmented Generation) - Store everything in a vector DB and retrieve by relevance. The problem: The agent doesn’t always know what’s relevant. Semantic search doesn’t capture logical connections.

Sliding Window - Keep only the last N actions. The problem: You lose the broader context. The agent forgets why it’s doing what it’s doing.
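The sliding window failure is easy to see in a toy example. This is only a sketch with invented history entries; `sliding_window` is a hypothetical helper, not part of any framework.

```python
def sliding_window(history: list[str], n: int) -> list[str]:
    """Keep only the last n entries of the history (n must be >= 1)."""
    return history[-n:]


# Invented history: the goal is stated once, at the very start.
goal = "GOAL: compare methods A and B on dataset D"
history = [goal] + [f"step {i}: tool call" for i in range(1, 21)]

window = sliding_window(history, 5)

if __name__ == "__main__":
    print(window)
    print("goal still visible:", goal in window)
```

After 20 steps the window holds steps 16-20 only, so the one entry that explains why the agent is doing any of this is gone.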

The New Approach: State Externalization

New research called InfiAgent (January 2026, Hong Kong Polytechnic University) proposes a completely different approach:

Don’t try to maintain a long context - externalize the state.

Instead of cramming everything into the prompt, the agent maintains a workspace of files representing the task state.

How Does the Context Stay Fixed?

The key trick: Every step, the context is rebuilt from scratch.

Step 47:
+------------------------------------+
| System prompt (fixed)              |
| Workspace snapshot (current state) |
| Steps 42-46 only (recent window)   |
+------------------------------------+

Step 48:
+------------------------------------+
| System prompt (same)               |
| Workspace snapshot (updated)       |
| Steps 43-47 only (window shifts)   |
+------------------------------------+

The full history? Saved in files or DB - but doesn’t enter the prompt.

The agent “remembers” through the workspace snapshot, not through conversation history.
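Here is a minimal sketch of that rebuild step. It is not InfiAgent’s actual code: the `Workspace` class, the `build_context` function, and the prompt layout are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical external workspace: current state plus the full log."""
    state: dict = field(default_factory=dict)    # the snapshot that enters the prompt
    history: list = field(default_factory=list)  # full log, never sent whole

    def record(self, step_summary: str) -> None:
        self.history.append(step_summary)


def build_context(system_prompt: str, ws: Workspace, window: int = 5) -> str:
    """Rebuild the prompt from scratch: system prompt + snapshot + recent steps."""
    recent = ws.history[-window:] if window > 0 else []
    first = len(ws.history) - len(recent) + 1  # 1-based number of the oldest kept step
    steps = [f"Step {first + i}: {s}" for i, s in enumerate(recent)]
    return "\n\n".join([
        system_prompt,
        "Workspace snapshot:\n" + json.dumps(ws.state, sort_keys=True),
        "Recent steps:\n" + "\n".join(steps),
    ])


if __name__ == "__main__":
    ws = Workspace(state={"stage": "reading", "papers_done": 46})
    for i in range(1, 47):
        ws.record(f"read paper {i}")
    print(build_context("You are a research agent.", ws))
```

Before step 47, the prompt contains steps 42-46 and nothing older. The prompt size now depends on the snapshot and the window, not on how many steps have run, which is why keeping the snapshot small matters.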

Traditional Agent:
+-------------------------------------+
| System Prompt                       |
| + Step 1 result                     |
| + Step 2 result                     |
| + Step 3 result                     |
| + … (grows forever)                 |
| + Step N result                     |  <- Context explodes
+-------------------------------------+

InfiAgent Architecture:
+-------------------------------------+
| System Prompt                       |
| + Workspace Snapshot (current)      |
| + Last 5 actions only               |  <- Fixed size always
+-------------------------------------+
                   |
+-------------------------------------+
| External Workspace (files/DB)       |
| - Full history                      |
| - Intermediate results              |
| - Decision logs                     |
+-------------------------------------+

External Attention Pipeline - Sub-Agents for Reading

InfiAgent adds a component called External Attention Pipeline.

If you’re familiar with Claude Code, the principle is similar.

Claude Code can hand work such as digging through a large file to a sub-agent. The sub-agent runs in its own context and returns only what’s relevant to the question.

InfiAgent works the same way: when the agent needs to read an 80-page PDF, it doesn’t load it into its context. Instead:

  1. Calls a dedicated tool (answer_from_pdf)
  2. The tool runs a separate LLM just on that document
  3. Returns only the relevant answer

+-------------------------------------+
| Main Agent                          |
| “What’s the main conclusion of X?”  |
+------------------+------------------+
                   |
                   v
+-------------------------------------+
| Sub-Agent (answer_from_pdf)         |
| Reads the entire PDF                |
| Returns: “The conclusion is…”       |
+------------------+------------------+
                   |
                   v
+-------------------------------------+
| Main Agent                          |
| Gets only the short answer          |
| Continues to next step              |
+-------------------------------------+

The main agent stays “light” - it doesn’t carry the weight of every document it read.
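The shape of such a tool can be sketched in a few lines. This is not InfiAgent’s `answer_from_pdf` implementation: `answer_from_document`, the `ask_llm` callable, and the prompt wording are hypothetical, and the sketch assumes the PDF text has already been extracted to a string.

```python
from typing import Callable


def answer_from_document(question: str, document_text: str,
                         ask_llm: Callable[[str], str],
                         max_answer_chars: int = 500) -> str:
    """Run a separate LLM call over one document and return only a short answer.

    `ask_llm` is any function that takes a prompt and returns the model's text.
    It is injected so the sketch does not assume a particular LLM client.
    """
    prompt = (
        "Answer the question using only the document below. "
        "Reply in a few sentences.\n\n"
        f"Question: {question}\n\n"
        f"Document:\n{document_text}"
    )
    answer = ask_llm(prompt)
    # Only this short string goes back to the main agent, never the document.
    return answer.strip()[:max_answer_chars]


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:  # stand-in for a real model call
        return "The conclusion is that X holds."

    print(answer_from_document("What's the main conclusion of X?",
                               "page text " * 10_000, fake_llm))
```

The full document only ever exists inside the sub-call’s prompt. What lands in the main agent’s context is capped at `max_answer_chars`, no matter how long the document was.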

The Proof: 80 Academic Papers

InfiAgent was tested on a real task: read 80 academic papers and write a comprehensive literature review.

Results:

  • Regular agents crashed after ~30 papers
  • InfiAgent finished all 80 - maintaining consistent quality

And here’s the impressive part: this works without task-specific fine-tuning. The same architecture works on any domain.

A 20B parameter model with InfiAgent competed with much larger proprietary models - just because of the architecture.

Practical Implementation: What to Actually Do

The Principle

Instead of accumulating all history in the prompt, at each step you rebuild the context from just two things:

  1. Snapshot of current state - what the agent knows now, what it already decided, what’s left to do
  2. Small window of recent actions - say 5 steps back, no more

The full history is saved on the side (DB or files) - but doesn’t enter the prompt.

Four Principles:

1. Separate state from history - Current state is small and focused: “what stage am I at, what did I decide, what’s left.” The full history of how I got there - saved separately.

2. Structure over text - JSON with clear fields beats free text. Easier to read, easier to update, easier to query.

3. Save the “why” - Not just what the agent decided, but why. This helps it continue in the right direction even without remembering the full history.

4. Sub-agents for reading - Like Claude Code, when you need to process a large document - let a separate agent read it and return only what’s relevant. The main agent stays light.
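The first three principles can be sketched as a small data structure. The field names (`stage`, `decisions`, `remaining`, `what`, `why`) and the example values are assumptions chosen for illustration; pick whatever fits your task.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Decision:
    what: str  # what the agent decided
    why: str   # the reason, so later steps can stay on course


@dataclass
class TaskState:
    """Small, structured current state. This is what enters the prompt."""
    stage: str
    decisions: list = field(default_factory=list)
    remaining: list = field(default_factory=list)

    def decide(self, what: str, why: str) -> None:
        self.decisions.append(Decision(what, why))

    def snapshot(self) -> str:
        """Serialize the state as JSON for the prompt."""
        return json.dumps(asdict(self), indent=2)


# The full history lives in a separate log and never enters the prompt whole.
history_log: list = []

state = TaskState(stage="reading", remaining=["paper_02", "paper_03"])
state.decide("skip appendices", "they do not bear on the review question")
history_log.append({"step": 1, "action": "read paper_01", "tokens": 41200})

if __name__ == "__main__":
    print(state.snapshot())
```

The snapshot stays a few lines long even as `history_log` grows, and every decision carries its reason along with it.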

When Is This Relevant?

Use state externalization for:

  • Long tasks (dozens of steps and up)
  • Multi-agent systems passing information between them
  • Processes that need to maintain consistency over time
  • Agents processing many documents

Not necessary for:

  • Short, focused conversations
  • Simple one-off tasks
  • Tasks that fit comfortably within the context window

Bottom Line

The unbounded context problem isn’t a bug you can fix with a better prompt or stronger model. It’s how the system is built. Just like you won’t solve traffic jams with a faster car - you need to change the roads.

InfiAgent demonstrated this with 80 papers. Now the question is just how to implement it in your system.


Further Reading: