Agents Under the Hood

You may have seen demos where a system searches documentation, writes code, runs it, hits an error, fixes the error, and produces a final result without any human touching the keyboard. There is no new intelligence involved. The model is the same type of language model you already use.

What changed is not the brain, but the scaffolding around it.

The Core Analogy

A plain LLM is like a brilliant expert locked in a room. You slide a question under the door and receive an answer back. The expert is smart, but isolated. They cannot check facts, run code, or see the results of their actions.

An agent is what happens when you give that expert tools and the ability to take multiple turns. The same expert can now say, “Let me look that up,” actually perform the search, inspect the results, write code, run it, observe errors, and fix them.

Same model. Completely different capability.

The Agent Loop

At its core, an agent is simply an LLM in a loop with tools. Researchers often describe this pattern as perceive, reason, act, and observe.

This loop repeats until the model decides the task is complete. Simple queries may finish in one cycle. Complex tasks may require many iterations.
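In code, that loop can be sketched in a few lines. This is a minimal sketch, not any particular framework: `call_model` and `run_tool` are placeholders for your own LLM API wrapper and tool dispatcher.

```python
# Minimal agent loop sketch. `call_model` and `run_tool` are placeholders
# for your own LLM API wrapper and tool dispatcher.
def agent_loop(task, call_model, run_tool, max_steps=10):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):            # bound the loop: every step is an API call
        action = call_model(history)      # reason: the model decides what to do next
        if action["type"] == "final":     # the model judges the task complete
            return action["content"]
        result = run_tool(action)         # act: your code performs the tool call
        history.append({"role": "tool", "content": str(result)})  # observe
    return "Stopped: step limit reached."
```

Bounding the loop with `max_steps` is what keeps a confused model from running forever.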

Tool Calling Mechanics

Language models do not execute tools. They only produce text. Tool use works because your code defines tools with names, descriptions, and required inputs. These definitions are included in the system prompt.
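For illustration, a tool definition typically amounts to a name, a description, and a JSON Schema for the inputs. The exact wrapper format varies by provider, so treat this as a generic sketch with a made-up `search_docs` tool:

```python
# A generic tool definition: name, description, and a JSON Schema for inputs.
# Provider-specific formats differ, but they all carry this information.
search_docs_tool = {
    "name": "search_docs",
    "description": "Search the product documentation and return matching passages.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search terms"},
            "max_results": {"type": "integer", "description": "How many passages to return"},
        },
        "required": ["query"],
    },
}
```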

When the model decides to use a tool, it outputs structured JSON instead of plain text. Your orchestration layer parses this output, executes the real API call, and feeds the results back to the model as an observation.

The model proposes. Your code executes. The model observes the result. That handshake is the foundation of agent systems.
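A sketch of that handshake, assuming the model's tool request arrives as a JSON string with `name` and `arguments` fields (real provider APIs differ in the exact shape):

```python
import json

# Toy implementation of a tool; in practice this would hit a real API.
def search_docs(query, max_results=3):
    return f"(top {max_results} documentation passages matching '{query}')"

TOOLS = {"search_docs": search_docs}  # registry of functions your code controls

def execute_tool_call(raw_output):
    """Parse the model's JSON tool request, run the real function,
    and package the result as an observation for the next model call."""
    call = json.loads(raw_output)
    result = TOOLS[call["name"]](**call["arguments"])
    return {"role": "tool", "name": call["name"], "content": str(result)}

# What the model might emit instead of plain text:
observation = execute_tool_call(
    '{"name": "search_docs", "arguments": {"query": "rate limits"}}'
)
```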

Each loop iteration is an API call. More loops mean more latency and more cost. This is an engineering constraint, not an implementation detail.

Memory and Context

Language models have a limited context window. Think of it as desk space. The current conversation, tool results, and instructions all compete for that space.

Working memory is managed by summarising, dropping irrelevant details, or keeping only the most recent interactions. Long-term memory is handled externally.
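One common tactic, sketched below, is to keep the most recent turns in full and fold everything older into a running summary. The `summarize` function here is a placeholder, usually another model call:

```python
def trim_history(history, summarize, keep_last=10):
    """Working-memory sketch: compress older messages into a summary
    and keep only the most recent turns verbatim."""
    if len(history) <= keep_last:
        return history
    older, recent = history[:-keep_last], history[-keep_last:]
    summary = summarize(older)  # e.g. an LLM call that compresses the old turns
    return [{"role": "system",
             "content": f"Summary of earlier conversation: {summary}"}] + recent
```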

Important information is extracted, converted into embeddings, and stored in a vector database. When needed, relevant memories are retrieved by semantic similarity and placed back into context.
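A toy version of the retrieval step, assuming a hypothetical `embed` function that maps text to a vector (in production this is an embedding model plus a real vector database):

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_memories(query, memory_store, embed, top_k=3):
    """memory_store is a list of (text, vector) pairs saved earlier;
    embed() turns text into a vector with the same embedding model."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in memory_store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```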

In practice, this hybrid approach works better than trying to stuff everything into the context window, and it is how most production-grade agents are built.

Where Agents Fail

Agent systems fail in predictable ways: errors compound across steps, loops run longer than intended, tool calls misfire, context windows overflow, and cost and latency grow with every iteration.

These are not reasons to avoid agents. They are engineering problems with engineering solutions. The key is to design for these failure modes deliberately.

Key Takeaways

Agents are not smarter models. They are the same models placed in a loop. Tools are structured requests. Your code executes actions. Memory is bounded and must be managed explicitly.

If you want to truly understand agents, build a minimal one from scratch. One loop. One tool. Under 100 lines of code. That exercise teaches more than any framework tutorial.
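One possible shape for that exercise, with the model stubbed out so the skeleton runs on its own. Swap `call_model` for a real LLM API call, and the made-up `get_word_count` for whatever single tool you choose:

```python
# One tool: the only real action this toy agent can take.
def get_word_count(text: str) -> int:
    return len(text.split())

TOOLS = {"get_word_count": get_word_count}

def call_model(history):
    """Stub standing in for a real LLM API call. It returns either a
    tool request or a final answer, mimicking the model's two options."""
    last = history[-1]
    if last["role"] == "user":
        return {"type": "tool", "name": "get_word_count",
                "arguments": {"text": last["content"]}}
    return {"type": "final", "content": f"That text has {last['content']} words."}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = call_model(history)                    # the model proposes
        if action["type"] == "final":
            return action["content"]
        result = TOOLS[action["name"]](**action["arguments"])  # your code executes
        history.append({"role": "tool", "content": str(result)})  # the model observes
    return "Stopped: step limit reached."

print(run_agent("An agent is just a model in a loop with tools."))
```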