Agent loops: a source audit and definition map · Resource

The companion guide teaches one clean picture of an agent loop — reason, act, observe, around and around until the goal is met. This document shows what sits underneath it: that clean picture is my editorial choice, and the field itself does not actually agree on what an agent loop is. I read what each source genuinely means — provider documentation, libraries, research papers, blogs, practitioner posts — and here is the map.

The short answer: there is no consensus

If you have read about agents across a few sources and came away feeling that each one was saying something different — you were right. The set I assembled holds 45 verified sources, and across them there is no single agreed definition of an agent loop.

One sentence did cut through, and it reads cleanly: "a model that uses tools in a loop to reach a goal." Three big names landed under that formula — Anthropic, LangChain, and Simon Willison, the most-cited blogger on the topic. That is the tidy convergence the guide teaches too. But take one step sideways and the definitions split into a dozen distinct camps that disagree about who runs the loop, what counts as one lap, whether checking the result is part of it, and even whether "loop" is the right word at all.

I built the audit as a maker–checker machine: one agent found a source and copied out the exact sentence in which it defines the loop, while a separate, independent agent re-fetched the same address and ruled on whether the definition was actually on the page. Of 45 sources, 43 passed that confirmation; two were flagged as partial, not hidden. Nothing here is unverified and then presented as certain.

The spectrum: from minimum to maximum

The definitions arrange themselves on a single axis — from the most modest to the most elaborate.

At the minimal end sit the near-identical one-liners. "An LLM agent runs tools in a loop to achieve a goal" (Willison), "a model calling tools in a loop" (LangChain), "LLMs autonomously using tools in a loop" (Anthropic). The loop here is trivial; all the craft lives in the tools, the prompt, and the stop condition.

One step further is a named inner cycle: reason → act → observe → repeat. This is the ReAct camp — the Yao et al. paper, smolagents, LlamaIndex, Pydantic, the Bedrock documentation. Every lap here has three explicit moves, and reasoning is interleaved with acting rather than done once at the start.

Further still, checking arrives as its own named loop step — either the same agent does it ("gather context → act → verify the work → repeat," from the Claude Agent SDK), or a separate evaluator (the evaluator-optimizer pattern, planner-generator-evaluator) judges someone else's work until it clears a threshold.

At the maximal end the loop stops being a motion inside the agent's head and becomes a whole system: self-driving, run on a schedule, spread across many sessions. This is where practitioners' "loop engineering" lives — Steinberger ("you should be designing the loops that prompt your agents"), Osmani, Geoffrey Huntley's Ralph (the literal shell loop while :; do cat PROMPT.md | npx amp; done, described by HumanLayer), the cross-attempt memory loop from Reflexion, and Anthropic's harness for long-running agents that carry state out to git and progress files.

The main axes of disagreement

Underneath the one-sentence definitions lie dividing lines they never show. I gathered six.

Who runs the loop. Some say the model itself drives it (the model is the loop). Others say the loop is owned by a separate orchestrator object: a Runner with a max_turns cap and handoffs between agents (OpenAI, Google ADK, Bedrock). A third camp — frontline practitioners — holds that a human designs the loop, and the agent merely runs inside it.

What counts as one lap. A single model or tool call? A whole context window, or one session? One attempt with memory carried into the next? Each answer yields a different loop.

Whether checking is a loop step — and who checks. The agent itself? A separate evaluator? Or is there no explicit verification step at all, because the loop simply ends when the model stops calling tools?

What stops the loop. The model decides it is done. An evaluator's verdict. A hard lap cap. A success criterion the loop checks against itself. Or — and this is the answer typical of practitioners — nothing on the model's side: the loop runs unsupervised until an operator, a schedule, or Ctrl+C stops it.

Whether tools are constitutive. Half the sources say using tools is the point of the loop. The other half say the loop is defined by decision, reflection, learning, or orchestration, and that tools are optional.

Whether "loop" is even the right word. Some sources say "yes, an agent is fundamentally a loop." Some say "the loop is real, but too modest on its own — it is just one element of a larger system." And a separate terminological trap: at Zapier, in LangChain HITL, and in Microsoft HITL, "loop" means human oversight (human-in-the-loop) — approval gates, escalations — and defines no agent loop at all. That is a pure naming collision; worth catching, so you do not count the word "loop" as agreement where it is talking about something else.

Eleven definitional camps

Out of that disagreement, eleven distinct mental models of "the loop" emerge. Reading a few authors in a row, you will hit several of them and assume they are the same. They are not.

The minimalist "tools in a loop" — an agent is simply a model calling tools until the goal is met. This is the formal convergence of Willison and Anthropic. A curiosity: swyx quotes the sentence only to attack it as "too minimalist to be useful" — so the minimalist line is now what gets argued against.
The inner reason-act-observe cycle (the ReAct canon) — the only camp that names an explicit cognitive cycle inside the agent, and it is almost entirely formal. A nuance from the original ReAct paper: a "thought" is an action in language space that yields no observation — which is why the canonical order is reasoning first.
The self-verification loop — the defining step is that the agent checks its own output before moving on. Two sources mean different things by it, though: the Anthropic SDK frames it as a general design principle, while Voyager frames it as a narrow code-correctness gate inside a four-round loop, after which it saves a new skill.
The loop with a separate evaluator — a distinct agent grades the executor's output; the loop runs with feedback until it clears a threshold. The executor never grades itself.
The runtime-environment (Runner) loop — the loop is a primitive owned by a Runner/orchestrator object, not by the agent. One run is one application-level step, with a hard max_turns ceiling. The camp is entirely formal — no practitioner source thinks of the loop as a Runner object.
The agent as a decision loop (plan-act-learn) — the only source that makes learning a full loop step alongside planning and acting, and says outright that the loop is "not just tool use."
The autonomous goal loop — runs unsupervised: it prompts itself or wakes on a schedule, judges its state against a success criterion, records the result, and repeats with no further human prompting. The formal anchor here is AutoGPT.
Loop engineering (design the loop, not the prompt) — you stop prompting the agent turn by turn and start designing the loop system that prompts it. The human becomes the loop's architect; this is the frontline-practitioner stance, grounded in Steinberger's viral post.
The self-improvement / cross-attempt memory loop — the loop closes across many attempts: act → get feedback → reflect in words → write the reflection to episodic memory → try better next time. Improvement runs through language, not through retraining weights (Reflexion).
"Loop" is the wrong primitive — an agent is a set of components rather than a loop; the loop is just one piece (control flow) or an emergent behavior of a larger architecture. This is where swyx stands with his six-part IMPACT model, alongside an academic survey with a four-module architecture (profiling / memory / planning / action).
"Loop" means human oversight (the terminological trap) — pages that use the word "loop" to name human approval gates and define no agent loop at all. A pure naming collision, broken out separately so it is not mistaken for agreement.

The named convergence sources

The whole point of the audit is that the sources can be named — it is a review of public material, not a secret. The clean, formal convergence on "tools in a loop" is Simon Willison, LangChain, and Anthropic (in two phrasings: "autonomously using tools in a loop" and "using tools based on environmental feedback, in a loop"). The inner reason-act-observe canon is carried by the ReAct paper (Yao et al.) along with the smolagents, LlamaIndex, and Pydantic libraries. The Runner-with-turn-cap object is OpenAI, Google ADK, and Amazon Bedrock. The autonomous goal loop is anchored by AutoGPT, the cross-attempt memory loop by the Reflexion paper, and code self-verification by Voyager. At the maximal, "design-the-loop" end stand the practitioners: Peter Steinberger, Addy Osmani, Geoffrey Huntley (Ralph), and HumanLayer. Two voices reject the word "loop" itself: swyx (the IMPACT model) and an academic survey of autonomous-agent architectures.

There is one thing the audit has to say honestly: the literal beginner, grassroots community voice is absent here. I searched Reddit and Hacker News for queries along the lines of "an agent is just a while loop," including r/AI_Agents and r/LocalLLaMA — and not a single verifiable thread survived. Every "grassroots" source in this set is a named practitioner with a post, a blog, or a recording. So the "grassroots vs. formal" contrast is really a "practitioner-influencers vs. providers and academia" contrast, not "beginners vs. experts." Worth knowing before you treat this map as the voice of the whole field.

The shared core: what everyone holds in common

Strip the disagreements away and a core remains, present in nearly every source that genuinely describes an agent loop — and it is exactly the skeleton the guide teaches.

The model as decider. Every source describing an agent loop puts a language model at its center, driving each lap. Remove it and there is no loop — there is a script.
Repetition (the loop itself). By definition: the model is called many times, not once. The practitioners put it most literally — the bare shell loop while :; do … done.
State or feedback carried into the next lap. The result of one step returns to the context so the next is smarter. The form varies (an observation, a tool result, a reflection in episodic memory, progress files in git) — what is invariant is that something carries forward, not that every action yields environmental feedback.
A goal the loop moves toward. Nearly every source names a goal: a finished task, a defined objective, a success criterion, a "pass" verdict. Even the most maximalist practitioner loops hold one.

One caveat, because it decides how you build: a hard model-side stop is not part of the core. Maximalist practitioner loops often have no model-side terminator at all and run until an operator or a schedule stops them. The goal is invariant; the model stopping itself is not. You add the brake.

This core — a model that decides, acts, checks the result, and repeats toward a clearly set goal until it is done or until you stop it — is the simple, buildable version from the companion guide. The map shows how differently the field talks about it; the guide shows how to build it. If you are starting out, keep coming back to that one version — the rest is execution.