Introduction to Agents · Google · Agents Whitepaper Series · May 2026 KB Synthesis
Reading Synthesis · Caladrius Health KB

Introduction to Agents

An agent is not an AI model in a static workflow — it is a complete application that reasons, acts, and observes in a loop to achieve goals. This is the readable synthesis of Google's foundational paper: the anatomy of an agent, a taxonomy of capability, and the production discipline to scale from prototype to an enterprise fleet.

Source: Google · Introduction to Agents (54 pp)
Authors: Blount · Gulli · Saboo · Zimmermann · Vuskovic
Updated: May 2026
Series: Day 1 of 5
"Agents are the natural evolution of Language Models, made useful in software."
01

From Predictive AI to Autonomous Agents

For years AI focused on passive, discrete tasks — answer a question, translate text, generate an image — each demanding constant human direction. We are now in a paradigm shift: from AI that merely predicts or creates content to a new class of software capable of autonomous problem-solving and task execution.

An agent is not an AI model in a static workflow — it is a complete application that makes plans and takes actions to achieve goals.

It fuses a Language Model's ability to reason with the practical ability to act, handling multi-step tasks a model alone cannot. The defining capability is autonomy: agents figure out the next steps toward a goal without a person guiding every turn.

The developer's role changes too. The traditional developer is a bricklayer, defining every logical step. The agent developer is a director: set the scene (instructions, prompts), select the cast (tools, APIs), supply the context (data), then guide an autonomous "actor." An LM's greatest strength — its flexibility, the capacity to do anything — is also the biggest headache: it is hard to compel it to do one thing reliably. What we called "prompt engineering" is now context engineering.

In essence, an agent is a system dedicated to the art of context-window curation: a relentless loop of assembling context, prompting the model, observing the result, and re-assembling context for the next step.

This is the first of a five-part series — a formal guide for moving from proofs-of-concept to production-grade systems. A prototype is easy; ensuring security, quality, and reliability is the real challenge.

02

The Agentic Problem-Solving Process

The short definition: "LMs in a loop with tools to accomplish an objective." The core loop breaks into five fundamental steps.

| Step | Name | What happens |
| --- | --- | --- |
| 1 | Get the Mission | A high-level goal arrives, either from a user ("Organize my team's travel") or from an automated trigger ("A high-priority ticket arrived"). |
| 2 | Scan the Scene | Perceive the environment: read the request, consult memory ("Did I try this before?"), inventory accessible tools (calendars, DBs, APIs). |
| 3 | Think It Through | The reasoning model analyzes Mission vs. Scene and devises a plan, often a chain of reasoning rather than a single thought. |
| 4 | Take Action | The orchestration layer executes the first concrete step: invoke a tool, call an API, run code, query a DB. |
| 5 | Observe & Iterate | Observe the outcome, fold it into context/memory, loop back to Step 3 until the plan is complete. |

This "Think, Act, Observe" cycle is managed by the Orchestration Layer, reasoned by the Model, executed by the Tools.

Worked example: a Customer Support Agent. For "Where is my order #12345?" the agent first plans: (1) identify the order in the internal DB; (2) track it via the carrier's API; (3) report a clear answer. It then executes find_order("12345") → observes the record and tracking number ZYX987 → executes get_shipping_status("ZYX987") → observes "Out for Delivery" → replies "Your order #12345 is 'Out for Delivery'!"
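
The customer-support walk-through can be sketched as a small Think, Act, Observe loop. This is a minimal illustration rather than the paper's implementation: the `think` function is a scripted stand-in for the reasoning model, and the two tools are hypothetical stubs that return canned data instead of calling a real database or carrier API.

```python
from typing import Callable, Optional

# Hypothetical stubs for the internal order DB and the carrier API.
def find_order(order_id: str) -> dict:
    return {"order_id": order_id, "tracking": "ZYX987"}

def get_shipping_status(tracking: str) -> str:
    return "Out for Delivery"

TOOLS: dict[str, Callable] = {
    "find_order": find_order,
    "get_shipping_status": get_shipping_status,
}

def think(order_id: str, observations: list) -> Optional[tuple[str, str]]:
    """Scripted stand-in for the model: choose the next tool call,
    or return None once there is enough context to answer."""
    if len(observations) == 0:
        return ("find_order", order_id)
    if len(observations) == 1:
        return ("get_shipping_status", observations[0]["tracking"])
    return None

def run_agent(order_id: str, max_steps: int = 5) -> str:
    observations: list = []
    for _ in range(max_steps):                    # orchestration: bounded loop
        action = think(order_id, observations)    # Think
        if action is None:
            break
        tool_name, arg = action
        result = TOOLS[tool_name](arg)            # Act
        observations.append(result)               # Observe
    if len(observations) < 2:
        return "Sorry, I could not find that order."
    return f"Your order #{order_id} is '{observations[-1]}'!"

print(run_agent("12345"))  # Your order #12345 is 'Out for Delivery'!
```

The `max_steps` bound is a design choice of this sketch: it keeps a confused planner from looping forever.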

03

The 5-Level Taxonomy of Agentic Systems

The same loop can be scaled in complexity into different classes of agent — each building on the last. Scoping which level you need is a key early architectural decision.

LVL0

Core Reasoning System

The "Brain" alone

An LM in isolation, with no tools, memory, or live environment. Strong at explaining concepts and planning approaches, but blind to anything after its training cutoff (it can't tell you last night's game score).

LVL1

Connected Problem-Solver

Reasoning + Tools

Connecting external tools makes it a functional agent. It recognizes a real-time need, calls Search / a financial API / a database via RAG, observes, and synthesizes. Interacting with the world is the defining capability.

LVL2

Strategic Problem-Solver

Context engineering

From simple tasks to strategically planning multi-part goals. The emergent skill is curating focused, high-quality context per step — building new, focused queries from prior output. Enables proactive assistance (flight email → calendar).

LVL3

Collaborative Multi-Agent System

A team of specialists

The paradigm shifts from a "super-agent" to a team mirroring a human org; agents treat other agents as tools. A Project Manager agent delegates sub-missions to research / marketing / web-dev specialists. The frontier of workflow automation.

LVL4

Self-Evolving System

Autonomous creation

From delegation to creation: the system identifies gaps in its own capabilities and dynamically builds new tools or agents (meta-reasons → invokes AgentCreator → a new specialist appears on the fly). A truly learning, evolving organization.

04

Core Agent Architecture: Model, Tools, Orchestration

An AI agent is the combination of a Model, Tools, an Orchestration Layer, and runtime services that use the LM in a loop to accomplish a goal.

🧠
The Brain

Model

The reasoning core. Selection dictates cognition, cost, and speed.

🖐️
The Hands

Tools

Connect reasoning to reality — retrieve information and take actions.

⚙️
The Nervous System

Orchestration

Runs the Think-Act-Observe loop; state, memory, design choices.

BRAIN · Model

The core LM is the reasoning engine. Picking the highest benchmark score is a common path to failure — production success is rarely set by generic academic benchmarks. Instead:

HANDS · Tools

Tools connect reasoning to reality through a three-part loop: define · invoke · observe.
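
One way to picture the define · invoke · observe loop is a small tool registry. The sketch below is framework-free and every name in it (`define_tool`, `tool_specs`, `invoke`, the `add` tool) is an assumption made for illustration, not an API from the paper or from any agent SDK.

```python
import json
from typing import Any, Callable

REGISTRY: dict[str, dict[str, Any]] = {}

def define_tool(name: str, description: str, params: dict[str, str],
                fn: Callable[..., Any]) -> None:
    """Define: register the function plus the metadata the model will read."""
    REGISTRY[name] = {"description": description, "params": params, "fn": fn}

def tool_specs() -> list[dict[str, Any]]:
    """What would be placed in the model's context: specs, not code."""
    return [
        {"name": name, "description": t["description"], "params": t["params"]}
        for name, t in REGISTRY.items()
    ]

def invoke(call_json: str) -> dict[str, Any]:
    """Invoke a model-proposed call and return an observation dict.
    Errors become observations too, so the loop can react to them."""
    call = json.loads(call_json)
    tool = REGISTRY.get(call["name"])
    if tool is None:
        return {"ok": False, "error": f"unknown tool: {call['name']}"}
    try:
        return {"ok": True, "result": tool["fn"](**call["args"])}
    except Exception as exc:
        return {"ok": False, "error": str(exc)}

define_tool("add", "Add two integers.", {"a": "int", "b": "int"},
            lambda a, b: a + b)

# Observe: the structured result is what gets folded back into context.
observation = invoke('{"name": "add", "args": {"a": 2, "b": 3}}')
print(observation)  # {'ok': True, 'result': 5}
```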

NERVOUS SYSTEM · Orchestration Layer

The engine that runs the Think, Act, Observe loop — the conductor deciding when to reason, which tool acts, and how results inform the next step.

Multi-agent design patterns — a team of specialists beats one super-agent:

| Pattern | Use it for | How it works |
| --- | --- | --- |
| Coordinator | Dynamic / non-linear tasks | A "manager" segments a request, routes sub-tasks to specialists (researcher, writer, coder), then aggregates. |
| Sequential | Linear workflows | A digital assembly line: one agent's output is the next's input. |
| Iterative Refinement | Quality | A "generator" creates content; a "critic" evaluates it against standards in a feedback loop. |
| Human-in-the-Loop | High-stakes / safety | A deliberate pause to get human approval before a significant action. |
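
The Iterative Refinement pattern from the table above can be sketched as a generator and a critic passing a draft back and forth. Both "agents" here are plain functions with hardcoded rules, an assumption made so the example runs without a model. In a real system each would be an LM call.

```python
def critic(draft: str) -> list[str]:
    """Stand-in critic: check the draft against two fixed standards."""
    notes = []
    if not draft.startswith("Hello"):
        notes.append("add greeting")
    if not draft.endswith("Regards."):
        notes.append("add sign-off")
    return notes

def generator(draft: str, feedback: list[str]) -> str:
    """Stand-in generator: revise the draft according to the feedback."""
    if "add greeting" in feedback:
        draft = "Hello. " + draft
    if "add sign-off" in feedback:
        draft = draft + " Regards."
    return draft

def refine(draft: str, max_rounds: int = 3) -> tuple[str, int]:
    """Loop until the critic has no notes or the round budget runs out."""
    for round_no in range(max_rounds):
        feedback = critic(draft)
        if not feedback:
            return draft, round_no
        draft = generator(draft, feedback)
    return draft, max_rounds

final, rounds = refine("Your refund was approved.")
print(final)   # Hello. Your refund was approved. Regards.
print(rounds)  # 1
```
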
05

Agent Deployment & Services

Deployment is the agent's "body and legs" — from a laptop prototype to an always-on server reachable by people and other agents. Production agents need session-history and memory persistence, plus the builder's decisions on logging, data privacy, data residency, and compliance. Two paths:

Quick deploy commands suit early exploration; a secure, production-ready environment requires real investment in CI/CD and automated testing.

06

AgentOps: Structuring the Unpredictable

Traditional unit tests assert output == expected — useless when an agent's response is probabilistic by design. Judging "quality" (did it do all it should, nothing it shouldn't, with proper tone?) usually requires an LM. AgentOps is the disciplined evolution of DevOps and MLOps — turning unpredictability into a managed, measurable, reliable feature.
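
A rubric-based evaluation along these lines might look like the sketch below. The three criteria come from the question in the text. The `keyword_judge` function is a hypothetical placeholder for an LM judge call, using crude keyword rules so the example is runnable. It is not a recommendation for how to judge quality.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    criterion: str
    passed: bool

RUBRIC = [
    "did everything it should",
    "did nothing it shouldn't",
    "used a proper tone",
]

def evaluate(response: str, judge: Callable[[str, str], bool]) -> list[Verdict]:
    """Ask the judge about each criterion instead of asserting equality."""
    return [Verdict(criterion, judge(criterion, response)) for criterion in RUBRIC]

def pass_rate(verdicts: list[Verdict]) -> float:
    return sum(v.passed for v in verdicts) / len(verdicts)

def keyword_judge(criterion: str, response: str) -> bool:
    """Hypothetical stand-in for an LM judge: keyword rules only."""
    text = response.lower()
    if criterion == "did everything it should":
        return "out for delivery" in text
    if criterion == "did nothing it shouldn't":
        return "credit card" not in text
    return not response.isupper()   # shouting counts as improper tone

good = evaluate("Your order #12345 is out for delivery.", keyword_judge)
bad = evaluate("YOUR CREDIT CARD WAS CHARGED", keyword_judge)
print(pass_rate(good))  # 1.0
print(pass_rate(bad))   # 0.0
```

Tracking a pass rate over a fixed set of scenarios gives a metric that can be compared across agent versions even when individual responses differ.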

07

Agent Interoperability

Interoperability is the "face" of the agent: high-quality agents must interconnect. Note: agents are not tools.

08

Security, Identity & Governance

Securing a single agent — the trust trade-off

Utility requires power; every ounce of power adds risk — primarily rogue actions and sensitive-data disclosure. Give the agent a leash long enough to do its job, short enough to keep it out of traffic. You can't trust the model's judgment alone (it's vulnerable to prompt injection). Use defense-in-depth: (1) deterministic guardrails — hardcoded rules outside the model (block any purchase over $100); (2) reasoning-based defenses — small "guard models" that inspect a proposed plan before execution.
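
The first layer, a deterministic guardrail, can be sketched as a check that runs outside the model before any tool executes. The $100 limit is the example from the text. The `purchase` tool and the function names are hypothetical.

```python
from decimal import Decimal
from typing import Any, Callable

PURCHASE_LIMIT = Decimal("100")  # limit taken from the example in the text

class GuardrailViolation(Exception):
    """Raised when a proposed action breaks a hardcoded rule."""

def check_guardrails(tool_name: str, args: dict[str, Any]) -> None:
    # A plain rule: no model is consulted, so prompt injection cannot sway it.
    if tool_name == "purchase" and Decimal(str(args["amount"])) > PURCHASE_LIMIT:
        raise GuardrailViolation(
            f"purchase of {args['amount']} exceeds limit of {PURCHASE_LIMIT}"
        )

def execute(tool_name: str, args: dict[str, Any],
            tools: dict[str, Callable[..., Any]]) -> Any:
    check_guardrails(tool_name, args)   # runs before any side effect
    return tools[tool_name](**args)

def purchase(item: str, amount: float) -> str:   # hypothetical tool
    return f"bought {item} for {amount}"

tools = {"purchase": purchase}
print(execute("purchase", {"item": "cable", "amount": 20}, tools))
try:
    execute("purchase", {"item": "laptop", "amount": 250}, tools)
except GuardrailViolation as exc:
    print("blocked:", exc)
```

A guard model, the second layer, would sit next to `check_guardrails` and inspect the whole proposed plan. It is omitted here because it requires a model call.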

Agent Identity — a new class of principal

Agents are a third principal beyond humans and services — autonomous actors needing their own verifiable "digital passport", distinct from the user who invoked them and the developer who built them. With a cryptographic identity (e.g. SPIFFE) they get least-privilege permissions, containing the blast radius if one is compromised.

| Principal | Authentication / Verification | Notes |
| --- | --- | --- |
| Users | OAuth or SSO | Human actors, full autonomy and responsibility. |
| Agents (new) | Verified with SPIFFE | Delegated authority: act on behalf of users. |
| Service accounts | Integrated into IAM | Apps/containers, deterministic, not responsible for actions. |

Policy = authorization (AuthZ), distinct from authentication (AuthN). Apply least privilege while staying contextually relevant.
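
To make the AuthN/AuthZ split concrete, here is a minimal default-deny policy check keyed on an agent's identity. It covers authorization only and assumes the identity was already verified upstream. The trust domain `example.org`, the path, and the resource names are all hypothetical. Only the `spiffe://trust-domain/path` shape follows the SPIFFE ID format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Principal:
    kind: str        # "user", "agent", or "service"
    identity: str    # for agents, a SPIFFE ID string

# Hypothetical policy: each identity gets only the (resource, action)
# pairs it needs. Anything not listed is denied.
POLICY: dict[str, frozenset[tuple[str, str]]] = {
    "spiffe://example.org/agent/support": frozenset(
        {("orders", "read"), ("shipping", "read")}
    ),
}

def is_authorized(principal: Principal, resource: str, action: str) -> bool:
    """AuthZ only: assumes the identity was authenticated upstream."""
    return (resource, action) in POLICY.get(principal.identity, frozenset())

support_agent = Principal("agent", "spiffe://example.org/agent/support")
unknown_agent = Principal("agent", "spiffe://example.org/agent/unknown")

print(is_authorized(support_agent, "orders", "read"))    # True
print(is_authorized(support_agent, "orders", "delete"))  # False
print(is_authorized(unknown_agent, "orders", "read"))    # False
```

Because the support agent can only read, a compromise of that agent cannot delete or modify orders, which is the blast-radius containment described above.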

Securing an ADK agent

A layered exercise: define identities → enforce access policies at the API-governance layer → build in-tool guardrails that refuse unsafe actions regardless of LM reasoning → add dynamic defenses (ADK Callbacks & Plugins; a before_tool_callback; a "Gemini as a Judge" screen). The Agent Gateway is "air traffic control," natively enforcing Model Armor (prompt injection, jailbreaks, PII leakage, malicious URLs).

Scaling to an enterprise fleet

One or two agents → a security problem. Hundreds → an architecture problem ("agent sprawl").

09

How Agents Evolve & Learn

In dynamic environments, performance "ages" and decays; manually updating a fleet is slow and uneconomical. The scalable answer: agents that learn and evolve autonomously.

Worked example — learning compliance guidelines. A multi-agent loop: a Querying Agent fetches data → a Reporting Agent drafts → a Critiquing Agent reviews against rules and escalates ambiguity to a human → a Learning Agent generalizes the expert's correction into a new reusable guideline the Critiquing Agent applies automatically next time.
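
The escalate-then-learn part of this loop can be sketched in a few lines. Everything here is a simplifying assumption: reports are dicts, "ambiguity" means a report contains a field from a watch list that no rule covers yet, and the Learning Agent generalizes a human rejection into a banned-field rule. The field names are invented for illustration.

```python
class CritiquingAgent:
    def __init__(self) -> None:
        self.banned_fields: set[str] = set()   # learned guidelines

    def review(self, report: dict[str, str], watch_list: set[str]) -> str:
        fields = set(report)
        if self.banned_fields & fields:
            return "rejected"      # a learned guideline applies
        if watch_list & fields:
            return "escalate"      # no rule covers this yet: ask a human
        return "approved"

def learning_agent(critic: CritiquingAgent, report: dict[str, str],
                   watch_list: set[str], human_decision: str) -> None:
    """Generalize one expert correction into a reusable guideline."""
    if human_decision == "rejected":
        critic.banned_fields |= watch_list & set(report)

WATCH_LIST = {"account_number", "home_address"}   # hypothetical
critic = CritiquingAgent()
report = {"summary": "Q3 totals", "account_number": "12-345"}

first = critic.review(report, WATCH_LIST)              # escalates to a human
learning_agent(critic, report, WATCH_LIST, "rejected")  # expert says reject
second = critic.review(report, WATCH_LIST)             # now handled automatically
third = critic.review({"summary": "Q3 totals"}, WATCH_LIST)

print(first, second, third)  # escalate rejected approved
```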

Simulation & Agent Gym — the next frontier. Beyond in-line learning, a dedicated Agent Gym optimizes the system offline: (1) not in the execution path; (2) a simulation environment for trial-and-error; (3) synthetic data generators for realistic pressure-testing (red-teaming, critiquing-agent families); (4) a non-fixed optimization arsenal that can adopt or craft tools; (5) a bridge to human experts for "tribal knowledge" edge cases.

10

Advanced Examples

Google Co-Scientist

A virtual research collaborator that accelerates discovery by systematically exploring complex problem spaces. A researcher defines a goal and grounds the agent in chosen knowledge; the system generates and evaluates a landscape of novel hypotheses. It spawns a whole ecosystem of agents: a "Supervisor" acts as project manager, delegating to specialists and distributing compute. Agents run for hours or days, with loops and meta-loops improving both the hypotheses and the way ideas are judged and created.

AlphaEvolve

An AI agent that discovers and optimizes algorithms, pairing Gemini's code generation with an automated evaluator in an evolutionary loop: generate → score → use the best as inspiration for the next generation. Breakthroughs include more efficient data centers / chip design / AI training, faster matrix multiplication, and new solutions to open math problems. It excels where verifying a solution is far easier than finding it, and is built for human–AI partnership: transparent human-readable code and expert guidance that steers exploration and prevents loophole-exploitation.
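
The generate → score → select loop can be illustrated with a toy problem where checking a candidate is much easier than finding one: approximating the square root of 2. This is not AlphaEvolve's implementation. A fixed numeric mutation stands in for Gemini's code generation, and the evaluator is a single expression.

```python
def evaluate(x: float) -> float:
    """Automated evaluator: higher is better, 0.0 would be a perfect answer."""
    return -abs(x * x - 2.0)

def propose(parent: float, step: float) -> list[float]:
    """Stand-in for the generator: small variations on the current best.
    The parent is kept as a candidate, so the best score never regresses."""
    return [parent - step, parent, parent + step]

def evolve(seed: float = 1.0, generations: int = 30) -> float:
    best, step = seed, 0.5
    for _ in range(generations):
        candidates = propose(best, step)        # generate
        best = max(candidates, key=evaluate)    # score, keep the best
        step /= 2                               # explore more finely each round
    return best

result = evolve()
print(abs(result * result - 2.0) < 1e-3)  # True
```

The evaluator never needs to know how a candidate was produced, which is why the same loop shape works whether the generator is a mutation rule or a code-writing model.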

11

Conclusion

Agents are the natural evolution of language models, made useful in software.

Generative AI agents shift AI from a passive content tool to an active, autonomous problem-solving partner. The anatomy is three parts — the reasoning Model (Brain), actionable Tools (Hands), and the governing Orchestration Layer (Nervous System) — integrated in a continuous Think, Act, Observe loop. The 5-level taxonomy lets architects scope ambition to the task.

The central shift is in the developer paradigm: no longer bricklayers defining explicit logic, but architects and directors who guide, constrain, and debug an autonomous entity. The flexibility that makes LMs powerful is also the source of their unreliability — so success lives not in the initial prompt but in engineering rigor across the whole system: robust tool contracts, resilient error handling, sophisticated context management, and comprehensive evaluation. Applied with discipline, these patterns build not mere "workflow automation" but collaborative, capable, adaptable new members of the team.