Agents in AI: My Mid-Flight Discoveries
I recently found myself on a long flight from Amsterdam to Brazil (I’ll share more about that trip in another article). With limited in-flight entertainment and no internet access, I remembered I had saved a fascinating whitepaper from Google about AI “Agents.” Reading it turned out to be the perfect way to make time fly, literally! By the time I landed, I was brimming with excitement about what I’d learned.
In this article, I’ll share my key takeaways, highlight the core concepts of AI agents, and explain why they might be the next big thing in the AI world.

Why Agents Matter in the Age of LLMs
Large Language Models (LLMs), like those from OpenAI and Google, have generated a lot of hype, understandably so. They represent a major leap forward in AI, enabling everything from human-like text generation to complex problem-solving. However, LLMs have some limitations:
- Cut-Off Training Data: Many LLMs are “frozen” at a certain point in time, so they can’t incorporate brand-new information on their own.
- No Direct Connection to External Tools: They can generate text but lack the built-in ability to take actions in the real world.
Agents aim to solve these issues by acting on behalf of the LLM. They can access external data sources, call APIs, and even make decisions or perform tasks independently, which is why many people believe Agents could be AI’s next frontier.
What Exactly Are AI Agents?
According to Google’s whitepaper, an AI Agent is an application that:
- Observes the world (usually through a language model).
- Acts on it using various tools at its disposal.
- Works autonomously to reach a specific goal.
The Agent doesn’t need human intervention for every little step; it just needs a clear objective. Think of it like a personal assistant with some initiative: give it a goal, and it can figure out how to achieve that goal using the tools and reasoning methods it has learned.
Agents vs. Traditional “Jobs” in Software Engineering
A question might arise here: How do AI Agents differ from the “jobs” we often set up in software engineering? In typical software-development pipelines, we define “jobs” (like cron jobs or batch jobs) that run at specific times or in response to certain triggers. These jobs follow a fixed set of instructions, say, “pull data from X database, update Y table, then email the results to Z.” If any requirement changes or more data is needed, a human developer must update the code or reconfigure that job.
By contrast, AI Agents only need a clear objective, along with access to the right tools. They use the underlying language model’s reasoning to figure out the steps themselves, rather than relying on a predefined script. For example, a standard software job might always process user requests in the same sequence, whether or not each step is still relevant. An AI Agent, on the other hand, can dynamically decide which actions are required. It can search a database, call an API, or even refine its own plan, all in pursuit of the broader goal you gave it, without you having to redefine all the individual steps every time something changes.
Core Elements of an Agent
Google’s whitepaper describes three essential parts of an agent’s “cognitive architecture”:

- The Model
This is the language model (LM) that powers the agent’s decision-making. It relies on advanced reasoning methods, like ReAct (“Reason + Act”), Chain-of-Thought, or Tree-of-Thoughts, to plan its next moves. You can choose a large language model (LLM), a smaller one, or even an existing service such as ChatGPT (but remember it has its own API cost).
- The Tools
Agents need ways to connect with the outside world: APIs, databases, or web services. For instance, an agent might fetch flight data from an airline’s API or update a user’s calendar. Tools transform a simple language model into a full-fledged agent capable of real-world interactions.
- The Orchestration Layer
This layer is the “control center.” It handles the back-and-forth process of:
- Taking in new information.
- Thinking through possible next steps (reasoning).
- Deciding on an action.
- Checking whether the goal is reached or if further steps are needed.
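The steps above can be sketched as a small control loop. This is a toy illustration, not Google’s implementation: `model` and `tools` are hypothetical objects standing in for the language model and the agent’s tool registry.

```python
def run_agent(goal, model, tools, max_steps=10):
    """Toy orchestration loop: reason, act, observe, repeat until done.

    `model.decide(history)` is assumed to return a dict such as
    {"tool": "flights", "args": {...}} or {"tool": "finish", "answer": "..."}.
    """
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # 1-2. Take in the accumulated information and reason about next steps.
        decision = model.decide(history)
        # 4. Check whether the goal is reached.
        if decision["tool"] == "finish":
            return decision["answer"]
        # 3. Act: call the chosen tool and record the observation.
        observation = tools[decision["tool"]](**decision["args"])
        history.append(f"Action: {decision}, Observation: {observation}")
    return "Stopped after max_steps without reaching the goal"
```

The key design point is that the sequence of tool calls is not hard-coded: the model picks the next action from the history at every iteration, which is exactly what distinguishes an agent from a fixed job.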
Cognitive Architectures: How Agents Think
An agent’s orchestration layer is often guided by “reasoning frameworks”, structures that help the language model think in more systematic ways. Here are three popular ones:
- ReAct (Reason + Act)
The model breaks down tasks by explicitly expressing what it’s thinking (the “Reason” part) and what it plans to do next (the “Act” part). It looks like:
- Thought: The agent’s internal reasoning.
- Action: Deciding which tool to use or what step to take.
- Observation: Learning from the tool’s output or feedback.
- Final Answer: A complete response to the user.
- Chain-of-Thought (CoT)
The agent uses sequential reasoning steps, explaining how it arrived at each step before providing a final answer. This chain of intermediate thoughts can improve the accuracy of more complex queries.
- Tree-of-Thoughts (ToT)
Instead of one linear chain of thought, the model explores multiple reasoning “branches,” evaluating different ways to solve a problem. This approach is especially useful for strategic decision-making.
This diagram illustrates how an AI Agent processes a user’s request to book a flight from Austin to Zurich. The “Question” is the user’s prompt; the Agent then thinks about which tool to use (in this case, a “Flights” tool), sends the necessary input (city names, dates, etc.), observes the tool’s results, and finally presents a concise answer back to the user. Each step (Question, Thought, Action, Observation, and Final Answer) happens inside the Agent’s “runtime,” enabling it to act autonomously and leverage multiple tools (like a flight API, a search function, or even a code runner) to fulfill the user’s goal.
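That flow can be mimicked with a hand-written ReAct-style trace. The `flights_tool` stub and the flight number below are made up for illustration; a real agent would generate the Thought and Action lines itself.

```python
def flights_tool(origin, destination):
    # Stub standing in for a real flight-search API.
    return [{"flight": "LX 0191", "from": origin, "to": destination}]

# A hand-written ReAct-style trace for the flight-booking example:
trace = []
trace.append("Question: Book a flight from Austin to Zurich")
trace.append("Thought: I should search for flights with the Flights tool")
trace.append("Action: Flights(origin='Austin', destination='Zurich')")
results = flights_tool("Austin", "Zurich")
trace.append(f"Observation: {results}")
trace.append(f"Final Answer: Found flight {results[0]['flight']} from Austin to Zurich")
```

Reading the trace top to bottom mirrors the diagram: each Observation feeds the next Thought until the agent can produce a Final Answer.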

Tools: The Bridge to the Outside World
As mentioned earlier, a major shortcoming of standard LLMs is their inability to look up fresh data or modify external systems in real-time. Agents overcome this by using “tools” — essentially interfaces to external APIs, databases, or other services. Google’s whitepaper focuses on three main tool categories:
Extensions
Think of an Extension as a universal translator between the agent and an external API — one that also handles a lot of the tricky parts of data extraction and validation. In a flight-booking scenario, for example, you might normally write custom code to parse a user’s request (“I want to book a flight from Austin to Zurich”), extract “Austin” and “Zurich,” then call the correct endpoint with the right parameters. But if the user forgets a key piece of information, like the departure city, your code might break. Extensions solve this by automatically teaching the agent how to interact with the API. The agent itself learns which parameters are needed, how to ask for missing data, and how to handle edge cases. As a result, you don’t have to manually write (and constantly update) custom parsing or validation logic. The Extension takes care of translating the user’s natural-language request into a valid API call — even in situations you hadn’t initially planned for — making your system far more scalable and resilient.
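One way to picture an Extension is as a machine-readable description of the API that the agent consults at runtime. Here is a minimal sketch; the schema layout and the `search_flights` name are hypothetical, not Vertex AI’s actual Extension format.

```python
# Hypothetical extension spec: the agent reads this description to learn
# which parameters the flights endpoint takes and which are required.
flights_extension = {
    "name": "search_flights",
    "description": "Search for flights between two cities on a date.",
    "parameters": {
        "origin":      {"type": "string", "required": True},
        "destination": {"type": "string", "required": True},
        "date":        {"type": "string", "required": False},
    },
}

def missing_required(extension, user_args):
    """Return the required parameters the user's request did not supply.

    An agent can use this to ask a follow-up question ("Which city are
    you departing from?") instead of making a broken API call.
    """
    return [name for name, spec in extension["parameters"].items()
            if spec["required"] and name not in user_args]
```

For example, `missing_required(flights_extension, {"destination": "Zurich"})` reveals that `origin` is still needed, which is the edge case (a forgotten departure city) described above.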


Functions
- Similar to Extensions but run on the client (front-end) side.
- The agent proposes which function to call and with what arguments; the client then executes the function (e.g., hitting an API or a database).
- Perfect for scenarios where you need extra security or want more control over the data flow.
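The split between “model proposes” and “client executes” can be sketched as follows. The `get_weather` function and the JSON shape of the proposal are made up for illustration.

```python
import json

# Hypothetical model output: the agent only *proposes* a call...
model_proposal = json.dumps({
    "function": "get_weather",
    "args": {"city": "Zurich"},
})

# ...and the client decides whether and how to execute it.
ALLOWED = {
    "get_weather": lambda city: {"city": city, "temp_c": 18},  # stub implementation
}

def execute_proposal(proposal_json):
    """Run a model-proposed call, but only if it is on the client's allow-list."""
    proposal = json.loads(proposal_json)
    fn = ALLOWED.get(proposal["function"])
    if fn is None:
        raise ValueError(f"Function not allowed: {proposal['function']}")
    return fn(**proposal["args"])
```

Because the real execution happens in client code, the allow-list gives you the extra security and control over data flow mentioned above: the model never touches your credentials or network directly.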

Data Stores
- Overcome the LLM’s knowledge cut-off by giving the agent access to dynamic, up-to-date information in vector databases.
- Often used in Retrieval Augmented Generation (RAG) systems, where the model can query external “memory” (like a specialized database of PDFs or text documents), retrieve relevant info, and incorporate it into its answers.
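To make the retrieval step concrete, here is a toy version that uses bag-of-words vectors in place of learned embeddings and a plain list in place of a vector database; all names and documents are illustrative.

```python
import math
import re
from collections import Counter

def embed(text):
    """Bag-of-words 'embedding' (a stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents):
    """Return the document most similar to the query (the 'R' in RAG)."""
    q = embed(query)
    return max(documents, key=lambda d: cosine(q, embed(d)))

docs = [
    "Refund policy: tickets are refundable within 24 hours.",
    "Baggage policy: one carry-on bag is included.",
]
```

In a full RAG pipeline, the retrieved passage would then be spliced into the model’s prompt, letting the agent answer from up-to-date documents rather than from its frozen training data.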


Production-Ready Agents with Vertex AI
In the whitepaper, Google highlights Vertex AI, its platform that combines all of these concepts into a managed environment:
- Vertex AI Agents let you define goals, instructions, and sub-agents for specialized tasks.
- Extensions and Example Stores come pre-built, so you can easily plug in real-world APIs (e.g., searching flights, analyzing code, performing data lookups).
- Functions allow you to add custom or secure logic on your own servers.
- RAG Pipelines let agents query external data sources in real time.
This approach saves development time, offers built-in security, and reduces the complexity of stitching together different pieces manually.
Looking Ahead
The future of AI Agents looks incredibly promising. As tools get more sophisticated and reasoning techniques improve, agents could handle increasingly complex tasks, from fully automated customer service to orchestrating multi-step business processes. We may even see “agent chaining,” where multiple specialized agents collaborate, each an expert in a specific domain, to deliver more powerful and comprehensive solutions.
However, building advanced agents isn’t a simple, one-time effort. It requires iteration: testing, refining, and revisiting the orchestration layer, the tool definitions, and how the language model is prompted. Every use case might need its own custom approach.
Final Thoughts
My in-flight reading turned out to be far more engaging than scrolling through the usual lineup of airplane movies. Google’s whitepaper clarified what AI agents can do today, and hinted at where they could be headed tomorrow.
- They bridge LLMs with the real world.
- They orchestrate planning and execution, not just generate text.
- They can learn from external tools and data in real time.
For anyone fascinated by AI’s next phase, especially if you’ve wondered how to get past the limitations of LLMs, Agents might be the answer. Whether you’re a developer looking to build sophisticated applications or just someone curious about where AI is heading, keep an eye on this evolving space.
And if you’re going on a long flight soon, definitely consider bringing along something intriguing to read; you might be surprised how much you can learn above the clouds.
Thanks for reading! I hope this summary helped clarify the concept of AI agents. Stay tuned for another article about my actual trip to Brazil, where I put some of these ideas into a real-life travel-planning test!