Large language models (LLMs) have transformed how humans interact with machines. However, the real breakthrough comes when these models stop being passive responders and start becoming active problem solvers. This transition is powered by a capability known as tool calling.
In this blog, we will explore what tool calling is, why it matters, how it works internally, and how it enables agentic AI systems capable of real-world action.
What Is Tool Calling?
At its core, tool calling refers to an AI model’s ability to interact with external tools, APIs, databases, or systems to extend its native capabilities.
Traditional LLMs operate purely on pretrained knowledge. They generate answers based on patterns learned during training. But this approach has a hard limit: the model can only work with what it saw during training, so it has no access to live data, user-specific systems, or the current state of the world.
Tool calling removes this limitation.
With tool calling enabled, an AI system can:
Query live databases
Fetch real-time information (weather, stock prices, system status)
Execute functions or scripts
Trigger workflows and automation
Interact with enterprise systems
This capability is sometimes called function calling, and it is one of the foundational pillars of agentic AI.
Instead of merely answering questions, LLMs with tool calling can decide, act, and iterate—much like a digital agent.
NOTE: An agent is a system with an LLM at its core that can decide which actions to take as it works to answer the prompt it received. The most common actions LLM agents are built to take are: sending text or other media to the user, calling a tool to help answer the user, and calling another agent to help answer the user. Generally speaking, an LLM agent will also have a system prompt explaining its role and giving it rules about when to call tools and/or reply to the user. For most agents, the control flow can be shown as follows:
[Diagram: typical control flow of an LLM agent]
Why Is Tool Calling Important?
Limits of static knowledge: Even the most advanced LLMs are constrained by:
Training data cutoffs
Lack of real-time awareness
Inability to perform live computations
No direct access to user-specific systems
Early models such as GPT-2 were entirely static. They produced impressive text but had no concept of now.
Ask them about today’s weather or current stock prices, and they simply could not answer accurately.
The Need for Real-World Interaction
As AI moved into production systems—finance, healthcare, DevOps, customer support—the need for:
Live data
External computation
User-specific actions
became unavoidable.
This led to the introduction of tool calling, where models are trained to:
Recognize when external help is needed
Select the correct tool
Generate structured requests
Interpret structured responses
Critically, tools often expect strict input schemas, not free-form text. Tool calling ensures model outputs conform to these schemas, making AI-system integration reliable and safe.
How Does Tool Calling Work?
Modern LLMs such as Claude, Llama 3, Mistral, and IBM Granite all support tool calling, though implementation details vary.
At a high level, the process involves six steps.
Step 1: Recognizing the Need for a Tool
Imagine a user asks:
“What’s the weather in San Francisco right now?”
The model immediately understands:
This requires real-time data
The answer cannot come from its static training set
Step 2: Selecting the Right Tool
Next, the model chooses the most appropriate tool—perhaps a weather API.
Each tool is described using metadata, including:
Tool (or function) name
Description
Input parameters
Input and output data types
This metadata allows the model to reason about:
Which tool to use
What arguments it must provide
Tool selection is not random—it is a learned decision based on context.
Step 3: Preparing the Arguments (Args)
Once the tool is selected, the model constructs structured arguments (often called args).
For example:
City: San Francisco
Units: Celsius
Timestamp: current
These arguments must strictly match the tool’s expected schema.
To ensure consistency, developers often use templates or structured prompts that guide the model on:
Which tool to call
What arguments to pass
This is where tool calling differs from free-form prompting—it is contract-driven.
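For the weather question above, the prepared arguments might look like this minimal sketch (the field names are illustrative; in practice they are dictated by the tool's schema):

```python
# Illustrative arguments for a hypothetical get_weather tool.
# The schema, not the model, decides which fields exist and which values are allowed.
weather_args = {
    "city": "San Francisco",
    "units": "celsius",        # must match the schema's allowed values
    "timestamp": "current",
}
```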
Tool Calling + RAG: A Powerful Combination
Tool calling becomes even more effective when combined with Retrieval Augmented Generation (RAG).
With RAG:
The model retrieves relevant structured and unstructured data
Then uses that data to generate a grounded response
Benefits include:
Higher contextual accuracy
Reduced hallucinations
Lower API overhead
Greater flexibility across domains
Unlike rigid tool calls, RAG allows more fluid reasoning by blending retrieved knowledge with generation.
Step 4: Making the API Call
Each tool is backed by an API, documented via:
Endpoints
HTTP methods
Request/response formats
Many APIs require authentication via an API key.
Once arguments are prepared, the model (or orchestration layer) sends an HTTP request to the external system.
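In code, that request might look like the following minimal sketch (the endpoint, parameters, and environment variable are illustrative, not a real weather API):

```python
import os
import requests

API_KEY = os.environ["WEATHER_API_KEY"]   # hypothetical key for the example API

def call_weather_api(city: str, units: str = "celsius") -> dict:
    """Send the prepared arguments to a (hypothetical) weather endpoint."""
    resp = requests.get(
        "https://api.example-weather.com/v1/current",   # illustrative endpoint
        params={"city": city, "units": units},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()   # structured JSON response for the next step
```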
Step 5: Receiving and Processing the Response
The external tool returns structured data—commonly in JSON format.
For a weather API, this might include:
Temperature
Humidity
Wind speed
The AI then:
Parses the response
Filters relevant fields
Transforms raw data into a human-friendly explanation
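Continuing the sketch above, that post-processing might look like this (the field names assume the hypothetical response format returned by the call_weather_api sketch):

```python
def summarize_weather(payload: dict) -> str:
    """Turn the raw JSON payload into a human-friendly sentence."""
    return (
        f"It is currently {payload['temperature']}°C in {payload['city']}, "
        f"with {payload['humidity']}% humidity and wind at {payload['wind_speed']} km/h."
    )
```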
Step 6: Acting or Responding
Finally, the AI either:
Presents the information to the user, or
Confirms an action (e.g., “Your reminder has been scheduled.”)
If the user asks follow-up questions, the model can repeat the cycle with refined parameters—enabling iterative reasoning.
How Do LLMs Call Tools?
For an LLM (Large Language Model) to call a tool, it needs a structured way to specify which tool it wants to use and what arguments to pass. Since an LLM outputs plain text tokens, an external system must parse this output and execute the tool call. This means the LLM should produce structured or semi-structured data consistently.
Different APIs implement this differently, but the concept is the same across platforms. Let’s look at how the OpenAI Chat API handles this.
When using the OpenAI Chat API, you provide a list of tools the LLM can access. Each tool is defined with:
- Name of the tool
- Description of what it does
- Parameters (including type, description, and whether they are required)
Here’s an example tool definition:
{ "type": "function", "function": { "name": "calculate_distance", "description": "Calculate the distance between two cities", "parameters": { "type": "object", "properties": { "city_a": { "type": "string", "description": "Name of the first city, e.g., New York" }, "city_b": { "type": "string", "description": "Name of the second city, e.g., Los Angeles" }, "unit": { "type": "string", "enum": ["kilometers", "miles"] } }, "required": ["city_a", "city_b"] } }}
This JSON would be included in the API call so the LLM knows it can use calculate_distance. If you don’t include it, the LLM won’t know the tool exists.
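With the OpenAI Python SDK, that definition is passed through the tools parameter. Here is a minimal sketch (the model name is illustrative, and calculate_distance_tool is assumed to hold the JSON definition above):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",                      # illustrative model name
    messages=[{"role": "user",
               "content": "How far is New York from Los Angeles?"}],
    tools=[calculate_distance_tool],          # the JSON definition shown above
    tool_choice="auto",                       # let the model decide whether to call it
)
```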
How Does the LLM Decide to Call a Tool?
When the LLM responds, you check the tool_calls property in the response. For example, in Python:
```python
response_message = response.choices[0].message
tool_calls = response_message.tool_calls
```
tool_calls will contain an array of tools the LLM wants to invoke, along with the arguments. Your system then executes the corresponding function or method with those arguments. This approach allows the LLM to reason about when to use a tool and provide structured arguments, while your application handles the actual execution.
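Continuing that sketch (and assuming a local Python implementation of calculate_distance plus the running messages list from the request), the dispatch loop might look like this:

```python
import json

for call in tool_calls or []:
    args = json.loads(call.function.arguments)    # arguments arrive as a JSON string
    if call.function.name == "calculate_distance":
        result = calculate_distance(**args)       # your own implementation of the tool
        # Feed the result back so the model can turn it into a final answer
        messages.append({"role": "tool",
                         "tool_call_id": call.id,
                         "content": json.dumps(result)})
```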
Agent Workflow
Receive the Query
The agent gets a natural-language request or task from the user or an external system.
Discover Available Tools
It looks up internal metadata or a tool registry to find relevant tools, schemas, and capabilities.
Select and Invoke the Right Tool
The LLM processes the query along with tool metadata (such as function names, input types, and descriptions).
It chooses the most appropriate tool, prepares the input arguments, and generates a structured function call.
Execute the Tool
The agent shell or tool runner runs the selected function and retrieves the output (e.g., API response, database value, or computation result).
Return the Final Response
The LLM incorporates the tool’s result into its prompt and produces a natural-language answer for the user.
Key Capabilities:
Dynamic Tool Selection
Automatically picks the right tool based on the context of the task.
Schema-Aware Prompting
Supports structured interfaces like OpenAPI, JSON Schema, and AWS function definitions for precise interactions.
Intelligent Output Handling
Interprets results and chains outputs into logical reasoning for complex workflows.
Flexible Execution Modes
Works in both stateless and session-aware environments.
Common Use Cases:
Virtual Assistants with External Data Access
Enhance assistants by connecting them to APIs and real-time data sources.
Financial Calculators and Estimators
Perform dynamic computations and provide accurate projections.
API-Driven Knowledge Workers
Automate tasks that require pulling and processing data from multiple services.
LLM-Powered Integrations
Invoke AWS Lambda, Amazon SageMaker endpoints, and SaaS tools for advanced functionality.
LangChain and Tool Calling
LangChain is one of the most widely used frameworks for implementing tool calling.
It provides:
Tool registration
Argument parsing
Context-aware routing
Memory across multiple interactions
Unlike basic tool calling, LangChain can:
Chain multiple tools together
Store previous tool outputs
Enable complex, multi-step agent workflows
For example:
Call a weather API
Use results to trigger a clothing recommendation tool
Generate a final personalized response
This is a practical implementation of agentic AI.
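As a minimal LangChain sketch of that weather-to-clothing chain (imports and APIs vary across LangChain versions, so treat this as illustrative rather than canonical):

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def get_weather(city: str) -> str:
    """Return a short weather summary for a city (stubbed for illustration)."""
    return f"Sunny and 18°C in {city}"

@tool
def recommend_clothing(weather_summary: str) -> str:
    """Suggest clothing based on a weather summary (stubbed for illustration)."""
    return "Light jacket and sunglasses" if "Sunny" in weather_summary else "Umbrella and raincoat"

# Register the tools with the model; an agent executor (or your own loop)
# runs the calls the model requests and chains their outputs together.
llm = ChatOpenAI(model="gpt-4o-mini").bind_tools([get_weather, recommend_clothing])

response = llm.invoke("What should I wear in San Francisco today?")
print(response.tool_calls)   # structured tool-call requests chosen by the model
```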
Common Types of Tool Calling Use Cases
While possibilities are endless, most applications fall into a few major categories.
1. Information Retrieval and Search
AI pulls real-time data from:
Web search engines
Financial markets
Academic databases
News sources
Example: Fetching live stock prices or breaking news inside a chatbot.
2. Code Execution and Computation
AI executes:
Mathematical calculations
Simulations
Scripts via Python or engines like Wolfram Alpha
Useful for analytics, engineering, and scientific domains.
3. Process Automation
AI automates workflows by integrating with:
Calendars
Email systems
CRM tools (Salesforce)
Finance platforms (QuickBooks)
This enables AI-driven business operations.
4. Smart Devices and IoT Control
Agentic systems can monitor and control:
Smart homes
Industrial sensors
Robotics platforms
This opens the door to fully autonomous, end-to-end workflows.
Final Thoughts
Tool calling is not just a feature; it is a paradigm shift.
It allows LLMs to:
Know when they don’t know
Reach outside themselves
Act in the real world
Continuously refine outcomes
As AI systems evolve, tool calling will be the foundation that turns language models into true digital agents—capable of reasoning, acting, and collaborating across complex environments. If language is intelligence, tool calling is agency.
Appendix: Sample Code and Additional Notes
Sample code:
Below is a complete Python example that shows:
- Defining tools
- Making a chat completion request
- Reading tool_calls and parsing JSON arguments
- Executing your local functions
- Returning tool outputs back to the model for a final answer
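A minimal version of that example might look like the following sketch (the model name and the coordinate lookup are illustrative, and error handling is omitted for brevity):

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Local function the model may ask us to run (hypothetical lookup table;
# a real implementation would call a geocoding API)
def get_city_coordinates(city: str) -> dict:
    known = {"San Francisco": {"lat": 37.77, "lon": -122.42}}
    return known.get(city, {"error": f"unknown city: {city}"})

# Tool schema so the model knows what arguments to provide
tools = [{
    "type": "function",
    "function": {
        "name": "get_city_coordinates",
        "description": "Return latitude and longitude for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g., San Francisco"}
            },
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What are the coordinates of San Francisco?"}]

# First API call: the model decides whether it needs the tool
first = client.chat.completions.create(
    model="gpt-4o-mini", messages=messages, tools=tools, tool_choice="auto"
)
assistant_msg = first.choices[0].message
messages.append(assistant_msg)   # keep the tool-call request in the history

# Execute each requested tool call and return the output with role="tool"
for call in assistant_msg.tool_calls or []:
    args = json.loads(call.function.arguments)     # JSON string -> Python dict
    result = get_city_coordinates(**args)          # execute the local function
    messages.append({"role": "tool",
                     "tool_call_id": call.id,
                     "content": json.dumps(result)})

# Second API call: the model uses the tool output to produce the final answer
final = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(final.choices[0].message.content)
```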
How It Works
- tools: Defines the schema so the model knows what arguments to provide.
- First API call: Model decides if it needs the tool and returns tool_calls.
- Parse arguments: call.function.arguments is a JSON string → json.loads().
- Execute local function: get_city_coordinates(city).
- Send result back: Add a message with role="tool" and tool_call_id.
- Second API call: Model uses the tool output to generate the final answer.
Key points to notice:
Where do the JSON input args go?
The model returns them in tool_calls[i].function.arguments as a JSON string. You must json.loads(...) that string to get a Python dict to call your function.
Returning tool outputs back to the model
You send a new message with role="tool", include the tool_call_id from the original call, and put your tool's output in content (commonly JSON).
Finalization step
After you add the tool result messages, call the model again so it can synthesize a natural-language answer using the tool outputs.
NOTE: "Execute locally" means execute in your code; where your code reaches out from there is up to you. Your local code might call an external system (for example, making a request to a remote database or API), or it might enqueue a job for a worker or serverless function.
The same pattern extends to multiple agents working together:
- Agent 1 (DataAgent) calculates the distance between two cities using tools
- Agent 2 (ReportAgent) formats the result using its own tool
- An orchestrator glues the two agents together
How this works
- Each agent has its own system prompt, tool schema, and Python functions.
- For each agent, we make a first call with tools=... and tool_choice="auto".
- We parse assistant_msg.tool_calls[i].function.arguments, which is a JSON string.
- We execute the requested local function and return a tool message with role="tool" and tool_call_id.
- We make a second call for the agent to finalize the answer.
- The orchestrator passes the computed JSON payload from Agent 1 to Agent 2, which formats the report via its own tool.
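A condensed sketch of that two-agent setup might look like this (the distance table, model name, system prompts, and report format are all illustrative):

```python
import json
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"   # illustrative model name

# ---- Local tool implementations --------------------------------------------
def calculate_distance(city_a: str, city_b: str, unit: str = "kilometers") -> dict:
    # Hypothetical lookup; a real DataAgent tool would call a routing/geo API
    km = {("New York", "Los Angeles"): 3936}.get((city_a, city_b), 0)
    value = km if unit == "kilometers" else round(km * 0.621371)
    return {"city_a": city_a, "city_b": city_b, "distance": value, "unit": unit}

def format_report(payload: str) -> dict:
    data = json.loads(payload)
    return {"report": f"Distance report: {data['city_a']} -> {data['city_b']} = "
                      f"{data['distance']} {data['unit']}"}

LOCAL_FUNCS = {"calculate_distance": calculate_distance, "format_report": format_report}

def tool_schema(name, description, properties, required):
    return {"type": "function",
            "function": {"name": name, "description": description,
                         "parameters": {"type": "object",
                                        "properties": properties,
                                        "required": required}}}

def run_agent(system_prompt: str, user_msg: str, tools: list) -> str:
    """Generic agent turn: first call -> execute requested tools -> second call."""
    messages = [{"role": "system", "content": system_prompt},
                {"role": "user", "content": user_msg}]
    first = client.chat.completions.create(model=MODEL, messages=messages,
                                           tools=tools, tool_choice="auto")
    assistant_msg = first.choices[0].message
    messages.append(assistant_msg)
    for call in assistant_msg.tool_calls or []:
        args = json.loads(call.function.arguments)        # JSON string -> dict
        result = LOCAL_FUNCS[call.function.name](**args)   # execute locally
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": json.dumps(result)})
    final = client.chat.completions.create(model=MODEL, messages=messages)
    return final.choices[0].message.content

# ---- Orchestrator: Agent 1's JSON payload feeds Agent 2 ---------------------
data_tools = [tool_schema(
    "calculate_distance", "Calculate the distance between two cities",
    {"city_a": {"type": "string"}, "city_b": {"type": "string"},
     "unit": {"type": "string", "enum": ["kilometers", "miles"]}},
    ["city_a", "city_b"])]

report_tools = [tool_schema(
    "format_report", "Format a distance payload as a short report",
    {"payload": {"type": "string", "description": "JSON distance payload"}},
    ["payload"])]

distance_json = run_agent(
    "You are DataAgent. Use your tools to compute distances and reply with the "
    "raw JSON result only.",
    "How far is New York from Los Angeles in kilometers?", data_tools)

print(run_agent(
    "You are ReportAgent. Use your tool to format the given distance payload.",
    f"Format this distance payload: {distance_json}", report_tools))
```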
RAG is a powerful technique that combines search and generation to make AI responses accurate, grounded, and up-to-date. Instead of relying only on what a language model learned during training, RAG allows it to pull in fresh, private, or domain-specific data on the fly.
Let’s break it down into four key stages:
1. Indexing – Preparing Your Knowledge Base
Before AI can answer questions using your documents, those documents need to be transformed into a searchable format.
How it works:
- Start with raw content: PDFs, Word files, notes, web pages, etc.
- Extract text: Pull plain text from these sources.
- Chunking: Split long text into smaller, manageable pieces. This matters because LLMs can’t process huge blocks efficiently.
- Vectorization: Convert each chunk into a numerical representation called a vector, which captures the meaning of the text.
- Embedding model: A specialized model performs this text-to-vector conversion.
- Store in a vector database: All vectors are saved in a database optimized for similarity search.
2. Retrieval – Finding Relevant Information
When a user asks a question, the system fetches the most relevant chunks from the indexed data.
Steps:
- User submits a query: Example: “What does the contract say about termination?”
- Convert query to a vector: Using the same embedding model as before.
- Similarity search: Compare the query vector with stored document vectors.
- Return top matches: The system outputs the most relevant text chunks.
3. Augmentation – Building Context for the Model
The retrieved chunks are combined with the user’s question to create a rich, context-aware prompt.
Process:
- Gather relevant chunks.
- Merge them into a clean context block.
- Construct a new prompt that includes:
- The original question
- The retrieved context
- This augmented prompt gives the LLM the background knowledge it needs.
4. Generation – Producing a Grounded Answer
Finally, the enriched prompt is sent to the language model.
What happens:
- The LLM reads both the question and the retrieved context.
- It generates a response based on actual data, not guesses.
- The output is accurate, explainable, and tied to your documents.
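A toy end-to-end sketch of these four stages, using a bag-of-words "embedding" purely for illustration (a real pipeline would use an embedding model and a vector database):

```python
import math
from collections import Counter

# --- 1. Indexing: chunk documents and turn each chunk into a vector ---------
documents = [
    "The contract may be terminated by either party with 30 days written notice.",
    "Payment is due within 15 days of receiving the invoice.",
]

def embed(text: str) -> Counter:
    # Toy word-count "embedding"; swap in a real embedding model in practice
    return Counter(text.lower().split())

index = [(chunk, embed(chunk)) for chunk in documents]   # stand-in for a vector DB

# --- 2. Retrieval: embed the query and find the most similar chunk ----------
def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

query = "What does the contract say about termination?"
top_chunk = max(index, key=lambda item: cosine(embed(query), item[1]))[0]

# --- 3. Augmentation: combine the retrieved context with the question -------
prompt = f"Context:\n{top_chunk}\n\nQuestion: {query}\nAnswer using only the context."

# --- 4. Generation: send the augmented prompt to the LLM of your choice -----
print(prompt)   # e.g., pass this as a user message to your chat completion call
```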
Why RAG Matters
- Standard LLMs rely only on their training data.
- RAG enables models to use your private or latest information.
- Updating knowledge is as simple as updating your indexed documents.
This approach is widely used in enterprise search, chatbots, legal document analysis, and customer support systems.
An MCP-enabled agent flow with memory proceeds as follows:
Receive query and fetch memory
- The agent takes the user query.
- It retrieves relevant session memory or long-term memory (for user preferences, past decisions, cached results).
Discover tools via MCP
- The agent searches the MCP Tools Registry for available tools, schemas, and capabilities.
- Examples: OpenAPI specs, JSON Schema, AWS Lambda functions, SageMaker endpoints, SaaS connectors.
Planning, reflection, and tool choice
- The LLM plans steps, decomposes goals, validates arguments, and routes the query.
- Uses self-critique to ensure correct tool selection and schema compliance.
- Planning references memory to avoid redundant calls and to personalize results.
Execute tools and collect observations
- The MCP Tool Runner executes the chosen function.
- The agent receives tool outputs, checks units, ranges, and business rules.
- If errors occur, planning adapts, retries, or switches tools.
Respond and update memory
- The LLM synthesizes a natural-language answer.
- Optionally writes key facts or decisions back to memory for future use.
Role of Memory in Tool Calling
Memory acts as the context backbone for the agent. It ensures that the LLM doesn’t operate in isolation but instead uses relevant historical and contextual data to make better decisions during planning and tool invocation.
Types of Memory
Short-Term Memory (Session Memory)
- Tracks the current conversation flow and intermediate steps.
- Example: If the user asks, “Add a new test case for DAWR,” and later says, “Make it similar to the last one,” short-term memory recalls what “last one” refers to.
- Stored in the agent’s working context (like a conversation buffer).
Long-Term Memory
- Stores persistent knowledge across sessions.
- Example: Past tool calls, user preferences, previous bug reports, or test harness details.
- Typically implemented using vector databases (e.g., Pinecone, Weaviate, FAISS) for semantic search.
- Enables retrieval of relevant references during planning or content generation.
Where Memory Fits in the Flow
Referencing the agent flow described above:
Step 1 (Receive Query):
Memory is accessed immediately to enrich the query with historical context.
Example: "User often works on Linux RAS components → prioritize related tools."
Step 3 (Planning & Tool Choice):
Memory helps the LLM plan better by recalling previous tool usage patterns, schema details, and user-specific constraints.
Example: "Last time, the user preferred JSON schema-based prompts → use that format."
Step 4 (Tool Execution):
Memory can store execution results for future reuse.
Example: Cache API responses or computed estimates to avoid redundant calls.
Step 5 (Response):
Memory updates with new facts, decisions, and tool outputs for long-term learning.
Why Memory Matters
- Personalization: Tailors responses based on user history.
- Efficiency: Avoids repeated tool calls by caching results.
- Accuracy: Provides richer context for reasoning and planning.
- Scalability: Enables complex workflows by chaining past knowledge.
Practical Implementation
- Short-Term: Conversation buffer in the agent shell.
- Long-Term:
- Vector DB for semantic retrieval.
- Store tool metadata, execution logs, and user preferences.
- Use embeddings to link queries with relevant past interactions.
n8n is an open-source workflow automation platform that helps you connect different apps, services, and APIs without writing a lot of custom code. It’s similar to tools like Zapier or Integromat, but with more flexibility and self-hosting options.
Here’s what makes n8n special:
- Visual Workflow Builder: You can create workflows using a drag-and-drop interface.
- Integrations: It supports hundreds of apps and APIs (Slack, GitHub, Google Sheets, etc.).
- Custom Logic: You can add JavaScript code snippets for advanced logic.
- Self-Hosting: Unlike many SaaS automation tools, you can run n8n on your own server for full control.
- Event-Driven Automation: Trigger workflows based on events (e.g., new email, webhook, database update)
Common agentic workflow patterns in n8n include:
1. AI Agent Using Tools
- The agent receives a chat message and plans its actions.
- It can access contacts, send emails, or send invitations using integrated tools.
2. AI Agent Mixing Tools with MCP Servers
- Triggered by another app through a webhook.
- Uses an MCP (Model Context Protocol) server for specialized integrations (e.g., Atlassian).
- Combines ready-to-use tools for other interactions.
3. Agentic Workflow with a Router
- A router acts as a conditional decision-maker.
- The agent routes tasks based on conditions (e.g., if X happens, do Y).
4. AI Agent with a Human in the Loop
- The agent pauses for human approval before proceeding.
- Example: Asking for Slack approval before executing an action.
5. Dynamically Calling Other Agents
- The agent autonomously decides whether to call another AI agent.
- Option 1: Subagent via an AI Agent Tool node.
- Option 2: Subagent or another agent via a Workflow Tool node.
As AI evolves from single models to agentic architectures (systems where multiple AI agents work together), one question becomes critical:
How do these agents talk to each other and coordinate tasks?
That’s where AI Agent Protocols come in. Think of them as the “rules of communication” that allow agents to share information, collaborate, and connect with external tools. Without these protocols, agentic AI would be chaotic and unreliable.
Here are the most important protocols you should know—explained simply:
1. MCP — Model Context Protocol (by Anthropic)
- What it does: Helps AI agents manage context and connect to external tools like Slack, GitHub, or APIs.
- How it works: Uses a client-server setup with JSON-RPC for communication.
- Example: Imagine an AI assistant in Slack that can also pull data from GitHub and update project status automatically.
2. A2A — Agent-to-Agent Protocol (by Google)
- What it does: Allows multiple AI agents to collaborate and share tasks.
- How it works: Agents talk directly (peer-to-peer) or through a central coordinator.
- Example: Two AI agents—one handling API calls and another managing database queries—working together to complete a workflow in Vertex AI.
3. SLIM — Structured Language Interaction Model (by OpenAI)
- What it does: Makes sure agents exchange messages in a structured, predictable way.
- Why it matters: Prevents confusion when agents use tools or execute tasks.
- Example: An agent asking another agent for a tool response in a clear format, so nothing gets lost in translation.
4. ACP — Agent Communication Protocol (by IBM)
- What it does: Handles discovery of helper agents, status updates, and message routing.
- Where it’s used: Large enterprise systems with multiple agents and services.
- Example: Orchestrating a complex workflow where one agent monitors servers, another handles alerts, and a third updates dashboards.
Why This Matters
These protocols are the foundation of agentic AI. They enable:
- Coordination between agents
- Scalability for large systems
- Real-world execution beyond simple prompts
As we move toward multi-agent, autonomous systems, understanding these protocols is essential for building reliable AI solutions.
Pro Tip for Beginners: Start by experimenting with one protocol (like MCP) in a simple project—such as connecting an AI chatbot to an external API. Once you see how communication works, scaling to multi-agent systems becomes much easier.
Conclusion
LLMs can dynamically decide when to use a tool, pass the right arguments, and incorporate the results into their final response. This approach transforms LLMs from passive text generators into active problem-solvers that can query APIs, run computations, or fetch real-time data.
Understanding this workflow—define tools → let the model decide → parse arguments → execute locally → return results → finalize answer—is key to building powerful AI-driven applications. Whether you’re integrating with APIs, automating workflows, or creating intelligent assistants, tool calling is the foundation for making LLMs truly useful in real-world scenarios.



No comments:
Post a Comment