
Remote OpenClaw Blog

AI Agent Tool Calling: How Agents Use APIs and External Tools


Tool calling is the mechanism that allows AI agents to go beyond generating text and actually interact with external systems — APIs, databases, file systems, and web services. It works by having the LLM output structured JSON specifying a function name and arguments, which a runtime layer executes before returning the result to the model for further reasoning.

As of April 2026, tool calling is supported by every major LLM provider including OpenAI, Anthropic, and Google. It is the foundation of what makes an AI agent different from a chatbot: the ability to take actions, not just produce words.

What Is Tool Calling?

Tool calling is a capability built into modern LLMs that allows the model to request the execution of external functions during a conversation. Instead of responding with plain text, the model outputs a structured JSON object containing a function name and its arguments, signaling to the runtime that an action should be taken.

The concept emerged publicly when OpenAI launched function calling in its API in June 2023. Since then, Anthropic, Google, Mistral, and most other LLM providers have implemented their own versions. The terminology varies — OpenAI calls it "function calling," Anthropic calls it "tool use," and some frameworks use "actions" — but the underlying mechanism is the same.

Without tool calling, an AI agent is limited to generating text. It can describe what it would do, but it cannot actually do it. Tool calling bridges this gap by giving the model a structured way to say "I need to call this function with these parameters" and receive real results back. This is what enables agents to send emails, query databases, search the web, create files, and interact with third-party APIs.


How Tool Calling Works: The Execution Loop

Tool calling follows a four-step loop that repeats until the agent has enough information to respond. The LLM never executes code directly — a separate runtime layer handles all execution.

Step 1: Tool Definition. Before the conversation starts, the developer defines a set of available tools using a JSON schema. Each tool has a name, description, and parameter definitions. For example, a weather tool might be defined with a location parameter of type string.
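The weather tool mentioned above can be sketched as a plain dictionary in the JSON-Schema style that most providers share. The exact envelope differs per provider (OpenAI and Anthropic wrap this structure differently), and the `get_weather` name and fields here are illustrative:

```python
# Illustrative tool definition in the JSON-Schema style most providers
# share. Wrapper keys vary by provider; names here are examples.
get_weather_tool = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City name, e.g. 'San Diego'",
            },
        },
        "required": ["location"],
    },
}
```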

Step 2: LLM Generates a Tool Call. During the conversation, when the model determines it needs external information, it outputs a structured JSON object instead of text. This object contains the tool name and arguments — for example: {"name": "get_weather", "arguments": {"location": "San Diego"}}.

Step 3: Runtime Executes. The runtime layer (your application code, or an agent framework like OpenClaw) receives the JSON, validates it against the tool schema, executes the function, and captures the result.

Step 4: Result Fed Back. The execution result is added to the conversation as a tool response message. The model then uses this result to either generate a final text response or make another tool call if more information is needed.

This loop can repeat multiple times in a single turn. A complex query like "find the cheapest flight from LA to Tokyo next week" might trigger a sequence of tool calls: searching flights, comparing prices, and checking seat availability.
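The four steps above can be sketched as a minimal loop. Everything here is illustrative — the model call is stubbed out and `get_weather` is a fake local function — but the control flow matches what real runtimes do: call the model, execute any requested tool, feed the result back, and repeat until the model answers in text.

```python
def get_weather(location: str) -> str:
    # Stand-in for a real weather API call (Step 3 executes this).
    return f"72F and sunny in {location}"

TOOLS = {"get_weather": get_weather}  # Step 1: registered tool definitions

def fake_model(messages):
    """Stub LLM: requests one tool call, then answers in plain text."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather",
                              "arguments": {"location": "San Diego"}}}
    return {"text": "It's currently " + messages[-1]["content"]}

def run_turn(user_input: str) -> str:
    messages = [{"role": "user", "content": user_input}]
    while True:
        reply = fake_model(messages)                       # Step 2
        if "tool_call" not in reply:
            return reply["text"]                           # final answer
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])  # Step 3
        messages.append({"role": "tool",                   # Step 4
                         "content": result})

print(run_turn("What's the weather in San Diego?"))
```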


Common Tool Types and Use Cases

AI agents use tools across a wide range of categories, each with different risk profiles and implementation complexity. The table below covers the most common types as of April 2026.

| Tool Type | Description | Example Use Case | Risk Level |
| --- | --- | --- | --- |
| Web Search | Query search engines for real-time information | Finding current pricing, news, documentation | Low |
| Database Query | Read from or write to SQL/NoSQL databases | Looking up customer records, inventory counts | Medium |
| File Operations | Read, write, or modify files on disk | Generating reports, updating config files | Medium |
| API Calls | Send HTTP requests to third-party services | Creating a Jira ticket, sending a Slack message | Medium–High |
| Code Execution | Run code in a sandboxed environment | Data analysis, chart generation, calculations | High |
| Email/Messaging | Send emails or messages on behalf of the user | Follow-up emails, Telegram notifications | High |
| Payment/Financial | Process transactions or financial operations | Invoice creation, payment processing | Critical |

Risk level matters because it determines what safety controls you need. Low-risk tools like web search can typically run without approval. High-risk and critical tools should require explicit human confirmation before execution — a pattern known as "human-in-the-loop" or, in OpenClaw, exec approvals.
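A minimal sketch of this approval gate pattern (the risk tiers and tool names are illustrative, and this is not OpenClaw's actual API — the confirmation callback is injected so it can be wired to any operator interface):

```python
# Illustrative risk tiers; a real system would load these from config.
RISK = {"web_search": "low", "send_email": "high", "process_payment": "critical"}

def execute_tool(name, args, approve=input):
    """Run a tool, pausing for human confirmation on high-risk calls."""
    if RISK.get(name, "high") in ("high", "critical"):  # unknown tools: assume high
        answer = approve(f"Agent wants to call {name}({args}). Allow? [y/N] ")
        if answer.strip().lower() != "y":
            return {"status": "denied"}
    # ... dispatch to the real tool implementation here ...
    return {"status": "executed", "tool": name}
```

Defaulting unknown tools to "high" keeps the gate fail-closed: a tool missing from the risk table requires approval rather than running silently.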


Tool Calling Across LLM Providers

Every major LLM provider now supports tool calling, though the implementation details and naming conventions differ. Understanding these differences matters if you are building agents that need to work across multiple models.

OpenAI was the first to ship function calling in June 2023 and has since evolved it into a mature system supporting parallel tool calls, strict JSON schema validation, and built-in tools like code interpreter and file search. Their function calling documentation is the most widely referenced in the ecosystem.


Anthropic launched tool use for Claude in April 2024. Their implementation supports streaming tool calls, forced tool use (where you require the model to call a specific tool), and computer use tools for GUI interaction. The Anthropic tool use documentation emphasizes safety patterns including chain-of-thought before tool calls.

Google Gemini supports function calling with parallel execution and automatic function calling modes. Mistral and open-source models like Llama and Qwen also support tool calling, though the reliability varies — smaller models sometimes generate malformed JSON or call the wrong tool.
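Given that variability, it pays to treat tool-call output from smaller models defensively — parse it, validate the tool name and argument shape, and fall back (or retry) rather than crash. A sketch, with illustrative tool names:

```python
import json

KNOWN_TOOLS = {"get_weather", "web_search"}  # tools actually registered

def parse_tool_call(raw: str):
    """Return (name, args) or None if the model's output is unusable."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None                  # malformed JSON: retry or fall back to text
    name = call.get("name")
    if name not in KNOWN_TOOLS:
        return None                  # hallucinated tool name
    args = call.get("arguments")
    if not isinstance(args, dict):
        return None                  # wrong argument shape
    return name, args
```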


Tool Calling in OpenClaw: The Skills System

OpenClaw implements tool calling through its skills system, where each skill is a self-contained tool definition written as a Markdown file following the SKILL.md format. Skills define when they should trigger, what parameters they accept, and what actions they perform.

You can install pre-built skills from the ClaWHub marketplace or write custom skills tailored to your workflows. Popular skills include web search via SearXNG, email integration, calendar management, and CRM updates. For a full overview, see our complete skills guide.

What makes OpenClaw's approach distinct is that skills are portable Markdown files rather than compiled code. This means you can inspect exactly what a skill does before installing it, share skills between agents, and version-control them alongside your agent configuration. The ClaWHub installation guide walks through the process.


Safety: Sandboxing, Approvals, and Rate Limiting

Unsupervised tool calling is one of the highest-risk capabilities in AI agent systems. An agent with unrestricted tool access can send unauthorized emails, delete files, make API calls that cost money, or expose sensitive data. Safety controls are not optional.

Sandboxing restricts where and how tools execute. Docker containers, restricted file system access, and network policies prevent a tool from affecting systems outside its intended scope. OpenClaw supports sandbox mode for this purpose.

Approval flows require human confirmation before high-risk tools execute. OpenClaw's exec approvals system lets you flag specific tools or tool categories as requiring approval. When the agent wants to call a flagged tool, it pauses and waits for the operator to confirm.

Rate limiting prevents runaway tool calls. Without limits, a malfunctioning agent loop can make hundreds of API calls in seconds, burning through API quotas or racking up costs. Set per-tool and per-session rate limits.
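A per-session limit can be as simple as a timestamped counter over a sliding window. This is a sketch, not OpenClaw's implementation; the `now` parameter is injectable for testing:

```python
import time
from collections import deque

class ToolRateLimiter:
    """Allow at most max_calls tool invocations per window_s seconds."""

    def __init__(self, max_calls, window_s):
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls = deque()  # timestamps of recent allowed calls

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        while self.calls and now - self.calls[0] > self.window_s:
            self.calls.popleft()          # drop calls outside the window
        if len(self.calls) >= self.max_calls:
            return False                  # over budget: block or queue the call
        self.calls.append(now)
        return True
```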

Audit logging records every tool invocation with timestamps, parameters, and results. This creates an accountability trail and helps debug issues. For a deeper dive into agent security, see our AI agent security risks guide.
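One lightweight way to build that trail is a wrapper around tool execution, here sketched with the standard `logging` module (the record fields are illustrative):

```python
import json
import logging
import time

audit_log = logging.getLogger("tool_audit")

def audited(tool_fn):
    """Wrap a tool so every invocation is recorded with args and result."""
    def wrapper(**kwargs):
        started = time.time()
        result = tool_fn(**kwargs)
        audit_log.info(json.dumps({
            "tool": tool_fn.__name__,
            "args": kwargs,
            "result": str(result)[:200],  # truncate large outputs
            "ts": started,
        }))
        return result
    return wrapper

@audited
def get_weather(location: str) -> str:
    return f"sunny in {location}"  # stand-in tool for demonstration
```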


Limitations and Tradeoffs

Tool calling is powerful but not without significant limitations that operators should understand before relying on it in production.

Reliability varies by model. Larger models like GPT-5 and Claude Opus 4 handle complex, multi-step tool calling well. Smaller models — especially those under 13B parameters — frequently generate malformed JSON, hallucinate tool names, or pass incorrect argument types. If you are running a local model via Ollama, expect more tool-calling failures than with cloud APIs.

Latency increases with each tool call. Every tool call adds a round trip: the model generates JSON, the runtime executes, and the result is fed back. A sequence of five tool calls can add several seconds to response time compared to a direct text response.

Cost scales with tool usage. Each tool call and its result consume tokens. A single turn with multiple tool calls can use 2-5x more tokens than a text-only response, directly increasing API costs.

Security surface area grows with each tool. Every tool you add is a potential attack vector. Prompt injection can trick agents into calling tools with malicious parameters. Defense requires strict input validation, output sanitization, and principle of least privilege — only give agents the tools they actually need.
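Strict input validation means checking model-supplied arguments against the declared schema before execution, not after. A hand-rolled sketch for flat schemas — real systems often delegate this to a JSON Schema validation library:

```python
# Map JSON Schema type names to Python types (flat schemas only).
TYPES = {"string": str, "number": (int, float), "boolean": bool}

def validate_args(args, schema):
    """Return a list of validation errors; empty means the call may proceed."""
    errors = []
    props = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in args:
            errors.append(f"missing required field: {field}")
    for key, value in args.items():
        if key not in props:
            errors.append(f"unexpected field: {key}")  # reject extras outright
        elif not isinstance(value, TYPES[props[key]["type"]]):
            errors.append(f"wrong type for {key}")
    return errors
```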

When not to use tool calling: Simple Q&A tasks, creative writing, and summarization rarely benefit from tools. Adding tools to tasks that do not need them increases cost, latency, and failure modes without meaningful benefit.


Frequently Asked Questions

What is tool calling in AI agents?

Tool calling is the mechanism that lets an AI agent interact with external systems. Instead of only generating text, the LLM outputs a structured JSON object specifying which function to call and what arguments to pass. A runtime layer then executes that function and feeds the result back to the model for further reasoning.

Is tool calling the same as function calling?

Yes, tool calling and function calling refer to the same capability. OpenAI originally used the term 'function calling' when it launched the feature in June 2023, while Anthropic uses 'tool use.' Both describe the LLM generating structured output to invoke external functions.

Can AI agents call any API automatically?

Not without configuration. An agent can only call tools that have been explicitly defined in its tool schema. Each tool must have a name, description, and parameter schema registered before the agent can invoke it. The agent cannot discover or call arbitrary APIs on its own.

How do you prevent an AI agent from misusing tools?

Common safety measures include sandboxing tool execution, requiring human approval for high-risk actions (like sending emails or making payments), rate limiting tool calls, restricting which tools are available in each context, and logging every tool invocation for audit. OpenClaw supports exec approvals and sandbox mode for this purpose.

Does OpenClaw support tool calling?

Yes. OpenClaw uses a skills system where each skill is a self-contained tool definition. Skills define their trigger conditions, parameters, and execution logic. You can install skills from the ClaWHub marketplace or write custom ones. OpenClaw also supports exec approvals so you can require human confirmation before high-risk tool calls execute.