AI Tool Calls Explained: Giving Your LLM the Ability to Actually Do Things

Technical PM + Software Engineer
Modern LLMs are great at language—but to be useful in production they often need to actually do things: call APIs, create records, send emails, or trigger external jobs. This article walks through the practical bridge from a chat-style system (LLM as conversational partner) to an agent-style system (LLM as planner that invokes tools). You'll get concrete patterns: how to define a tool interface, encode inputs and outputs with Zod for safety, register handlers in AI SDK v5, run a robust plan-execute-observe loop, and add retries, logging, and tests. Expect code-level clarity and steps you can implement today.
1) What is a "tool" and why you should treat it like a typed API
A tool is any externally executed capability your LLM can request. Examples: create_github_issue, query_customer_db, start_ci_pipeline, send_email. Treat each tool like a typed API: give it a stable name, an unambiguous description, and schemas for its inputs and outputs.
Why types? LLM outputs are probabilistic; you must validate and sanitize everything before executing an action. Types reduce surprises, enable type-safe handler wiring, and let you fail fast when the model proposes malformed calls.
- Tool identity: unique string id (e.g., create_issue)
- Tool description: short human- and model-readable description of intent and constraints
- Input schema: structured type for arguments (required for safe execution)
- Output schema or observation: what the tool returns or what the agent sees after execution
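The anatomy above can be captured in a small TypeScript interface. This is a sketch, not any SDK's actual type: names like `ToolDefinition` and `echoTool` are illustrative.

```typescript
// Illustrative tool contract: identity, description, and typed input/output.
// TInput/TOutput are whatever shapes your schemas validate to.
interface ToolDefinition<TInput, TOutput> {
  name: string;                       // unique string id, e.g. 'create_issue'
  description: string;                // human- and model-readable intent and constraints
  validate: (raw: unknown) => TInput; // throws on malformed input
  handler: (input: TInput) => Promise<TOutput>; // executes the action, returns the observation
}

// Example instance with a hand-rolled validator (a real one would use a schema library).
const echoTool: ToolDefinition<{ text: string }, { echoed: string }> = {
  name: 'echo',
  description: 'Echo back the provided text',
  validate: (raw) => {
    if (typeof raw !== 'object' || raw === null || typeof (raw as any).text !== 'string') {
      throw new Error('expected { text: string }');
    }
    return raw as { text: string };
  },
  handler: async (input) => ({ echoed: input.text }),
};
```

Keeping every tool behind one contract like this makes the agent loop generic: it can validate and dispatch any registered tool without knowing its internals.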
2) Designing tool schemas with Zod
Zod is a lightweight runtime schema library for TypeScript that provides parsing, validation, and inferred TypeScript types. Use it to declare the exact shape the model should produce for a tool call and to validate any input before executing the handler.
Defining schemas gives two immediate benefits: you can instruct the LLM with the exact shape you expect, and you can validate/transform before passing to external systems.
- Example schema for creating an issue (TypeScript + Zod):

```ts
import { z } from 'zod';

const CreateIssueSchema = z.object({
  title: z.string().min(1),
  body: z.string().optional(),
  labels: z.array(z.string()).optional(),
});

export type CreateIssueInput = z.infer<typeof CreateIssueSchema>;
```
- Validation at runtime:

```ts
const result = CreateIssueSchema.safeParse(candidate);
if (!result.success) {
  // report result.error back to the model or user and stop
} else {
  // result.data is typed as CreateIssueInput; pass it to the handler
}
```
- Use descriptive validation messages and constraints to guide the LLM (length limits, allowed enums, formats)
3) Wiring tools into AI SDK v5: register, describe, and handle
AI SDK v5 provides primitives for registering tools the model can call: each tool bundles a name, a description, an input schema, and a handler. When tools are supplied alongside the model call, the model becomes tool-aware and can emit a structured tool call instead of natural language when an action is appropriate.
Your handler executes the action and returns an observation. Keep handlers focused: they should translate the typed input into API calls, handle errors locally, and return a concise observation payload for the agent loop to ingest.
- Registering a simple tool (pseudo-API; exact method names vary by SDK version):

```ts
const sdk = new AI({ apiKey: process.env.AI_KEY });

sdk.registerTool({
  name: 'create_issue',
  description: 'Create a GitHub issue in repo X with title and optional body and labels',
  schema: CreateIssueSchema,
  handler: async (input) => {
    // input has already been validated against CreateIssueSchema
    const resp = await github.createIssue({
      title: input.title,
      body: input.body,
      labels: input.labels,
    });
    return { issue_url: resp.url, id: resp.number };
  },
});
```
- Expose the same description and schema to the model as constraints: models follow structured output better when given precise instruction and an example.
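For concreteness, the structured call the model emits when it chooses a tool looks roughly like this. The shape is illustrative; the exact wire format depends on the SDK.

```typescript
// Illustrative shape of a structured tool call produced by the model,
// ready to be validated against the tool's schema before execution.
const toolCall = {
  type: 'tool_call' as const,
  tool_name: 'create_issue',
  args: {
    title: 'Login page returns 500 on empty password',
    labels: ['bug', 'auth'],
  },
};
```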
4) Executing calls: the plan-execute-observe loop
The core runtime loop that converts chat into action is: Planner -> Execute tool -> Observe -> Feed back to Planner. Implement this as a deterministic loop with a step limit and clear stopping conditions.
On each iteration: ask the model what to do given the conversation and tool results so far. If the model returns a 'final' answer, stop and return. If it returns a tool call, validate the call with Zod, execute the handler, capture the observation, append it to the conversation, and iterate.
This pattern supports multi-step reasoning. For example, the model can call query_customer_db to get a customer id, then call create_issue with enriched data, then confirm with a final message.
- Simplified loop pseudocode:

```ts
let steps = 0;
while (steps < MAX_STEPS) {
  const response = await sdk.chat({ messages: conversation, tools: registeredTools });

  if (response.type === 'final') return response.content;

  if (response.type === 'tool_call') {
    const tool = lookup(response.tool_name);
    const parsed = tool.schema.safeParse(response.args);
    if (!parsed.success) {
      // feed the validation error back into the conversation and retry
      conversation.push({
        role: 'system',
        content: 'Tool input invalid: ' + JSON.stringify(parsed.error.issues),
      });
      steps++;
      continue;
    }
    const observation = await tool.handler(parsed.data);
    conversation.push({ role: 'tool', name: tool.name, content: JSON.stringify(observation) });
  }
  steps++;
}
throw new Error('Max steps exceeded');
```
- Always store raw and validated inputs, handler outputs, timestamps, and any errors for replay and debugging.
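To see the loop run end to end, here is a self-contained miniature with the model and tool stubbed out. `fakeModel` and the `add` tool are illustrative stand-ins for `sdk.chat` and a registered tool.

```typescript
// Minimal plan-execute-observe loop against a scripted fake model.
type ModelResponse =
  | { type: 'final'; content: string }
  | { type: 'tool_call'; tool_name: string; args: unknown };

type Message = { role: string; content: string };

const tools: Record<string, { handler: (args: any) => Promise<unknown> }> = {
  add: { handler: async (args: { a: number; b: number }) => ({ sum: args.a + args.b }) },
};

// Scripted stand-in for sdk.chat: first requests a tool, then finishes.
let turn = 0;
async function fakeModel(_conversation: Message[]): Promise<ModelResponse> {
  turn++;
  if (turn === 1) return { type: 'tool_call', tool_name: 'add', args: { a: 2, b: 3 } };
  return { type: 'final', content: 'The sum is 5.' };
}

async function runLoop(maxSteps: number): Promise<string> {
  const conversation: Message[] = [];
  for (let steps = 0; steps < maxSteps; steps++) {
    const response = await fakeModel(conversation);
    if (response.type === 'final') return response.content;
    const tool = tools[response.tool_name];
    const observation = await tool.handler(response.args);
    // the observation becomes part of the context the model sees next turn
    conversation.push({ role: 'tool', content: JSON.stringify(observation) });
  }
  throw new Error('Max steps exceeded');
}
```

Swapping `fakeModel` for a real model call and `tools` for your registered, schema-validated tools turns this skeleton into the production loop described above.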
5) Multi-step orchestration patterns and practical concerns
Multi-step tasks introduce state, partial failures, and branching. Keep the loop robust with explicit planning steps, sticky context, and short-horizon retries. Use a planning prompt that asks for a single action per response and forces the model into tool_call or final modes; this reduces hallucinated or compound actions.
Avoid letting the model directly craft API payloads without validation. Always run safeParse, provide clear, actionable validation errors back to the model, and count validation attempts toward the step budget to prevent infinite loops.
- Plan-first pattern: ask the model to produce a plan (sequence of high-level steps) before executing; then iterate on each step. This increases predictability for long workflows.
- Chunking: for large operations, break work into atomic tool calls that each return short observations. Let the model decide the next call after seeing the observation.
- Idempotency: design handlers to be idempotent where possible or return enough metadata so the agent can detect duplicate work (e.g., resource id).
- Step limits and timeouts: enforce MAX_STEPS and timeout per tool execution to recover from stuck handlers.
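One way to get idempotency is to derive a deterministic key from the validated input and cache results. A sketch, assuming an in-memory map stands in for a real store:

```typescript
// Idempotent handler wrapper: identical validated inputs return the cached
// observation instead of re-executing the side effect.
const seen = new Map<string, unknown>();

function idempotent<TInput, TOutput>(
  handler: (input: TInput) => Promise<TOutput>,
): (input: TInput) => Promise<TOutput> {
  return async (input: TInput) => {
    // deterministic key; production code would use a stable hash and a durable store
    const key = JSON.stringify(input);
    if (seen.has(key)) return seen.get(key) as TOutput; // duplicate work detected
    const result = await handler(input);
    seen.set(key, result);
    return result;
  };
}

// Example: a "create" operation that must not run twice for the same input.
let executions = 0;
const createOnce = idempotent(async (input: { title: string }) => {
  executions++;
  return { id: executions, title: input.title };
});
```

Note that `JSON.stringify` keys are sensitive to property order; a canonicalizing serializer or an explicit idempotency-key field is safer in production.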
6) Error handling, retries, observability, and testing
Tool-based agents are distributed systems. Production-readiness requires monitoring, structured logs, metrics, and deterministic tests. Instrument every tool call with request ids, durations, and result statuses.
Retries should be context-aware: for transient network errors, retry with exponential backoff; for validation errors, surface the error to the model to correct; for logical errors (wrong repository), prefer human escalation.
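For the transient case, a simple exponential-backoff wrapper looks like this. It is a sketch; `isTransient` is an assumed predicate you supply to classify errors.

```typescript
// Retry an async operation with exponential backoff, but only for errors
// the caller classifies as transient; other errors surface immediately.
async function withRetries<T>(
  op: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (!isTransient(err) || attempt >= maxAttempts) throw err;
      const delay = baseDelayMs * 2 ** (attempt - 1); // 100ms, 200ms, 400ms, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Validation errors should not pass `isTransient`: as noted above, they belong back in the conversation so the model can correct its tool call.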
- Observability items:
  - Log conversation turns, tool inputs/outputs, and validation errors
  - Track per-tool latency and error rates
  - Emit traces with correlation ids across model and handler calls
- Testing approaches:
  - Unit test handlers with mocked external APIs
  - Integration test the loop by mocking model responses to return tool_call sequences
  - Replay logs to reproduce agent decisions and diagnose failures
- Security practices: sanitize any handler input before using in downstream systems, enforce least privilege for service accounts used by handlers, and rate-limit dangerous operations
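A minimal structured record per tool call ties these practices together. Field names here are illustrative, not a standard schema:

```typescript
// One structured log record per tool call, suitable for replay and tracing.
interface ToolCallRecord {
  requestId: string;        // correlation id across model and handler calls
  tool: string;
  rawInput: unknown;        // exactly what the model proposed
  validatedInput?: unknown; // present only if validation succeeded
  observation?: unknown;
  error?: string;
  startedAt: string;        // ISO timestamp, for latency metrics
  durationMs: number;
}

function recordToolCall(
  partial: Omit<ToolCallRecord, 'startedAt' | 'durationMs'>,
  startedAt: Date,
  finishedAt: Date,
): ToolCallRecord {
  return {
    ...partial,
    startedAt: startedAt.toISOString(),
    durationMs: finishedAt.getTime() - startedAt.getTime(),
  };
}
```

Storing both `rawInput` and `validatedInput` is what makes replay possible: you can reproduce exactly what the model proposed and what the handler actually received.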
Conclusion
Turning an LLM-powered chat into an agent that takes real-world actions is achievable with principled tooling: model-friendly tool descriptions, strict runtime schemas (Zod), explicit handler wiring in AI SDK v5, and a disciplined plan-execute-observe loop. Validate everything, keep tools small and idempotent, and add step limits, retries, and observability so you can operate confidently in production. These patterns let you move from ambiguous prompts to reliable, auditable automation.
Action Checklist
- Inventory possible tools in your system and pick 3 to implement (e.g., query_db, create_ticket, send_notification).
- For each tool, author a Zod schema and a short descriptive prompt describing inputs, constraints, and examples.
- Register tools in AI SDK v5 with a simple handler and a sandboxed external API (mock external calls during development).
- Implement the plan-execute-observe loop with MAX_STEPS and validation feedback. Run integration tests that mock model outputs to exercise failures and retries.
- Add logging, metrics, and a replayable audit trail for each conversation and tool call. Iterate on prompts and schemas based on observed model behavior.