AI Tool Calls Explained: Giving Your LLM the Ability to Actually Do Things

Technical PM + Software Engineer
Modern LLMs are great at language—but to be useful in production they often need to actually do things: call APIs, create records, send emails, or trigger external jobs. This article walks through the practical bridge from a chat-style system (LLM as conversational partner) to an agent-style system (LLM as planner that invokes tools). You'll get concrete patterns: how to define a tool interface, encode inputs and outputs with Zod for safety, register handlers in AI SDK v5, run a robust plan-execute-observe loop, and add retries, logging, and tests. Expect code-level clarity and steps you can implement today.
1) What is a "tool" and why you should treat it like a typed API
A tool is any externally executed capability your LLM can request. Examples: create_github_issue, query_customer_db, start_ci_pipeline, send_email. Treat each tool like a typed API: give it a stable name, an unambiguous description, and schemas for its inputs and outputs.
Why types? LLM outputs are probabilistic; you must validate and sanitize everything before executing an action. Types reduce surprises, enable type-safe handler wiring, and let you fail fast when the model proposes malformed calls.
- Tool identity: unique string id (e.g., create_issue)
- Tool description: short human- and model-readable description of intent and constraints
- Input schema: structured type for arguments (required for safe execution)
- Output schema or observation: what the tool returns or what the agent sees after execution
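The anatomy above can be captured in a small TypeScript interface. This is a sketch, not any SDK's actual type: names like `ToolDefinition` and `echoTool` are illustrative.

```typescript
// Illustrative tool contract: identity, description, and typed input/output.
// TInput/TOutput are whatever shapes your schemas validate to.
interface ToolDefinition<TInput, TOutput> {
  name: string;                       // unique string id, e.g. 'create_issue'
  description: string;                // human- and model-readable intent and constraints
  validate: (raw: unknown) => TInput; // throws on malformed input
  handler: (input: TInput) => Promise<TOutput>; // executes the action, returns the observation
}

// Example instance with a hand-rolled validator (a real one would use a schema library).
const echoTool: ToolDefinition<{ text: string }, { echoed: string }> = {
  name: 'echo',
  description: 'Echo back the provided text',
  validate: (raw) => {
    if (typeof raw !== 'object' || raw === null || typeof (raw as any).text !== 'string') {
      throw new Error('expected { text: string }');
    }
    return raw as { text: string };
  },
  handler: async (input) => ({ echoed: input.text }),
};
```

Keeping every tool behind one contract like this makes the agent loop generic: it can validate and dispatch any registered tool without knowing its internals.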
2) Designing tool schemas with Zod
Zod is a lightweight runtime schema library for TypeScript that provides parsing, validation, and inferred TypeScript types. Use it to declare the exact shape the model should produce for a tool call and to validate any input before executing the handler.
Defining schemas gives two immediate benefits: you can instruct the LLM with the exact shape you expect, and you can validate/transform before passing to external systems.
- Example schema for creating an issue (TypeScript + Zod):

```ts
import { z } from 'zod';

const CreateIssueSchema = z.object({
  title: z.string().min(1),
  body: z.string().optional(),
  labels: z.array(z.string()).optional(),
});

export type CreateIssueInput = z.infer<typeof CreateIssueSchema>;
```
- Validation at runtime:

```ts
const result = CreateIssueSchema.safeParse(candidate);
if (!result.success) {
  // report result.error back to the model or user and stop
} else {
  // result.data is typed as CreateIssueInput; pass it to the handler
}
```
- Use descriptive validation messages and constraints to guide the LLM (length limits, allowed enums, formats)
3) Wiring tools into AI SDK v5: register, describe, and handle
AI SDK v5 provides primitives for registering tools the model can call: each tool bundles a name, a description, an input schema, and a handler. When tools are supplied alongside the model call, the model becomes tool-aware and can emit a structured tool call instead of natural language when an action is appropriate.
Your handler executes the action and returns an observation. Keep handlers focused: they should translate the typed input into API calls, handle errors locally, and return a concise observation payload for the agent loop to ingest.
- Registering a simple tool (pseudo-API; exact method names vary by SDK version):

```ts
const sdk = new AI({ apiKey: process.env.AI_KEY });

sdk.registerTool({
  name: 'create_issue',
  description: 'Create a GitHub issue in repo X with title and optional body and labels',
  schema: CreateIssueSchema,
  handler: async (input) => {
    // input has already been validated against CreateIssueSchema
    const resp = await github.createIssue({
      title: input.title,
      body: input.body,
      labels: input.labels,
    });
    return { issue_url: resp.url, id: resp.number };
  },
});
```
- Expose the same description and schema to the model as constraints: models follow structured output better when given precise instruction and an example.
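For concreteness, the structured call the model emits when it chooses a tool looks roughly like this. The shape is illustrative; the exact wire format depends on the SDK.

```typescript
// Illustrative shape of a structured tool call produced by the model,
// ready to be validated against the tool's schema before execution.
const toolCall = {
  type: 'tool_call' as const,
  tool_name: 'create_issue',
  args: {
    title: 'Login page returns 500 on empty password',
    labels: ['bug', 'auth'],
  },
};
```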
4) Executing calls: the plan-execute-observe loop
The core runtime loop that converts chat into action is: Planner -> Execute tool -> Observe -> Feed back to Planner. Implement this as a deterministic loop with a step limit and clear stopping conditions.
On each iteration: ask the model what to do given the conversation and tool results so far. If the model returns a 'final' answer, stop and return. If it returns a tool call, validate the call with Zod, execute the handler, capture the observation, append it to the conversation, and iterate.
This pattern supports multi-step reasoning. For example, the model can call query_customer_db to get a customer id, then call create_issue with enriched data, then confirm with a final message.
- Simplified loop pseudocode:

```ts
let steps = 0;
while (steps < MAX_STEPS) {
  const response = await sdk.chat({ messages: conversation, tools: registeredTools });

  if (response.type === 'final') return response.content;

  if (response.type === 'tool_call') {
    const tool = lookup(response.tool_name);
    const parsed = tool.schema.safeParse(response.args);
    if (!parsed.success) {
      // feed the validation error back into the conversation and retry
      conversation.push({
        role: 'system',
        content: 'Tool input invalid: ' + JSON.stringify(parsed.error.issues),
      });
      steps++;
      continue;
    }
    const observation = await tool.handler(parsed.data);
    conversation.push({ role: 'tool', name: tool.name, content: JSON.stringify(observation) });
  }
  steps++;
}
throw new Error('Max steps exceeded');
```
- Always store raw and validated inputs, handler outputs, timestamps, and any errors for replay and debugging.
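To see the loop run end to end, here is a self-contained miniature with the model and tool stubbed out. `fakeModel` and the `add` tool are illustrative stand-ins for `sdk.chat` and a registered tool.

```typescript
// Minimal plan-execute-observe loop against a scripted fake model.
type ModelResponse =
  | { type: 'final'; content: string }
  | { type: 'tool_call'; tool_name: string; args: unknown };

type Message = { role: string; content: string };

const tools: Record<string, { handler: (args: any) => Promise<unknown> }> = {
  add: { handler: async (args: { a: number; b: number }) => ({ sum: args.a + args.b }) },
};

// Scripted stand-in for sdk.chat: first requests a tool, then finishes.
let turn = 0;
async function fakeModel(_conversation: Message[]): Promise<ModelResponse> {
  turn++;
  if (turn === 1) return { type: 'tool_call', tool_name: 'add', args: { a: 2, b: 3 } };
  return { type: 'final', content: 'The sum is 5.' };
}

async function runLoop(maxSteps: number): Promise<string> {
  const conversation: Message[] = [];
  for (let steps = 0; steps < maxSteps; steps++) {
    const response = await fakeModel(conversation);
    if (response.type === 'final') return response.content;
    const tool = tools[response.tool_name];
    const observation = await tool.handler(response.args);
    // the observation becomes part of the context the model sees next turn
    conversation.push({ role: 'tool', content: JSON.stringify(observation) });
  }
  throw new Error('Max steps exceeded');
}
```

Swapping `fakeModel` for a real model call and `tools` for your registered, schema-validated tools turns this skeleton into the production loop described above.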
5) Multi-step orchestration patterns and practical concerns
Multi-step tasks introduce state, partial failures, and branching. Keep the loop robust with explicit planning steps, sticky context, and short-horizon retries. Use a planning prompt that asks for a single action per response and forces the model into tool_call or final modes; this reduces hallucinated or compound actions.
Avoid letting the model directly craft API payloads without validation. Always run safeParse, provide clear, actionable validation errors back to the model, and count validation attempts toward the step budget to prevent infinite loops.
- Plan-first pattern: ask the model to produce a plan (sequence of high-level steps) before executing; then iterate on each step. This increases predictability for long workflows.
- Chunking: for large operations, break work into atomic tool calls that each return short observations. Let the model decide the next call after seeing the observation.
- Idempotency: design handlers to be idempotent where possible or return enough metadata so the agent can detect duplicate work (e.g., resource id).
- Step limits and timeouts: enforce MAX_STEPS and timeout per tool execution to recover from stuck handlers.
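One way to get idempotency is to derive a deterministic key from the validated input and cache results. A sketch, assuming an in-memory map stands in for a real store:

```typescript
// Idempotent handler wrapper: identical validated inputs return the cached
// observation instead of re-executing the side effect.
const seen = new Map<string, unknown>();

function idempotent<TInput, TOutput>(
  handler: (input: TInput) => Promise<TOutput>,
): (input: TInput) => Promise<TOutput> {
  return async (input: TInput) => {
    // deterministic key; production code would use a stable hash and a durable store
    const key = JSON.stringify(input);
    if (seen.has(key)) return seen.get(key) as TOutput; // duplicate work detected
    const result = await handler(input);
    seen.set(key, result);
    return result;
  };
}

// Example: a "create" operation that must not run twice for the same input.
let executions = 0;
const createOnce = idempotent(async (input: { title: string }) => {
  executions++;
  return { id: executions, title: input.title };
});
```

Note that `JSON.stringify` keys are sensitive to property order; a canonicalizing serializer or an explicit idempotency-key field is safer in production.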
6) Error handling, retries, observability, and testing
Tool-based agents are distributed systems. Production-readiness requires monitoring, structured logs, metrics, and deterministic tests. Instrument every tool call with request ids, durations, and result statuses.
Retries should be context-aware: for transient network errors, retry with exponential backoff; for validation errors, surface the error to the model to correct; for logical errors (wrong repository), prefer human escalation.
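For the transient case, a simple exponential-backoff wrapper looks like this. It is a sketch; `isTransient` is an assumed predicate you supply to classify errors.

```typescript
// Retry an async operation with exponential backoff, but only for errors
// the caller classifies as transient; other errors surface immediately.
async function withRetries<T>(
  op: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (!isTransient(err) || attempt >= maxAttempts) throw err;
      const delay = baseDelayMs * 2 ** (attempt - 1); // 100ms, 200ms, 400ms, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Validation errors should not pass `isTransient`: as noted above, they belong back in the conversation so the model can correct its tool call.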
- Observability items:
  - Log conversation turns, tool inputs/outputs, and validation errors
  - Track per-tool latency and error rates
  - Emit traces with correlation ids across model and handler calls
- Testing approaches:
  - Unit test handlers with mocked external APIs
  - Integration test the loop by mocking model responses to return tool_call sequences
  - Replay logs to reproduce agent decisions and diagnose failures
- Security practices: sanitize any handler input before using in downstream systems, enforce least privilege for service accounts used by handlers, and rate-limit dangerous operations
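A minimal structured record per tool call ties these practices together. Field names here are illustrative, not a standard schema:

```typescript
// One structured log record per tool call, suitable for replay and tracing.
interface ToolCallRecord {
  requestId: string;        // correlation id across model and handler calls
  tool: string;
  rawInput: unknown;        // exactly what the model proposed
  validatedInput?: unknown; // present only if validation succeeded
  observation?: unknown;
  error?: string;
  startedAt: string;        // ISO timestamp, for latency metrics
  durationMs: number;
}

function recordToolCall(
  partial: Omit<ToolCallRecord, 'startedAt' | 'durationMs'>,
  startedAt: Date,
  finishedAt: Date,
): ToolCallRecord {
  return {
    ...partial,
    startedAt: startedAt.toISOString(),
    durationMs: finishedAt.getTime() - startedAt.getTime(),
  };
}
```

Storing both `rawInput` and `validatedInput` is what makes replay possible: you can reproduce exactly what the model proposed and what the handler actually received.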
Conclusion
Turning an LLM-powered chat into an agent that takes real-world actions is achievable with principled tooling: model-friendly tool descriptions, strict runtime schemas (Zod), explicit handler wiring in AI SDK v5, and a disciplined plan-execute-observe loop. Validate everything, keep tools small and idempotent, and add step limits, retries, and observability so you can operate confidently in production. These patterns let you move from ambiguous prompts to reliable, auditable automation.
Action Checklist
- Inventory possible tools in your system and pick 3 to implement (e.g., query_db, create_ticket, send_notification).
- For each tool, author a Zod schema and a short descriptive prompt describing inputs, constraints, and examples.
- Register tools in AI SDK v5 with a simple handler and a sandboxed external API (mock external calls during development).
- Implement the plan-execute-observe loop with MAX_STEPS and validation feedback. Run integration tests that mock model outputs to exercise failures and retries.
- Add logging, metrics, and a replayable audit trail for each conversation and tool call. Iterate on prompts and schemas based on observed model behavior.