Building a Streaming AI Chat UI with AI SDK v5 and React

Brandon Perfetti

Technical PM + Software Engineer

Topics: AI SDK v5, Chat UI, Tool Integrations
Tech: React, streamText, HTTP APIs

Streaming is what makes an AI chat UI feel alive. It is not just lower latency. It is the difference between an instant conversation and a waiting-on-server form submit. For experienced full-stack JavaScript developers, the challenge is less about parsing bytes and more about orchestrating lifecycles: stream transport, message state, and rendering. This article gives a production-grade blueprint for building a streaming AI chat UI with AI SDK v5 and React, including concrete implementation patterns, tradeoffs, pitfalls, and decision criteria.

Why streaming needs an explicit mental model

Treat streaming chat as three coordinated systems, not a single reducer or component.

  • Transport lifecycle: request, abort, backoff, and repeatable parsing of the stream.
  • Message lifecycle: optimistic user entries, transient assistant chunks, and atomic settlement.
  • UI lifecycle: rendering partial content versus persisted messages, scroll behavior, and accessibility announcements.

If you mix these concerns, the first retry, concurrent send, or tool call will scramble your timeline. Explicit boundaries reduce accidental coupling and make edge cases testable.

Decision criteria:

  • If you must support offline persistence and replay, separate ephemeral stream buffers from durable messages.
  • If you support concurrent turns from the same client, isolate each one with a unique turn ID and never merge their chunks.

Pitfalls:

  • Persisting partial chunks leads to duplicate or malformed history on reconnect.
  • Capturing turn IDs in closures leads to misrouted chunks.

Architecture baseline

A practical architecture for React plus AI SDK v5 includes:

  • a chat state module that owns durable messages and transient stream metadata
  • a submit flow that appends an optimistic user message and starts the assistant stream
  • a transport adapter that parses streaming events and emits chunk metadata
  • a settlement handler that atomically commits a finished assistant message
  • a tool-call renderer that treats tools as timeline nodes rather than raw JSON blobs

Keep two categories of data:

  • durable messages that are persisted and replayable
  • ephemeral stream state such as partial text, active abort controllers, and buffered tool arguments
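
As a sketch, the two categories can live as separate slices of one store. The names below are illustrative, not AI SDK APIs:

```typescript
// Illustrative state shape: durable history and ephemeral stream state
// sit side by side but never mix.
type Phase = "pending" | "streaming" | "settled" | "failed";

interface DurableMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
  phase: Phase;
}

interface StreamMeta {
  turnId: string;
  buffer: string;                      // partial assistant text, not yet persisted
  controller: AbortController;         // one per active turn
  toolBuffers: Record<string, string>; // tool arguments still arriving
}

interface ChatState {
  messages: DurableMessage[];          // persisted and replayable
  streams: Record<string, StreamMeta>; // keyed by turn ID, dropped on settlement
}
```

Keeping `streams` keyed by turn ID is what makes concurrent turns and clean teardown possible later.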

Decision criteria:

  • persist user messages immediately for traceability
  • persist assistant messages after settlement to avoid duplicates

Tradeoff: immediate persistence of user messages improves analytics, but it means you need clean deduplication when events replay.

Message identity and lifecycle

Every message should have a stable ID and a clear phase.

A simple phase model:

  • pending: optimistic user message created locally
  • streaming: assistant output still arriving
  • settled: final assistant message committed
  • failed: terminal error with retry affordance
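
One way to keep the phase model honest is a small transition table. This is a sketch, not an AI SDK construct:

```typescript
type MessagePhase = "pending" | "streaming" | "settled" | "failed";

// Legal phase transitions; anything outside this table indicates a lifecycle bug.
const transitions: Record<MessagePhase, MessagePhase[]> = {
  pending: ["streaming", "failed"],
  streaming: ["settled", "failed"],
  settled: [],                // terminal
  failed: ["pending"],        // retry re-enters the pipeline
};

function canTransition(from: MessagePhase, to: MessagePhase): boolean {
  return transitions[from].includes(to);
}
```

Guarding state updates with `canTransition` turns silent lifecycle corruption into a loud, debuggable error.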

Basic rules:

  • generate the client ID before network I/O
  • link each assistant stream to a turn ID or originating user message ID
  • keep a single source of truth for message identities in the chat store

Pitfalls:

  • generating IDs only after the server responds creates flicker and retry mismatches
  • tying IDs to transient local variables breaks under rerenders

A resilient submit flow

A reliable submit flow should be deterministic and idempotent.

  1. Normalize and validate the input.
  2. Create and persist the optimistic user message.
  3. Start the assistant stream with an AbortController.
  4. Create a streaming assistant placeholder tied to the turn.
  5. Accumulate chunks in an ephemeral buffer.
  6. On completion, atomically settle the assistant message.
  7. On error or abort, mark the turn failed and surface retry.

A simplified streaming transport adapter:

export async function streamAssistant({
  url,
  body,
  onChunk,
  signal,
  onDone,
  onError,
}: {
  url: string;
  body: unknown;
  onChunk: (chunk: string) => void;
  signal: AbortSignal;
  onDone: () => void;
  onError: (err: unknown) => void;
}) {
  try {
    const res = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
      signal,
    });

    if (!res.ok) throw new Error(`Stream request failed: ${res.status}`);
    if (!res.body) throw new Error("No stream body");

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let done = false;

    while (!done) {
      const { value, done: readerDone } = await reader.read();
      if (value) {
        // stream: true keeps multi-byte characters split across reads intact
        const text = decoder.decode(value, { stream: true });
        onChunk(text);
      }
      done = readerDone;
    }

    // flush any bytes the decoder is still buffering
    const tail = decoder.decode();
    if (tail) onChunk(tail);

    onDone();
  } catch (error) {
    if ((error as Error).name === "AbortError") {
      onError({ type: "aborted" });
    } else {
      onError(error);
    }
  }
}

And a simplified submit flow:

import { v4 as uuid } from "uuid";

async function submitMessage(store: ChatStore, inputText: string) {
  const userId = uuid();
  const turnId = uuid();
  const controller = new AbortController();

  store.dispatch({
    type: "appendUser",
    payload: { id: userId, text: inputText, phase: "pending" },
  });

  store.dispatch({
    type: "startStream",
    payload: { turnId, controller },
  });

  streamAssistant({
    url: "/api/assistant/stream",
    body: { input: inputText, turnId },
    signal: controller.signal,
    onChunk(chunk) {
      store.dispatch({
        type: "appendChunk",
        payload: { turnId, chunk },
      });
    },
    onDone() {
      store.dispatch({
        type: "settleAssistant",
        payload: { turnId, assistantId: uuid() },
      });
    },
    onError(error) {
      store.dispatch({
        type: "failTurn",
        payload: { turnId, error },
      });
    },
  });
}

Tradeoff: buffering chunks locally and committing once is safer for history integrity, but it means a reconnect loses partial output unless you intentionally checkpoint it.

Cancellation, timeouts, and duplicate submits

A production chat UI needs explicit behavior here.

Cancellation:

  • attach one AbortController per active turn
  • expose cancel in the UI
  • clear ephemeral state when a turn is canceled

Timeouts:

  • prefer a soft timeout that leaves the message retryable
  • avoid silent automatic retries for long-running completions unless the user experience truly benefits
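
A soft timeout can be sketched as a timer that surfaces a retry affordance without tearing anything down. `startSoftTimeout` is an illustrative helper, not an SDK API:

```typescript
// Fires `onSoftTimeout` after `ms` unless cleared first. The stream itself is
// left running; the UI just gains a "still waiting / retry?" affordance.
function startSoftTimeout(onSoftTimeout: () => void, ms: number): () => void {
  const id = setTimeout(onSoftTimeout, ms);
  return () => clearTimeout(id); // call on first token or on settlement
}
```

In practice you would clear the timer in `onChunk` the moment the first token arrives, so only genuinely stalled turns get flagged.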

Duplicate-submit protection:

  • either disable reentrant submit while a turn is active
  • or allow concurrency, but isolate each turn completely
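
The "disable reentrant submit" option reduces to a per-conversation slot. `activeTurns` here is illustrative module state, not an AI SDK construct:

```typescript
// One in-flight turn per conversation; reentrant submits are rejected.
const activeTurns = new Map<string, string>(); // conversationId -> turnId

function beginTurn(conversationId: string, turnId: string): boolean {
  if (activeTurns.has(conversationId)) return false; // reject duplicate submit
  activeTurns.set(conversationId, turnId);
  return true;
}

function endTurn(conversationId: string, turnId: string): void {
  // only the owning turn may release the slot, so a stale settle/fail
  // callback cannot free a slot a newer turn now holds
  if (activeTurns.get(conversationId) === turnId) {
    activeTurns.delete(conversationId);
  }
}
```

Call `endTurn` from every terminal path (settle, fail, cancel), or the conversation locks up after its first error.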

A simple cancellation handler:

function cancelTurn(store: ChatStore, turnId: string) {
  const meta = store.getTurnMeta(turnId);
  if (!meta?.controller) return;
  meta.controller.abort();
  store.dispatch({ type: "cancelTurn", payload: { turnId } });
}

Pitfalls:

  • aborting without clearing the ephemeral buffer leaves stale partial text visible
  • forgetting to release controller references causes leaks

Tools in the timeline

Tool calls should be first-class timeline nodes, not mysterious assistant text.

A useful event model:

  • tool_call_started
  • tool_args_buffering
  • tool_executed
  • tool_failed

The key rule is to buffer arguments until they are complete and valid. Do not execute tools against partial JSON.

import Ajv from "ajv";

// One Ajv instance with a compiled validator per tool. toolArgSchemas is the
// app's registry of JSON Schemas, assumed to be defined elsewhere.
const ajv = new Ajv();
const validators = new Map(
  Object.entries(toolArgSchemas).map(([id, schema]) => [id, ajv.compile(schema)])
);

async function handleToolArgumentChunk(turnId: string, toolId: string, chunk: string) {
  appendToToolBuffer(turnId, toolId, chunk);

  const buffered = toolBuffer(turnId, toolId);
  if (!detectJsonComplete(buffered)) return;

  let args: unknown;
  try {
    args = JSON.parse(buffered);
  } catch {
    dispatch({
      type: "toolArgsInvalid",
      payload: { toolId, error: "malformed JSON" },
    });
    return;
  }

  const validate = validators.get(toolId);
  if (!validate || !validate(args)) {
    dispatch({
      type: "toolArgsInvalid",
      payload: { toolId, error: "schema mismatch" },
    });
    return;
  }

  dispatch({ type: "toolExecuteStart", payload: { toolId } });
  const result = await callToolServerSide(toolId, args);
  dispatch({
    type: "toolExecuteComplete",
    payload: { toolId, result },
  });
}

Decision criteria:

  • client-side validation is great for responsiveness
  • server-side validation and authorization are still mandatory
  • tool outputs should render as dedicated cards or rows, not as raw assistant text dumps
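
One way to keep tool activity out of the text flow is to reduce tool events into display nodes the timeline can render as cards. The event and node shapes below are illustrative:

```typescript
type ToolEvent =
  | { type: "tool_call_started"; toolId: string; name: string }
  | { type: "tool_executed"; toolId: string; result: unknown }
  | { type: "tool_failed"; toolId: string; error: string };

interface ToolNode {
  toolId: string;
  name: string;
  status: "running" | "done" | "error";
  result?: unknown;
  error?: string;
}

// Folds a stream of tool events into timeline nodes keyed by toolId.
function reduceToolNodes(nodes: ToolNode[], ev: ToolEvent): ToolNode[] {
  switch (ev.type) {
    case "tool_call_started":
      return [...nodes, { toolId: ev.toolId, name: ev.name, status: "running" }];
    case "tool_executed":
      return nodes.map((n) =>
        n.toolId === ev.toolId ? { ...n, status: "done", result: ev.result } : n
      );
    case "tool_failed":
      return nodes.map((n) =>
        n.toolId === ev.toolId ? { ...n, status: "error", error: ev.error } : n
      );
  }
}
```

Each `ToolNode` then maps one-to-one onto a card component, so tool progress renders as structured UI rather than interleaved assistant text.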

React state strategy that scales

A common anti-pattern is keeping both history and stream buffers in local component state. That breaks down quickly.

A better pattern:

  • centralize state in context, Zustand, Redux, or another store
  • keep actions small and idempotent
  • store controllers and transient metadata in refs or an ephemeral registry
  • memoize message rows by stable ID

A reducer sketch:

function chatReducer(state: ChatState, action: ChatAction): ChatState {
  switch (action.type) {
    case "appendUser":
      return {
        ...state,
        messages: [...state.messages, action.payload],
      };

    case "startStream":
      return {
        ...state,
        streams: {
          ...state.streams,
          [action.payload.turnId]: action.payload,
        },
      };

    case "appendChunk": {
      const stream = state.streams[action.payload.turnId];
      if (!stream) return state;
      return {
        ...state,
        streams: {
          ...state.streams,
          [action.payload.turnId]: {
            ...stream,
            buffer: stream.buffer + action.payload.chunk,
          },
        },
      };
    }

    case "settleAssistant": {
      const stream = state.streams[action.payload.turnId];
      if (!stream) return state;

      const assistant = {
        id: action.payload.assistantId,
        text: stream.buffer,
        phase: "settled",
      };

      const nextStreams = { ...state.streams };
      delete nextStreams[action.payload.turnId];

      return {
        ...state,
        messages: [...state.messages, assistant],
        streams: nextStreams,
      };
    }

    default:
      return state;
  }
}

Tradeoff: a single reducer is easier to reason about, but high-frequency chunk updates can create unnecessary rerenders unless you batch UI work.

UX, performance, and accessibility

Perceived quality matters at least as much as protocol correctness.

Good defaults:

  • immediate local echo for the user message
  • subtle assistant typing state before the first token
  • chunk batching to reduce layout thrash
  • auto-scroll only when the user is already near the bottom
  • clear retry and cancel affordances
  • dedicated visual treatment for tool activity
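
The near-bottom rule reduces to a small geometry check; the threshold is a tunable assumption, not a standard:

```typescript
// Auto-scroll only if the viewport is within `threshold` px of the bottom.
function isNearBottom(
  el: { scrollTop: number; clientHeight: number; scrollHeight: number },
  threshold = 80
): boolean {
  return el.scrollHeight - (el.scrollTop + el.clientHeight) <= threshold;
}
```

Check this before appending a chunk, and only scroll when it returns true; a user who has scrolled up to reread stays where they are.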

Performance tips:

  • batch chunk updates with requestAnimationFrame or micro-batching
  • virtualize long histories
  • memoize message rows
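
Micro-batching can be sketched with an injectable scheduler: requestAnimationFrame in the browser, anything callback-shaped in tests.

```typescript
// Coalesces many small chunk updates into a single flush per scheduler tick.
function createChunkBatcher(
  flush: (batched: string) => void,
  schedule: (cb: () => void) => void
): (chunk: string) => void {
  let buffer = "";
  let scheduled = false;
  return (chunk) => {
    buffer += chunk;
    if (scheduled) return;
    scheduled = true;
    schedule(() => {
      const out = buffer;
      buffer = "";
      scheduled = false;
      flush(out);
    });
  };
}
```

In the browser you would wire it up as `createChunkBatcher(commitToStore, (cb) => requestAnimationFrame(() => cb()))`, so the store sees at most one update per frame regardless of token rate.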

Accessibility tips:

  • announce stream start and completion with polite live regions
  • make tool cards keyboard navigable
  • do not steal focus while the stream updates

Pitfalls:

  • updating on every token can overwhelm assistive tech
  • aggressive auto-scroll makes the chat frustrating when users review older context

Testing, observability, and security

Testing should cover more than the happy path.

At a minimum:

  • happy-path streaming
  • user aborts
  • timeouts and retries
  • duplicate submits
  • out-of-order chunks
  • partial tool arguments
  • resume or reload behavior with persisted history
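
Most of these cases are testable without a network by feeding the transport a mock body built from fixed chunks. `ReadableStream` is standard in modern Node and browsers; the helpers are test scaffolding, not SDK APIs:

```typescript
// Builds a ReadableStream of encoded chunks, shaped like a fetch response body.
function mockStreamBody(chunks: string[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream({
    start(controller) {
      for (const c of chunks) controller.enqueue(encoder.encode(c));
      controller.close();
    },
  });
}

// Drains a stream into one string, mirroring the transport adapter's read loop.
async function drain(body: ReadableStream<Uint8Array>): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let out = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (value) out += decoder.decode(value, { stream: true });
    if (done) break;
  }
  return out + decoder.decode();
}
```

The same mock, fed deliberately out of order or truncated mid-chunk, exercises the reconnect and partial-tool-argument paths.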

Observability should track:

  • turn ID
  • time to first token
  • time to settled response
  • token count
  • tool names used
  • terminal status
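
These signals fit naturally into one record per turn; the field names here are illustrative:

```typescript
interface TurnMetrics {
  turnId: string;
  startedAt: number;      // ms epoch when the turn was submitted
  firstTokenAt?: number;  // ms epoch of the first streamed token
  settledAt?: number;     // ms epoch of settlement
  tokenCount: number;
  toolNames: string[];
  status: "settled" | "failed" | "canceled";
}

// Time to first token is derived, never stored, so it cannot drift.
function timeToFirstToken(m: TurnMetrics): number | undefined {
  return m.firstTokenAt === undefined ? undefined : m.firstTokenAt - m.startedAt;
}
```

Deriving latency from timestamps rather than logging a precomputed duration keeps the record replayable and auditable.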

Security rules:

  • treat tool arguments as untrusted input
  • validate and authorize server-side
  • keep dangerous tools off the client
  • redact sensitive content in logs
  • rate-limit tool execution

Decision criteria:

  • if tools mutate state, consider explicit user confirmation or a two-step execute flow
  • if you need debugging visibility, build redaction into the observability path from the start

What to ship first

If you are building the first serious version of a streaming chat UI, prioritize these pieces:

  • stable turn IDs
  • a dedicated transport adapter
  • an ephemeral buffer with atomic settlement
  • cancel and retry affordances
  • basic observability for time-to-first-token and completion

Those choices solve the most common failures: duplicate assistant messages, scrambled streams, unsafe tool rendering, and unresponsive cancellation.

Streaming without lifecycle discipline is fragile. But when you separate transport, message state, and presentation cleanly, a chat UI built with AI SDK v5 and React can feel fast, coherent, and resilient under real usage.