Building a Streaming AI Chat UI with AI SDK v5 and React

Brandon Perfetti

Technical PM + Software Engineer

Topics: AI SDK v5, Chat UI, Tool Integrations
Tech: React, streamText, HTTP APIs

Streaming is what makes an AI chat UI feel alive. It is not just lower latency. It is the difference between an instant conversation and a waiting-on-server form submit. For experienced full-stack JavaScript developers, the challenge is less about parsing bytes and more about orchestrating lifecycles: stream transport, message state, and rendering. This article gives a production-grade blueprint for building a streaming AI chat UI with AI SDK v5 and React, including concrete implementation patterns, tradeoffs, pitfalls, and decision criteria.

Why streaming needs an explicit mental model

Treat streaming chat as three coordinated systems, not a single reducer or component.

  • Transport lifecycle: request, abort, backoff, and repeatable parsing of the stream.
  • Message lifecycle: optimistic user entries, transient assistant chunks, and atomic settlement.
  • UI lifecycle: rendering partial content versus persisted messages, scroll behavior, and accessibility announcements.

If you mix these concerns, the first retry, concurrent send, or tool call will scramble your timeline. Explicit boundaries reduce accidental coupling and make edge cases testable.

Decision criteria:

  • If you must support offline persistence and replay, separate ephemeral stream buffers from durable messages.
  • If you support concurrent turns from the same client, isolate each one with a unique turn ID and never merge their chunks.

Pitfalls:

  • Persisting partial chunks leads to duplicate or malformed history on reconnect.
  • Capturing turn IDs in closures leads to misrouted chunks.

Architecture baseline

A practical architecture for React plus AI SDK v5 includes:

  • a chat state module that owns durable messages and transient stream metadata
  • a submit flow that appends an optimistic user message and starts the assistant stream
  • a transport adapter that parses streaming events and emits chunk metadata
  • a settlement handler that atomically commits a finished assistant message
  • a tool-call renderer that treats tools as timeline nodes rather than raw JSON blobs

Keep two categories of data:

  • durable messages that are persisted and replayable
  • ephemeral stream state such as partial text, active abort controllers, and buffered tool arguments
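
As a sketch, the two categories can live as separate slices of one store. The names below are illustrative, not AI SDK APIs:

```typescript
// Illustrative state shape: durable history and ephemeral stream state
// sit side by side but never mix.
type Phase = "pending" | "streaming" | "settled" | "failed";

interface DurableMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
  phase: Phase;
}

interface StreamMeta {
  turnId: string;
  buffer: string;                      // partial assistant text, not yet persisted
  controller: AbortController;         // one per active turn
  toolBuffers: Record<string, string>; // tool arguments still arriving
}

interface ChatState {
  messages: DurableMessage[];          // persisted and replayable
  streams: Record<string, StreamMeta>; // keyed by turn ID, dropped on settlement
}
```

Keeping `streams` keyed by turn ID is what makes concurrent turns and clean teardown possible later.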

Decision criteria:

  • persist user messages immediately for traceability
  • persist assistant messages after settlement to avoid duplicates

Tradeoff: immediate persistence of user messages improves analytics, but it means you need clean deduplication when events replay.

Message identity and lifecycle

Every message should have a stable ID and a clear phase.

A simple phase model:

  • pending: optimistic user message created locally
  • streaming: assistant output still arriving
  • settled: final assistant message committed
  • failed: terminal error with retry affordance
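
One way to keep the phase model honest is a small transition table. This is a sketch, not an AI SDK construct:

```typescript
type MessagePhase = "pending" | "streaming" | "settled" | "failed";

// Legal phase transitions; anything outside this table indicates a lifecycle bug.
const transitions: Record<MessagePhase, MessagePhase[]> = {
  pending: ["streaming", "failed"],
  streaming: ["settled", "failed"],
  settled: [],                // terminal
  failed: ["pending"],        // retry re-enters the pipeline
};

function canTransition(from: MessagePhase, to: MessagePhase): boolean {
  return transitions[from].includes(to);
}
```

Guarding state updates with `canTransition` turns silent lifecycle corruption into a loud, debuggable error.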

Basic rules:

  • generate the client ID before network I/O
  • link each assistant stream to a turn ID or originating user message ID
  • keep a single source of truth for message identities in the chat store

Pitfalls:

  • generating IDs only after the server responds creates flicker and retry mismatches
  • tying IDs to transient local variables breaks under rerenders

A resilient submit flow

A reliable submit flow should be deterministic and idempotent.

  1. Normalize and validate the input.
  2. Create and persist the optimistic user message.
  3. Start the assistant stream with an AbortController.
  4. Create a streaming assistant placeholder tied to the turn.
  5. Accumulate chunks in an ephemeral buffer.
  6. On completion, atomically settle the assistant message.
  7. On error or abort, mark the turn failed and surface retry.

A simplified streaming transport adapter:

export async function streamAssistant({
  url,
  body,
  onChunk,
  signal,
  onDone,
  onError,
}: {
  url: string;
  body: unknown;
  onChunk: (chunk: string) => void;
  signal: AbortSignal;
  onDone: () => void;
  onError: (err: unknown) => void;
}) {
  try {
    const res = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
      signal,
    });

    if (!res.ok) throw new Error(`Stream request failed: ${res.status}`);
    if (!res.body) throw new Error("No stream body");

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let done = false;

    while (!done) {
      const { value, done: readerDone } = await reader.read();
      if (value) {
        // stream: true keeps multi-byte characters split across reads intact
        const text = decoder.decode(value, { stream: true });
        onChunk(text);
      }
      done = readerDone;
    }

    // flush any bytes the decoder is still buffering
    const tail = decoder.decode();
    if (tail) onChunk(tail);

    onDone();
  } catch (error) {
    if ((error as Error).name === "AbortError") {
      onError({ type: "aborted" });
    } else {
      onError(error);
    }
  }
}

And a simplified submit flow:

import { v4 as uuid } from "uuid";

async function submitMessage(store: ChatStore, inputText: string) {
  const userId = uuid();
  const turnId = uuid();
  const controller = new AbortController();

  store.dispatch({
    type: "appendUser",
    payload: { id: userId, text: inputText, phase: "pending" },
  });

  store.dispatch({
    type: "startStream",
    payload: { turnId, controller },
  });

  streamAssistant({
    url: "/api/assistant/stream",
    body: { input: inputText, turnId },
    signal: controller.signal,
    onChunk(chunk) {
      store.dispatch({
        type: "appendChunk",
        payload: { turnId, chunk },
      });
    },
    onDone() {
      store.dispatch({
        type: "settleAssistant",
        payload: { turnId, assistantId: uuid() },
      });
    },
    onError(error) {
      store.dispatch({
        type: "failTurn",
        payload: { turnId, error },
      });
    },
  });
}

Tradeoff: buffering chunks locally and committing once is safer for history integrity, but it means a reconnect loses partial output unless you intentionally checkpoint it.

Cancellation, timeouts, and duplicate submits

A production chat UI needs explicit behavior here.

Cancellation:

  • attach one AbortController per active turn
  • expose cancel in the UI
  • clear ephemeral state when a turn is canceled

Timeouts:

  • prefer a soft timeout that leaves the message retryable
  • avoid silent automatic retries for long-running completions unless the user experience truly benefits
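
A soft timeout can be sketched as a timer that surfaces a retry affordance without tearing anything down. `startSoftTimeout` is an illustrative helper, not an SDK API:

```typescript
// Fires `onSoftTimeout` after `ms` unless cleared first. The stream itself is
// left running; the UI just gains a "still waiting / retry?" affordance.
function startSoftTimeout(onSoftTimeout: () => void, ms: number): () => void {
  const id = setTimeout(onSoftTimeout, ms);
  return () => clearTimeout(id); // call on first token or on settlement
}
```

In practice you would clear the timer in `onChunk` the moment the first token arrives, so only genuinely stalled turns get flagged.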

Duplicate-submit protection:

  • either disable reentrant submit while a turn is active
  • or allow concurrency, but isolate each turn completely
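
The "disable reentrant submit" option reduces to a per-conversation slot. `activeTurns` here is illustrative module state, not an AI SDK construct:

```typescript
// One in-flight turn per conversation; reentrant submits are rejected.
const activeTurns = new Map<string, string>(); // conversationId -> turnId

function beginTurn(conversationId: string, turnId: string): boolean {
  if (activeTurns.has(conversationId)) return false; // reject duplicate submit
  activeTurns.set(conversationId, turnId);
  return true;
}

function endTurn(conversationId: string, turnId: string): void {
  // only the owning turn may release the slot, so a stale settle/fail
  // callback cannot free a slot a newer turn now holds
  if (activeTurns.get(conversationId) === turnId) {
    activeTurns.delete(conversationId);
  }
}
```

Call `endTurn` from every terminal path (settle, fail, cancel), or the conversation locks up after its first error.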

A simple cancellation handler:

function cancelTurn(store: ChatStore, turnId: string) {
  const meta = store.getTurnMeta(turnId);
  if (!meta?.controller) return;
  meta.controller.abort();
  store.dispatch({ type: "cancelTurn", payload: { turnId } });
}

Pitfalls:

  • aborting without clearing the ephemeral buffer leaves stale partial text visible
  • forgetting to release controller references causes leaks

Tools in the timeline

Tool calls should be first-class timeline nodes, not mysterious assistant text.

A useful event model:

  • tool_call_started
  • tool_args_buffering
  • tool_executed
  • tool_failed

The key rule is to buffer arguments until they are complete and valid. Do not execute tools against partial JSON.

import Ajv from "ajv";

// One Ajv instance with a compiled validator per tool. toolArgSchemas is the
// app's registry of JSON Schemas, assumed to be defined elsewhere.
const ajv = new Ajv();
const validators = new Map(
  Object.entries(toolArgSchemas).map(([id, schema]) => [id, ajv.compile(schema)])
);

async function handleToolArgumentChunk(turnId: string, toolId: string, chunk: string) {
  appendToToolBuffer(turnId, toolId, chunk);

  const buffered = toolBuffer(turnId, toolId);
  if (!detectJsonComplete(buffered)) return;

  let args: unknown;
  try {
    args = JSON.parse(buffered);
  } catch {
    dispatch({
      type: "toolArgsInvalid",
      payload: { toolId, error: "malformed JSON" },
    });
    return;
  }

  const validate = validators.get(toolId);
  if (!validate || !validate(args)) {
    dispatch({
      type: "toolArgsInvalid",
      payload: { toolId, error: "schema mismatch" },
    });
    return;
  }

  dispatch({ type: "toolExecuteStart", payload: { toolId } });
  const result = await callToolServerSide(toolId, args);
  dispatch({
    type: "toolExecuteComplete",
    payload: { toolId, result },
  });
}

Decision criteria:

  • client-side validation is great for responsiveness
  • server-side validation and authorization are still mandatory
  • tool outputs should render as dedicated cards or rows, not as raw assistant text dumps
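
One way to keep tool activity out of the text flow is to reduce tool events into display nodes the timeline can render as cards. The event and node shapes below are illustrative:

```typescript
type ToolEvent =
  | { type: "tool_call_started"; toolId: string; name: string }
  | { type: "tool_executed"; toolId: string; result: unknown }
  | { type: "tool_failed"; toolId: string; error: string };

interface ToolNode {
  toolId: string;
  name: string;
  status: "running" | "done" | "error";
  result?: unknown;
  error?: string;
}

// Folds a stream of tool events into timeline nodes keyed by toolId.
function reduceToolNodes(nodes: ToolNode[], ev: ToolEvent): ToolNode[] {
  switch (ev.type) {
    case "tool_call_started":
      return [...nodes, { toolId: ev.toolId, name: ev.name, status: "running" }];
    case "tool_executed":
      return nodes.map((n) =>
        n.toolId === ev.toolId ? { ...n, status: "done", result: ev.result } : n
      );
    case "tool_failed":
      return nodes.map((n) =>
        n.toolId === ev.toolId ? { ...n, status: "error", error: ev.error } : n
      );
  }
}
```

Each `ToolNode` then maps one-to-one onto a card component, so tool progress renders as structured UI rather than interleaved assistant text.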

React state strategy that scales

A common anti-pattern is keeping both history and stream buffers in local component state. That breaks down quickly.

A better pattern:

  • centralize state in context, Zustand, Redux, or another store
  • keep actions small and idempotent
  • store controllers and transient metadata in refs or an ephemeral registry
  • memoize message rows by stable ID

A reducer sketch:

function chatReducer(state: ChatState, action: ChatAction): ChatState {
  switch (action.type) {
    case "appendUser":
      return {
        ...state,
        messages: [...state.messages, action.payload],
      };

    case "startStream":
      return {
        ...state,
        streams: {
          ...state.streams,
          [action.payload.turnId]: action.payload,
        },
      };

    case "appendChunk": {
      const stream = state.streams[action.payload.turnId];
      if (!stream) return state;
      return {
        ...state,
        streams: {
          ...state.streams,
          [action.payload.turnId]: {
            ...stream,
            buffer: stream.buffer + action.payload.chunk,
          },
        },
      };
    }

    case "settleAssistant": {
      const stream = state.streams[action.payload.turnId];
      if (!stream) return state;

      const assistant = {
        id: action.payload.assistantId,
        text: stream.buffer,
        phase: "settled",
      };

      const nextStreams = { ...state.streams };
      delete nextStreams[action.payload.turnId];

      return {
        ...state,
        messages: [...state.messages, assistant],
        streams: nextStreams,
      };
    }

    default:
      return state;
  }
}

Tradeoff: a single reducer is easier to reason about, but high-frequency chunk updates can create unnecessary rerenders unless you batch UI work.

UX, performance, and accessibility

Perceived quality matters at least as much as protocol correctness.

Good defaults:

  • immediate local echo for the user message
  • subtle assistant typing state before the first token
  • chunk batching to reduce layout thrash
  • auto-scroll only when the user is already near the bottom
  • clear retry and cancel affordances
  • dedicated visual treatment for tool activity
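
The near-bottom rule reduces to a small geometry check; the threshold is a tunable assumption, not a standard:

```typescript
// Auto-scroll only if the viewport is within `threshold` px of the bottom.
function isNearBottom(
  el: { scrollTop: number; clientHeight: number; scrollHeight: number },
  threshold = 80
): boolean {
  return el.scrollHeight - (el.scrollTop + el.clientHeight) <= threshold;
}
```

Check this before appending a chunk, and only scroll when it returns true; a user who has scrolled up to reread stays where they are.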

Performance tips:

  • batch chunk updates with requestAnimationFrame or micro-batching
  • virtualize long histories
  • memoize message rows
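
Micro-batching can be sketched with an injectable scheduler: requestAnimationFrame in the browser, anything callback-shaped in tests.

```typescript
// Coalesces many small chunk updates into a single flush per scheduler tick.
function createChunkBatcher(
  flush: (batched: string) => void,
  schedule: (cb: () => void) => void
): (chunk: string) => void {
  let buffer = "";
  let scheduled = false;
  return (chunk) => {
    buffer += chunk;
    if (scheduled) return;
    scheduled = true;
    schedule(() => {
      const out = buffer;
      buffer = "";
      scheduled = false;
      flush(out);
    });
  };
}
```

In the browser you would wire it up as `createChunkBatcher(commitToStore, (cb) => requestAnimationFrame(() => cb()))`, so the store sees at most one update per frame regardless of token rate.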

Accessibility tips:

  • announce stream start and completion with polite live regions
  • make tool cards keyboard navigable
  • do not steal focus while the stream updates

Pitfalls:

  • updating on every token can overwhelm assistive tech
  • aggressive auto-scroll makes the chat frustrating when users review older context

Testing, observability, and security

Testing should cover more than the happy path.

At a minimum:

  • happy-path streaming
  • user aborts
  • timeouts and retries
  • duplicate submits
  • out-of-order chunks
  • partial tool arguments
  • resume or reload behavior with persisted history
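
Most of these cases are testable without a network by feeding the transport a mock body built from fixed chunks. `ReadableStream` is standard in modern Node and browsers; the helpers are test scaffolding, not SDK APIs:

```typescript
// Builds a ReadableStream of encoded chunks, shaped like a fetch response body.
function mockStreamBody(chunks: string[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream({
    start(controller) {
      for (const c of chunks) controller.enqueue(encoder.encode(c));
      controller.close();
    },
  });
}

// Drains a stream into one string, mirroring the transport adapter's read loop.
async function drain(body: ReadableStream<Uint8Array>): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let out = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (value) out += decoder.decode(value, { stream: true });
    if (done) break;
  }
  return out + decoder.decode();
}
```

The same mock, fed deliberately out of order or truncated mid-chunk, exercises the reconnect and partial-tool-argument paths.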

Observability should track:

  • turn ID
  • time to first token
  • time to settled response
  • token count
  • tool names used
  • terminal status
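
These signals fit naturally into one record per turn; the field names here are illustrative:

```typescript
interface TurnMetrics {
  turnId: string;
  startedAt: number;      // ms epoch when the turn was submitted
  firstTokenAt?: number;  // ms epoch of the first streamed token
  settledAt?: number;     // ms epoch of settlement
  tokenCount: number;
  toolNames: string[];
  status: "settled" | "failed" | "canceled";
}

// Time to first token is derived, never stored, so it cannot drift.
function timeToFirstToken(m: TurnMetrics): number | undefined {
  return m.firstTokenAt === undefined ? undefined : m.firstTokenAt - m.startedAt;
}
```

Deriving latency from timestamps rather than logging a precomputed duration keeps the record replayable and auditable.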

Security rules:

  • treat tool arguments as untrusted input
  • validate and authorize server-side
  • keep dangerous tools off the client
  • redact sensitive content in logs
  • rate-limit tool execution

Decision criteria:

  • if tools mutate state, consider explicit user confirmation or a two-step execute flow
  • if you need debugging visibility, build redaction into the observability path from the start

What to ship first

If you are building the first serious version of a streaming chat UI, prioritize these pieces:

  • stable turn IDs
  • a dedicated transport adapter
  • an ephemeral buffer with atomic settlement
  • cancel and retry affordances
  • basic observability for time-to-first-token and completion

Those choices solve the most common failures: duplicate assistant messages, scrambled streams, unsafe tool rendering, and unresponsive cancellation.

Streaming without lifecycle discipline is fragile. But when you separate transport, message state, and presentation cleanly, a chat UI built with AI SDK v5 and React can feel fast, coherent, and resilient under real usage.