Building a Streaming AI Chat UI with AI SDK v5 and React

Many AI chat demos gloss over the hardest part: streaming partial model outputs into a stable chat UI, wiring tool invocations into the same conversation, and preserving a robust message history. This article provides a practical, implementation-forward walkthrough of a React 19 chat component using AI SDK v5, centered on the useChat hook. You'll get concrete patterns for streaming updates, optimistic local echoes, cancellation, message pruning, and a tool-call UI so the chat can invoke and display external tools (search, code execution, DB queries).
1) Architecture and core concepts
Before coding, decide the responsibilities of each layer. Keep the UI declarative and let the SDK manage protocol-level streaming. Key concepts:
- useChat: the SDK hook that opens a streaming session and emits partial deltas and final responses.
- Message model: a stable message shape with id, role (user/assistant/tool), content (text and optional structured payload), status (pending/streaming/done/error), and metadata (timestamps, tokens).
- Tool calls: messages in which the assistant requests a tool run. These should appear inline in history and trigger a separate UI to execute and display results.
- Cancellation: allow the user to stop streaming, which must abort the SDK request and mark the message appropriately.
- Keep message IDs stable (UUIDs or incremental) to append streaming chunks reliably.
- Preserve final/partial state separately to avoid overwriting confirmed final outputs.
- Treat tool outputs as first-class messages with role 'tool' to maintain timeline consistency.
2) Setup: AI SDK v5 + React 19
Add and initialize AI SDK v5 at the app entry. Provide a context/provider if the SDK supports it. Ensure you configure streaming on the client side as required by the SDK.
Minimal initialization sketch (replace placeholders with your app's config):

```tsx
const client = new AiSdkV5Client({ apiKey: process.env.REACT_APP_AI_KEY });
```

Wrap the app:

```tsx
<AiProvider client={client}>
  <App />
</AiProvider>
```

This makes the useChat hook available and ensures centralized configuration for retries, timeouts, and streaming options.
Enable CORS and server policies if your SDK uses a proxy or server-based key exchange. For production, route API traffic through your backend or use short-lived credentials.
- Set streaming=true or use streaming-enabled client method per SDK docs.
- Use environment variables for secrets and avoid embedding keys in client code.
- Configure sensible timeouts and max tokens in SDK call options.
3) Integrating useChat and managing streaming deltas
The core of a smooth chat UI is appending streamed deltas to a 'streaming' message while handling finalization events. The SDK's useChat typically exposes a start/stop API and event handlers for partial tokens or chunks.
Implementation pattern (React pseudocode):

```tsx
// Message model: { id, role, content: '', status: 'streaming' }
const { start, stop, onDelta, onComplete, onError } = useChat();

// When the user sends:
appendMessage({ id: uuid(), role: 'user', content: userText, status: 'done' });
const assistantId = uuid(); // keep a stable reference for streaming updates
appendMessage({ id: assistantId, role: 'assistant', content: '', status: 'streaming' });
start({ input: userText });

// onDelta: append each chunk to the assistant message
onDelta((chunk) => appendChunkToMessage(assistantId, chunk));
// onComplete: record the final text and mark message.status = 'done'
onComplete((finalText, metadata) => finalizeMessage(assistantId, finalText, metadata));
// onError: mark message.status = 'error'
onError(() => markMessageAsError(assistantId));
```
Important implementation details:
- Use a stable reference to the assistant message ID so streaming updates append to the right message.
- Coalesce or throttle when the SDK emits many tiny tokens to avoid excessive re-renders (batch updates every 50–100 ms).
- Maintain both a currentStreamingText and a finalText to support optimistic edits without losing streamed content.
- Create appendChunkToMessage(msgId, chunk) that performs setState in a batched way.
- Expose a cancel function wired to an AbortController or use SDK's stop.
- Record metadata (usage/tokens) on onComplete for billing/analytics.
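The appendChunkToMessage and finalizeMessage helpers named above can be written as pure functions over the message array, which keeps React state updates predictable and easy to test. This is a sketch under the assumption of the simple message shape used in this article:

```typescript
interface Msg { id: string; role: string; content: string; status: string }

// Immutably append a streamed chunk to the message with the given ID.
function appendChunkToMessage(messages: Msg[], msgId: string, chunk: string): Msg[] {
  return messages.map((m) =>
    m.id === msgId ? { ...m, content: m.content + chunk } : m
  );
}

// Replace streamed text with the confirmed final text and mark it done,
// so a late chunk can never overwrite a finalized message.
function finalizeMessage(messages: Msg[], msgId: string, finalText: string): Msg[] {
  return messages.map((m) =>
    m.id === msgId ? { ...m, content: finalText, status: 'done' } : m
  );
}
```

In a component you would call these inside a functional state update, e.g. `setMessages((prev) => appendChunkToMessage(prev, assistantId, chunk))`, so React can batch the renders.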
4) Building the streaming UI: progressive rendering and controls
A great streaming UI feels instant. Show partial assistant output as it arrives, provide a stop button, and keep scroll behavior consistent.
UI patterns:
- The message bubble shows streamed text with a typing cursor while status === 'streaming'.
- A small cancel icon appears on the assistant message to abort the stream.
- Disable the send button while local optimistic sends are pending, or support parallel messages, depending on your UX.
Implementation notes:

```tsx
// Scroll to the bottom whenever a new message or chunk is appended.
useEffect(() => {
  chatRef.current?.scrollTo({ top: chatRef.current.scrollHeight, behavior: 'smooth' });
}, [messages]);

// Cancel handler: stop the stream and mark the message.
const handleCancel = () => {
  stop();
  markMessageAsCancelled(assistantId);
};
```

Rendering should avoid replacing the entire message list on each token. Use key={msg.id} and render content directly from the message object to minimize diffing overhead.
- Batch token updates to 30–100ms windows to reduce CPU work.
- Render a low-overhead typing indicator (CSS animation) rather than re-rendering icons every chunk.
- Use CSS transforms for smooth UI and keep DOM updates minimal.
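The batching advice above can be implemented as a small buffer that accumulates chunks and flushes them in one callback per interval. This is a minimal sketch; the interval and callback shape are assumptions, not an SDK API:

```typescript
// Buffers incoming token chunks and delivers them in one batched callback,
// so the UI re-renders at most once per interval instead of once per token.
class TokenBatcher {
  private buffer = '';
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private onFlush: (batched: string) => void,
    private intervalMs = 50, // 30–100 ms is a reasonable window
  ) {}

  push(chunk: string): void {
    this.buffer += chunk;
    if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.intervalMs);
    }
  }

  // Deliver whatever has accumulated and reset the pending timer.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length > 0) {
      const batched = this.buffer;
      this.buffer = '';
      this.onFlush(batched);
    }
  }
}
```

Call `flush()` once on stream completion so no trailing tokens are lost before the message is finalized.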
5) Implementing a tool call UI and workflow
Tool calls let the assistant request external operations (search, run code, query DB). The UI must surface the request and either auto-run or let the user approve execution. Treat tool calls as messages with a well-known structured payload.
Example message shape for a tool call:

```tsx
{
  id: 'msg-123',
  role: 'assistant',
  content: 'I can look that up for you',
  toolRequest: { name: 'webSearch', args: { q: 'planning patterns' } },
  status: 'awaiting_tool'
}
```

Flow options:
- Auto-run: execute the tool immediately on receiving toolRequest, append a 'tool' role message with the results, then resume assistant streaming if the tool returns content for the next turn.
- Manual approval: show a 'Run webSearch' button in the assistant bubble; if the user clicks it, run the tool and append the results.
Implementation sketch for running a tool:

```tsx
async function runTool(toolRequest, parentMsgId) {
  markMessageAsRunningTool(parentMsgId);
  const response = await fetch('/api/tools', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(toolRequest),
  });
  const toolResult = await response.json();
  appendMessage({
    id: uuid(),
    role: 'tool',
    content: toolResult.text,
    status: 'done',
    parent: parentMsgId,
  });
  // Optionally continue the conversation by feeding the tool result back to the SDK.
  start({ input: toolResult.text, context: conversationHistory });
}
```

Keep tool results as separate messages so the conversation timeline remains auditable.
- Validate and sanitize tool inputs on the server side before executing.
- Mark tool messages with metadata (source, execution time, success/error) for debugging.
- When appropriate, feed tool outputs back into the model as a system/tool message to continue reasoning.
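Server-side validation can start with a simple allowlist check before anything executes. The tool names and argument rules below are hypothetical examples, not a complete sanitization layer:

```typescript
interface ToolRequest { name: string; args: Record<string, unknown> }

// Allowlist of tools the server is willing to run, with per-tool argument checks.
const TOOL_VALIDATORS: Record<string, (args: Record<string, unknown>) => boolean> = {
  webSearch: (args) =>
    typeof args.q === 'string' && args.q.length > 0 && args.q.length <= 500,
};

// Returns null when the request is valid, otherwise a reason for the 400 response.
function validateToolRequest(req: ToolRequest): string | null {
  const validator = TOOL_VALIDATORS[req.name];
  if (!validator) return `unknown tool: ${req.name}`;
  if (!validator(req.args)) return `invalid args for ${req.name}`;
  return null;
}
```

Rejecting unknown tool names outright means a compromised or confused client can never trigger code paths the server did not explicitly expose.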
6) Reliability, scaling, and UX edge cases
Production chat requires robustness: prune old messages, manage memory, and handle concurrent streams. The most common issues are runaway message arrays, frequent re-renders from token-level updates, and race conditions from multiple concurrent requests.
Practical mitigations:
- Message pruning: keep a sliding window (e.g., the last 40 messages) and summarize or store older history server-side.
- Concurrency: disallow multiple simultaneous assistant streams for the same conversation unless you intentionally support parallel responses. Use a queue for user messages if needed.
Error handling and retries:
- On transient network errors, surface a retry button on the assistant message that restarts the streaming call with the same conversation context.
- For partial streaming timeouts, mark the message 'stalled' and give the user the option to retry or discard it.
- Use AbortController to cancel fetch/streaming operations cleanly.
- Implement exponential backoff for any automated retries from the client.
- Log usage and streaming events for telemetry to diagnose production issues.
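Two of the mitigations above reduce to small pure helpers: a sliding-window prune and a jittered exponential backoff. This is a sketch; the window size, base delay, and cap are assumptions to tune for your app:

```typescript
interface HistoryMsg { id: string; role: string; content: string }

// Keep only the most recent `windowSize` messages. Anything older is assumed
// to have been summarized and stored server-side before being dropped here.
function pruneMessages(messages: HistoryMsg[], windowSize = 40): HistoryMsg[] {
  if (messages.length <= windowSize) return messages;
  return messages.slice(messages.length - windowSize);
}

// Exponential backoff with jitter for automated client-side retries:
// attempt 0 -> ~base, attempt 1 -> ~2*base, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  const exp = Math.min(maxMs, baseMs * 2 ** attempt);
  // Jitter in [exp/2, exp) spreads retries so clients don't stampede.
  return Math.floor(exp / 2 + Math.random() * (exp / 2));
}
```

Jitter matters in practice: if many clients hit the same transient outage, un-jittered retries all land at the same instant and prolong it.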
Conclusion
Building a real streaming chat UI involves more than wiring an SDK call — you need stable message models, streaming delta handling, user controls (cancel/retry), and a clear tool-call pattern to surface external integrations. Using useChat from AI SDK v5 gives you the streaming primitives; layering batched updates, optimistic UI patterns, and a separate tool message type yields a robust, production-ready component that behaves predictably and remains auditable.
Action Checklist
- Prototype: Implement the chat skeleton with a minimal useChat integration and simple streaming append logic.
- Tooling: Add one tool (e.g., web search) as a toolRequest example and build the runTool flow with server-side validation.
- Performance: Profile token update frequency and batch updates to achieve smooth UI without high CPU usage.
- Persistence: Persist conversations server-side and implement message pruning and summarization for long-running sessions.
- Security: Move API keys to your backend or implement a signed short-lived token exchange to avoid exposing secrets in the browser.