Streaming

When you set "stream": true in your request, Nodexa responds with a Server-Sent Events (SSE) stream. Events are delivered as tokens are generated, so you can display the response progressively.


Connecting to the Stream

Set stream: true in the request body. The response will have Content-Type: text/event-stream.

cURL:

curl https://your-admin.example.com/v1/responses \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "YOUR_ASSISTANT_ID",
    "input": "Tell me about the history of computing.",
    "stream": true
  }'

Node.js (OpenAI SDK):

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://your-admin.example.com/v1',
  apiKey: 'YOUR_API_KEY',
});

const stream = await client.responses.create({
  model: 'YOUR_ASSISTANT_ID',
  input: 'Tell me about the history of computing.',
  stream: true,
});

for await (const event of stream) {
  switch (event.type) {
    case 'response.output_text.delta':
      process.stdout.write(event.delta);
      break;
    case 'response.completed':
      console.log('\n--- done ---');
      console.log('Response ID:', event.response.id);
      break;
  }
}

Python (OpenAI SDK):

from openai import OpenAI

client = OpenAI(
    base_url="https://your-admin.example.com/v1",
    api_key="YOUR_API_KEY",
)

with client.responses.stream(
    model="YOUR_ASSISTANT_ID",
    input="Tell me about the history of computing.",
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)
print()
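
Node.js (fetch):

const response = await fetch('https://your-admin.example.com/v1/responses', {
  method: 'POST',
  headers: {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'YOUR_ASSISTANT_ID',
    input: 'Tell me about the history of computing.',
    stream: true,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ''; // holds an incomplete line carried over between chunks
let finished = false;

while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;

  // { stream: true } keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // the last element may be a partial line

  for (const line of lines) {
    // Skip event: lines, blank lines, and ": heartbeat" comments
    if (!line.startsWith('data: ')) continue;

    const data = line.slice(6).trimEnd();
    if (data === '[DONE]') {
      finished = true; // also exit the outer read loop
      break;
    }

    try {
      const event = JSON.parse(data);
      if (event.type === 'response.output_text.delta') {
        process.stdout.write(event.delta);
      }
    } catch (err) {
      console.error('Skipping malformed event payload:', err.message);
    }
  }
}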

SSE Wire Format

Each event is sent as:

event: <event-type>
data: <json-payload>

Each event is terminated by a blank line.

The stream ends with:

data: [DONE]
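
The wire format above can be parsed with a few lines of code. The following is a minimal Python sketch, assuming the stream is already available as an iterable of decoded text lines. The function name iter_sse_events is illustrative and not part of any Nodexa SDK.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_type, payload) pairs until the data: [DONE] sentinel."""
    event_type = ""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, e.g. ": heartbeat"
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                data = "\n".join(data_parts)
                if data == "[DONE]":
                    return
                payload = json.loads(data)
                yield event_type or payload.get("type", ""), payload
            event_type, data_parts = "", []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data_parts.append(value)
```

With an HTTP client that exposes the response body line by line, each yielded pair can be passed straight to your event handling code.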

Example stream

event: response.created
data: {"type":"response.created","response":{"id":"resp_01234567-89ab-cdef-0123-456789abcdef","status":"in_progress"}}

event: response.output_item.added
data: {"type":"response.output_item.added","item":{"type":"message","role":"assistant","content":[]}}

event: response.content_part.added
data: {"type":"response.content_part.added","part":{"type":"output_text","text":""}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"The"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":" history"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":" of"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":" computing..."}

event: response.output_text.done
data: {"type":"response.output_text.done","text":"The history of computing..."}

event: response.output_item.done
data: {"type":"response.output_item.done","item":{"type":"message","role":"assistant","content":[{"type":"output_text","text":"The history of computing..."}]}}

event: response.completed
data: {"type":"response.completed","response":{"id":"resp_01234567-89ab-cdef-0123-456789abcdef","status":"completed","output_text":"The history of computing..."}}

data: [DONE]

Heartbeat

To prevent idle connections from being dropped by proxies, load balancers, and CDN layers, Nodexa sends a heartbeat comment every 15 seconds when no data events have been emitted. SSE comments start with : and are ignored by standard SSE parsers.

: heartbeat

Why this matters

If an assistant is processing a complex request (tool calls, retrieval, etc.), there may be a gap of many seconds before the first token is emitted. Without a heartbeat, some intermediaries (Nginx, CloudFront, Cloudflare) may close the connection with a 504 or similar timeout. The 15-second heartbeat keeps the connection alive during these gaps.

Most SSE client libraries handle comments transparently, so you don't need to do anything special. If you're parsing the raw stream manually, skip lines that start with :.
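
If you parse the raw stream yourself, the heartbeat can also serve as a liveness signal. The sketch below is one possible approach in Python, not something Nodexa prescribes: the StreamWatchdog class and the choice of treating two missed heartbeat intervals (30 seconds) as a stall are both assumptions.

```python
import time
from typing import Callable

HEARTBEAT_INTERVAL_S = 15.0  # heartbeat interval stated in this section


class StreamWatchdog:
    """Tracks stream activity; heartbeat comments count as activity."""

    def __init__(self, grace_factor: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Assumed threshold: tolerate grace_factor heartbeat intervals of silence.
        self._timeout = HEARTBEAT_INTERVAL_S * grace_factor
        self._last_seen = clock()

    def feed(self, line: str) -> bool:
        """Record a received line. Returns False for comments, which can be skipped."""
        self._last_seen = self._clock()
        return not line.startswith(":")

    def is_stalled(self) -> bool:
        """True when nothing, not even a heartbeat, arrived within the timeout."""
        return self._clock() - self._last_seen > self._timeout
```

Call feed() for every line read from the socket and check is_stalled() from a timer. A stalled stream is a reasonable point to close the connection and retry.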


Event Reference

response.created

Emitted immediately when the platform begins processing the request. Use this to show a loading indicator.

{
  "type": "response.created",
  "response": {
    "id": "resp_01234567-89ab-cdef-0123-456789abcdef",
    "status": "in_progress"
  }
}

response.status

Emitted when the internal processing status changes (e.g., when routing to a specialist agent, loading tools, etc.).

{
  "type": "response.status",
  "status": "processing"
}

response.content_part.added

Emitted when a new content part is opened within an output item, just before the first response.output_text.delta for that part.

{
  "type": "response.content_part.added",
  "part": { "type": "output_text", "text": "" }
}

response.output_text.delta

Emitted for each text token as the assistant generates its response. Concatenate all delta values to reconstruct the full response.

{
  "type": "response.output_text.delta",
  "delta": " computing"
}

response.output_text.done

Emitted once after all response.output_text.delta events for a content part, confirming the fully assembled text.

{
  "type": "response.output_text.done",
  "text": "The fully assembled response text."
}
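
As a rough illustration of how the delta and done events fit together, the Python sketch below concatenates the delta values and cross-checks the result against response.output_text.done. It assumes a single content part and already-parsed event dictionaries. The helper name assemble_text is hypothetical.

```python
from typing import Iterable


def assemble_text(events: Iterable[dict]) -> str:
    """Join delta values and verify them against output_text.done if it arrives."""
    parts = []
    for event in events:
        kind = event.get("type")
        if kind == "response.output_text.delta":
            parts.append(event["delta"])
        elif kind == "response.output_text.done":
            assembled = "".join(parts)
            if assembled != event["text"]:
                # A mismatch suggests a delta was dropped or duplicated in transit.
                raise ValueError("deltas do not match response.output_text.done")
            return assembled
    # Stream ended without a done event: return what was received so far.
    return "".join(parts)
```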

response.reasoning_summary_text.delta

Emitted for reasoning model summaries. Some models (e.g., OpenAI o-series) produce a reasoning trace before the final answer. These deltas contain the reasoning summary tokens.

{
  "type": "response.reasoning_summary_text.delta",
  "delta": "The user wants to know about..."
}

Reasoning models only

This event is only emitted for models that support visible reasoning summaries. For standard models, you will not see this event.


response.function_call_arguments.delta

Emitted as the assistant streams the JSON arguments for a function call. You can use these to show a "thinking" or "calling tool" indicator.

{
  "type": "response.function_call_arguments.delta",
  "delta": "{\"location\": \"San"
}

response.function_call_arguments.done

Emitted when a function call's arguments are fully assembled. This signals that you should execute the function and send a follow-up request.

{
  "type": "response.function_call_arguments.done",
  "name": "get_weather",
  "call_id": "call_abc123",
  "arguments": "{\"location\": \"San Francisco\", \"unit\": \"celsius\"}"
}

Field      Type    Description
name       string  The function name to invoke
call_id    string  Unique ID; include this in your function_call_output
arguments  string  JSON-encoded arguments string

See Function Calling for the complete flow.
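
A minimal Python sketch of acting on this event: decode the arguments string and dispatch to a local function by name. The get_weather implementation, the TOOLS registry, and run_function_call are all hypothetical and exist only for illustration.

```python
import json
from typing import Any, Callable, Dict


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Hypothetical local tool that returns canned data."""
    return {"location": location, "unit": unit, "temperature": 18}


# Registry mapping function names to local callables (an assumed pattern).
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_function_call(event: dict) -> dict:
    """Execute the tool named in a response.function_call_arguments.done event."""
    func = TOOLS[event["name"]]
    kwargs = json.loads(event["arguments"])  # arguments is a JSON-encoded string
    result = func(**kwargs)
    # Keep call_id with the result so it can be echoed back to the API later.
    return {"call_id": event["call_id"], "result": result}
```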


response.output_item.added

Emitted when the assistant adds a new item to its output array. In addition to regular message items, this event carries structured items such as handover notifications and OAuth prompts.

{
  "type": "response.output_item.added",
  "item": {
    "type": "message",
    "role": "assistant",
    "content": []
  }
}

Handover item:

{
  "type": "response.output_item.added",
  "item": {
    "type": "handover",
    "from_specialist": "General Assistant",
    "to_specialist": "Billing Specialist",
    "reason": "User is asking about invoice details"
  }
}

OAuth required item:

{
  "type": "response.output_item.added",
  "item": {
    "type": "oauth_required",
    "plugin_id": "plugin_abc123",
    "plugin_name": "Google Calendar",
    "provider_id": "google",
    "required_scopes": ["https://www.googleapis.com/auth/calendar.readonly"],
    "auth_url": "https://your-admin.example.com/oauth/google/authorize?state=abc123"
  }
}
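
One way to handle the three item shapes shown above is to branch on item.type. The Python sketch below turns each item into a short user-facing notice. The function name and the wording of the notices are illustrative assumptions.

```python
def describe_output_item(item: dict) -> str:
    """Return a user-facing notice for an output item, or '' if none is needed."""
    kind = item.get("type")
    if kind == "handover":
        return (
            f"Transferring from {item['from_specialist']} "
            f"to {item['to_specialist']}: {item['reason']}"
        )
    if kind == "oauth_required":
        return f"Connect {item['plugin_name']} to continue: {item['auth_url']}"
    if kind == "message":
        return ""  # message text arrives via response.output_text.delta events
    return f"[unhandled item type: {kind}]"
```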

response.output_item.done

Emitted when an output item is complete (after all its delta events).

{
  "type": "response.output_item.done",
  "item": {
    "type": "message",
    "role": "assistant",
    "content": [
      {
        "type": "output_text",
        "text": "The fully assembled response text."
      }
    ]
  }
}

response.web_search_call.in_progress

Emitted when a web search tool call starts.

{
  "type": "response.web_search_call.in_progress",
  "call_id": "ws_call_abc123"
}

response.web_search_call.searching

Emitted when the search query has been submitted and results are being fetched.

{
  "type": "response.web_search_call.searching",
  "call_id": "ws_call_abc123",
  "query": "latest news on AI regulations 2024"
}

response.web_search_call.completed

Emitted when web search results are available and the assistant begins incorporating them.

{
  "type": "response.web_search_call.completed",
  "call_id": "ws_call_abc123",
  "results_count": 5
}
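
The three web search events lend themselves to a simple progress indicator. A small Python sketch follows, assuming the fields shown in the examples above. The function name and status strings are illustrative.

```python
from typing import Optional


def web_search_status(event: dict) -> Optional[str]:
    """Map a web search event to a status line; return None for other events."""
    kind = event.get("type", "")
    if kind == "response.web_search_call.in_progress":
        return "Starting web search..."
    if kind == "response.web_search_call.searching":
        return f'Searching for "{event["query"]}"...'
    if kind == "response.web_search_call.completed":
        return f"Found {event['results_count']} results."
    return None
```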

response.completed

Emitted when the full response is ready. The response object contains the same structure as a non-streaming response body.

{
  "type": "response.completed",
  "response": {
    "id": "resp_01234567-89ab-cdef-0123-456789abcdef",
    "object": "response",
    "status": "completed",
    "model": "asst_01234567-89ab-cdef-0123-456789abcdef",
    "output_text": "The fully assembled response text.",
    "output": [...],
    "created_at": 1700000000
  }
}

Saving the response ID

Always read response.id from the response.completed event and store it. You will need to pass it as previous_response_id to continue the conversation.
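
A hedged Python sketch of this pattern: pull the ID out of the response.completed event, then build the body of the next request. Both helper names are hypothetical, and the request body only uses fields that appear elsewhere on this page.

```python
from typing import Iterable, Optional


def find_response_id(events: Iterable[dict]) -> Optional[str]:
    """Return response.id from the response.completed event, if one was seen."""
    for event in events:
        if event.get("type") == "response.completed":
            return event["response"]["id"]
    return None


def build_follow_up(model: str, previous_response_id: str, text: str) -> dict:
    """Build the request body for the next turn of the same conversation."""
    return {
        "model": model,
        "input": text,
        "previous_response_id": previous_response_id,
        "stream": True,
    }
```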


response.error

Emitted when an error occurs during streaming. After this event, the stream will close.

{
  "type": "response.error",
  "error": {
    "type": "server_error",
    "code": "upstream_timeout",
    "message": "The LLM provider did not respond in time."
  }
}

See Errors for all error codes.
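
In client code it is often convenient to turn this event into an exception so that callers cannot mistake a failed stream for a short one. A minimal Python sketch, in which the StreamError class and raise_on_error helper are assumptions for illustration:

```python
class StreamError(Exception):
    """Raised when the stream delivers a response.error event."""

    def __init__(self, error: dict) -> None:
        self.error_type = error.get("type")
        self.code = error.get("code")
        super().__init__(error.get("message", "unknown streaming error"))


def raise_on_error(event: dict) -> dict:
    """Pass ordinary events through unchanged; raise on response.error."""
    if event.get("type") == "response.error":
        raise StreamError(event.get("error", {}))
    return event
```

Calling raise_on_error on every parsed event before further handling keeps the error path in one place.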


Handling requires_action in Streams

When the assistant calls a client-side function tool, the stream still ends with a response.completed event, but that event's response.status field is "requires_action" instead of "completed".

To continue the conversation:

  1. Read call_id and arguments from response.function_call_arguments.done
  2. Execute the function locally
  3. Send a new request with previous_response_id set to the current response ID
  4. Include the tool result in input as a function_call_output item

See Function Calling for the complete flow with code examples.
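
The four steps can be sketched in Python as a single helper that returns the body of the follow-up request. This is an outline under assumptions, not the definitive flow: the helper name and the execute callback are hypothetical, and the "output" field on the function_call_output item is assumed from the OpenAI Responses API shape rather than confirmed on this page.

```python
import json
from typing import Any, Callable, Optional


def build_tool_follow_up(
    model: str,
    completed_event: dict,
    done_event: dict,
    execute: Callable[[str, dict], Any],
) -> Optional[dict]:
    """Return the follow-up request body, or None if no action is required."""
    response = completed_event["response"]
    if response.get("status") != "requires_action":
        return None

    # Steps 1 and 2: read the call details and run the function locally.
    arguments = json.loads(done_event["arguments"])
    result = execute(done_event["name"], arguments)

    # Steps 3 and 4: reference the current response and attach the tool result.
    return {
        "model": model,
        "previous_response_id": response["id"],
        "stream": True,
        "input": [
            {
                "type": "function_call_output",
                "call_id": done_event["call_id"],
                "output": json.dumps(result),  # field name "output" is an assumption
            }
        ],
    }
```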


Full SSE Event Table

See Reference — SSE Events for a complete table of all events, their fields, and descriptions.