Streaming
When you set "stream": true in your request, Nodexa responds with a Server-Sent Events (SSE) stream. Each event is delivered as tokens are generated, allowing you to display responses progressively.
Connecting to the Stream
Set stream: true in the request body. The response will have Content-Type: text/event-stream.
// JavaScript (OpenAI SDK)
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://your-admin.example.com/v1',
  apiKey: 'YOUR_API_KEY',
});

const stream = await client.responses.create({
  model: 'YOUR_ASSISTANT_ID',
  input: 'Tell me about the history of computing.',
  stream: true,
});

for await (const event of stream) {
  switch (event.type) {
    case 'response.output_text.delta':
      // Print each text fragment as it arrives
      process.stdout.write(event.delta);
      break;
    case 'response.completed':
      console.log('\n--- done ---');
      console.log('Response ID:', event.response.id);
      break;
  }
}
# Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    base_url="https://your-admin.example.com/v1",
    api_key="YOUR_API_KEY",
)

with client.responses.stream(
    model="YOUR_ASSISTANT_ID",
    input="Tell me about the history of computing.",
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            # Print each text fragment as it arrives
            print(event.delta, end="", flush=True)
    print()
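// JavaScript (fetch, manual SSE parsing)
const response = await fetch('https://your-admin.example.com/v1/responses', {
  method: 'POST',
  headers: {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'YOUR_ASSISTANT_ID',
    input: 'Tell me about the history of computing.',
    stream: true,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
let finished = false;

while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  // The last element may be a partial line; keep it for the next chunk
  buffer = lines.pop();
  for (const line of lines) {
    // Skips blank separators, event: lines, and heartbeat comments
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6).trim();
    if (data === '[DONE]') {
      // Leave the outer read loop as well, not just this for loop
      finished = true;
      break;
    }
    try {
      const event = JSON.parse(data);
      if (event.type === 'response.output_text.delta') {
        process.stdout.write(event.delta);
      }
    } catch {
      // Ignore payloads that are not valid JSON
    }
  }
}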
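SSE Wire Format
Each event is sent as an event: line naming the event type, followed by a data: line carrying the JSON payload:
event: <event type>
data: <JSON payload>
(Note the blank line between events.)
The stream ends with:
data: [DONE]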
Example stream
event: response.created
data: {"type":"response.created","response":{"id":"resp_01234567-89ab-cdef-0123-456789abcdef","status":"in_progress"}}

event: response.output_item.added
data: {"type":"response.output_item.added","item":{"type":"message","role":"assistant","content":[]}}

event: response.content_part.added
data: {"type":"response.content_part.added","part":{"type":"output_text","text":""}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"The"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":" history"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":" of"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":" computing..."}

event: response.output_text.done
data: {"type":"response.output_text.done","text":"The history of computing..."}

event: response.output_item.done
data: {"type":"response.output_item.done","item":{"type":"message","role":"assistant","content":[{"type":"output_text","text":"The history of computing..."}]}}

event: response.completed
data: {"type":"response.completed","response":{"id":"resp_01234567-89ab-cdef-0123-456789abcdef","status":"completed","output_text":"The history of computing..."}}

data: [DONE]
Heartbeat
To prevent idle connections from being dropped by proxies, load balancers, and CDN layers, Nodexa sends a heartbeat comment every 15 seconds when no data events have been emitted. SSE comments start with : and are ignored by standard SSE parsers.
Why this matters
If an assistant is processing a complex request (tool calls, retrieval, etc.), there may be a gap of many seconds before the first token is emitted. Without a heartbeat, some intermediaries (Nginx, CloudFront, Cloudflare) may close the connection with a 504 or similar timeout. The 15-second heartbeat keeps the connection alive during these gaps.
Most SSE client libraries handle comments transparently, so you don't need to do anything special. If you're parsing the raw stream manually, skip lines that start with :.
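For manual parsing, the framing rules described in this section (a data: line per event, a blank line between events, lines starting with : as heartbeat comments, and a final data: [DONE]) can be sketched as a small line-based parser. This is a minimal sketch rather than a reference client: the name iter_sse_events is hypothetical, and it assumes each event carries exactly one data: line, as in the example stream.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from an iterable of SSE text lines.

    Assumption: one data: line per event, as in the example stream.
    """
    data = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            # SSE comment (heartbeat): carries no data, so skip it.
            continue
        if line.startswith("data:"):
            data = line[len("data:"):].lstrip()
            if data == "[DONE]":
                return  # end-of-stream sentinel
            continue
        if line == "" and data is not None:
            # A blank line terminates the current event.
            yield json.loads(data)
            data = None
        # event: lines are ignored because the payload repeats the type.
```

The function accepts any iterable of decoded text lines, so it can be fed from whichever HTTP client you use to read the response body line by line.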
Event Reference
response.created
Emitted immediately when the platform begins processing the request. Use this to show a loading indicator.
{
  "type": "response.created",
  "response": {
    "id": "resp_01234567-89ab-cdef-0123-456789abcdef",
    "status": "in_progress"
  }
}
response.status
Emitted when the internal processing status changes, for example when routing to a specialist agent or loading tools.
response.content_part.added
Emitted when a new content part is opened within an output item, just before the first response.output_text.delta for that part.
response.output_text.delta
Emitted for each text token as the assistant generates its response. Concatenate all delta values to reconstruct the full response.
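As an illustration of the concatenation rule, the sketch below rebuilds the text from a list of already-decoded events. The helper name assemble_output_text is hypothetical; the event shapes are taken from the example stream.

```python
from typing import Iterable


def assemble_output_text(events: Iterable[dict]) -> str:
    """Concatenate the delta of every response.output_text.delta event."""
    parts = []
    for event in events:
        if event.get("type") == "response.output_text.delta":
            parts.append(event["delta"])
    return "".join(parts)


events = [
    {"type": "response.created", "response": {"id": "resp_1", "status": "in_progress"}},
    {"type": "response.output_text.delta", "delta": "The"},
    {"type": "response.output_text.delta", "delta": " history"},
    {"type": "response.output_text.delta", "delta": " of"},
    {"type": "response.output_text.delta", "delta": " computing..."},
]
print(assemble_output_text(events))  # prints: The history of computing...
```

The result should match the text field of the subsequent response.output_text.done event, which makes that event a convenient consistency check.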
response.output_text.done
Emitted once after all response.output_text.delta events for a content part, confirming the fully assembled text.
response.reasoning_summary_text.delta
Emitted for reasoning model summaries. Some models (e.g., OpenAI o-series) produce a reasoning trace before the final answer. These deltas contain the reasoning summary tokens.
Reasoning models only
This event is only emitted for models that support visible reasoning summaries. For standard models, you will not see this event.
response.function_call_arguments.delta
Emitted as the assistant streams the JSON arguments for a function call. You can use these to show a "thinking" or "calling tool" indicator.
response.function_call_arguments.done
Emitted when a function call's arguments are fully assembled. This signals that you should execute the function and send a follow-up request.
{
  "type": "response.function_call_arguments.done",
  "name": "get_weather",
  "call_id": "call_abc123",
  "arguments": "{\"location\": \"San Francisco\", \"unit\": \"celsius\"}"
}
| Field | Type | Description |
|---|---|---|
| name | string | The function name to invoke |
| call_id | string | Unique ID; include this in your function_call_output |
| arguments | string | JSON-encoded arguments string |
See Function Calling for the complete flow.
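A minimal sketch of acting on this event: decode the arguments string, look up a local function by name, and keep the call_id alongside the result. The registry FUNCTIONS, the helper run_function_call, and the body of get_weather are hypothetical and only illustrate the dispatch step.

```python
import json
from typing import Any, Callable, Dict, Tuple


def get_weather(location: str, unit: str = "celsius") -> dict:
    # Hypothetical local implementation; a real one would query a weather service.
    return {"location": location, "unit": unit, "forecast": "unknown"}


# Maps the event's name field to a local callable (hypothetical registry).
FUNCTIONS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_function_call(event: dict) -> Tuple[str, str]:
    """Return (call_id, JSON-encoded result) for an arguments.done event."""
    args = json.loads(event["arguments"])  # arguments arrives as a JSON string
    func = FUNCTIONS[event["name"]]        # raises KeyError for unknown names
    result = func(**args)
    return event["call_id"], json.dumps(result)
```

Returning the call_id together with the serialized result keeps the two values paired for the follow-up request.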
response.output_item.added
Emitted when the assistant adds a new item to its output array. Besides regular messages, this event carries structured items such as handover notifications and OAuth prompts.
Message item:
{
  "type": "response.output_item.added",
  "item": {
    "type": "message",
    "role": "assistant",
    "content": []
  }
}
Handover item:
{
  "type": "response.output_item.added",
  "item": {
    "type": "handover",
    "from_specialist": "General Assistant",
    "to_specialist": "Billing Specialist",
    "reason": "User is asking about invoice details"
  }
}
OAuth required item:
{
  "type": "response.output_item.added",
  "item": {
    "type": "oauth_required",
    "plugin_id": "plugin_abc123",
    "plugin_name": "Google Calendar",
    "provider_id": "google",
    "required_scopes": ["https://www.googleapis.com/auth/calendar.readonly"],
    "auth_url": "https://your-admin.example.com/oauth/google/authorize?state=abc123"
  }
}
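One way to handle the item variants shown for this event is to branch on item.type and turn each into a short user-facing line. The sketch below is illustrative only: describe_output_item and the message wording are hypothetical, and only the fields shown in the examples for this event are read.

```python
def describe_output_item(item: dict) -> str:
    """Return a short, user-facing line for an output item."""
    kind = item.get("type")
    if kind == "handover":
        return (
            f"Transferring from {item['from_specialist']} "
            f"to {item['to_specialist']}: {item['reason']}"
        )
    if kind == "oauth_required":
        return f"Connect {item['plugin_name']} to continue: {item['auth_url']}"
    if kind == "message":
        # Nothing to show yet: the text arrives via response.output_text.delta.
        return ""
    # Unknown item types are surfaced rather than silently dropped.
    return f"[unhandled item type: {kind}]"
```

Falling through to an explicit "unhandled" line makes new item types visible during development instead of hiding them.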
response.output_item.done
Emitted when an output item is complete (after all its delta events).
{
  "type": "response.output_item.done",
  "item": {
    "type": "message",
    "role": "assistant",
    "content": [
      {
        "type": "output_text",
        "text": "The full assembled response text."
      }
    ]
  }
}
response.web_search_call.in_progress
Emitted when a web search tool call starts.
response.web_search_call.searching
Emitted when the search query has been submitted and results are being fetched.
{
  "type": "response.web_search_call.searching",
  "call_id": "ws_call_abc123",
  "query": "latest news on AI regulations 2024"
}
response.web_search_call.completed
Emitted when web search results are available and the assistant begins incorporating them.
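The three web search events map naturally onto a progress indicator. The following sketch shows one possible mapping; web_search_status and the label text are hypothetical, and only the searching event's query field is read because it is the only payload shown for these events.

```python
from typing import Optional

# Static labels for the events whose payload is not used here.
STATUS_LABELS = {
    "response.web_search_call.in_progress": "Starting web search...",
    "response.web_search_call.completed": "Search results received.",
}


def web_search_status(event: dict) -> Optional[str]:
    """Return a progress label for web search events, or None for other events."""
    event_type = event.get("type", "")
    if event_type == "response.web_search_call.searching":
        return f"Searching the web for: {event['query']}"
    return STATUS_LABELS.get(event_type)
```

Returning None for unrelated events lets the caller pass every stream event through the function without filtering first.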
response.completed
Emitted when the full response is ready. The response object contains the same structure as a non-streaming response body.
{
  "type": "response.completed",
  "response": {
    "id": "resp_01234567-89ab-cdef-0123-456789abcdef",
    "object": "response",
    "status": "completed",
    "model": "asst_01234567-89ab-cdef-0123-456789abcdef",
    "output_text": "The full assembled response text.",
    "output": [...],
    "created_at": 1700000000
  }
}
Saving the response ID
Always read the response.id from the response.completed event. You will need it to continue the conversation with previous_response_id.
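A small sketch of that bookkeeping: remember the ID from each response.completed event and attach it to the next request body. The Conversation class is hypothetical, and the sketch assumes previous_response_id is sent as a top-level field of the request body alongside model and input.

```python
from typing import Optional


class Conversation:
    """Tracks the latest response ID so follow-up turns can reference it."""

    def __init__(self, model: str) -> None:
        self.model = model
        self.last_response_id: Optional[str] = None

    def observe(self, event: dict) -> None:
        # Only response.completed carries the final response object.
        if event.get("type") == "response.completed":
            self.last_response_id = event["response"]["id"]

    def next_request(self, user_input: str) -> dict:
        body = {"model": self.model, "input": user_input, "stream": True}
        if self.last_response_id is not None:
            # Assumption: previous_response_id is a top-level request field.
            body["previous_response_id"] = self.last_response_id
        return body
```

Calling observe on every stream event and next_request once per user turn keeps the ID handling in one place.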
response.error
Emitted when an error occurs during streaming. After this event, the stream will close.
{
  "type": "response.error",
  "error": {
    "type": "server_error",
    "code": "upstream_timeout",
    "message": "The LLM provider did not respond in time."
  }
}
See Errors for all error codes.
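Because the stream closes after this event, it is often convenient to convert it into an exception at the point where events are consumed. The sketch below shows one way to do that; StreamError and raise_on_error are hypothetical names, and the fields read are those shown in the example payload.

```python
class StreamError(Exception):
    """Raised when the stream delivers a response.error event."""

    def __init__(self, error: dict) -> None:
        self.error_type = error.get("type")
        self.code = error.get("code")
        super().__init__(error.get("message", "stream error"))


def raise_on_error(event: dict) -> dict:
    """Return the event unchanged, or raise StreamError for response.error."""
    if event.get("type") == "response.error":
        raise StreamError(event["error"])
    return event
```

Wrapping each event with raise_on_error inside the consuming loop lets ordinary try/except handling decide whether to retry or surface the message.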
Handling requires_action in Streams
When the assistant calls a client-side function tool, the stream ends with status "requires_action" instead of "completed": the response.status field of the final response.completed event is "requires_action".
To continue after a requires_action response:
- Read call_id and arguments from response.function_call_arguments.done
- Execute the function locally
- Send a new request with previous_response_id set to the current response ID
- Include the tool result in input as a function_call_output item
See Function Calling for the complete flow with code examples.
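The follow-up request described in this section can be sketched as a pair of helpers: one that detects the requires_action status and one that builds the next request body. Both names are hypothetical. The shape of the function_call_output item (type, call_id, output) is an assumption based on the OpenAI-compatible Responses format; the Function Calling page remains the authoritative description.

```python
def requires_action(event: dict) -> bool:
    """True when the final event reports status requires_action."""
    return (
        event.get("type") == "response.completed"
        and event.get("response", {}).get("status") == "requires_action"
    )


def build_follow_up(model: str, response_id: str, call_id: str, output: str) -> dict:
    """Build the request body that returns a tool result to the assistant."""
    return {
        "model": model,
        "previous_response_id": response_id,
        "input": [
            {
                # Assumed item shape; verify against the Function Calling page.
                "type": "function_call_output",
                "call_id": call_id,
                "output": output,
            }
        ],
        "stream": True,
    }
```

The output argument is expected to be a string, typically the JSON-encoded return value of the locally executed function.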
Full SSE Event Table
See Reference — SSE Events for a complete table of all events, their fields, and descriptions.