Chat sessions

The chat endpoint lets Spectron manage the entire LLM conversation loop in a single call. Rather than recording a turn, retrieving context, calling your LLM, and recording the assistant's reply separately, you hand all of that to Spectron and receive the model's reply along with a memory diff in one response.

Two integration patterns

Spectron supports two ways to integrate memory into a conversation:

Pattern	Use when
Spectron drives the loop (`session.chat()`)	You want minimal integration surface. Spectron handles context retrieval, model invocation, and memory persistence.
Caller drives the loop (`session.turn()` + `session.context()`)	You need control over the model call – custom prompts, streaming, tool use, multi-step chains, or your own model infrastructure.

The two patterns are not mutually exclusive. A single session can use chat for simple turns and turn + context for turns that require tool use or custom prompting.

What Spectron does in `chat`

When you call session.chat(), Spectron executes the following steps:

Loads the session transcript window – the last 10 user/assistant turns of the same session, capped at 8k characters of the newest content. The window is rendered as a sanitised # Conversation so far block inside the system prompt (one line per turn, inside the existing untrusted-data framing). The user message for this request is not yet stored at this point, so it is not part of the window; it appears under # Question and is stored only after a successful reply.
Retrieves context – runs the tiered retrieval strategy. Turns already in the transcript window are not retrieved again (the prompt already includes them). Older turns from the same session remain eligible. Prior assistant replies rank below user statements and document sources when both match; citations from those hits include role: "assistant".
Calls the response model – invokes the configured response model with retrieved context, the transcript window (when non-empty), and the current question.
Persists both turns – on a successful reply, stores the user message (infer: full) and the assistant reply (infer: none, so the reply is not re-extracted).
Returns the reply text, the memory diff from the user turn's extraction, and a citations list when the model cites retrieved sources with [S1]-style markers.
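The transcript window in the first step can be sketched as follows. This is only an illustration of the stated limits (10 turns, 8k characters, newest content kept), not Spectron's implementation. The `Turn` type, the function names, reading "8k" as 8,000, and dropping older turns whole rather than truncating them are all assumptions.

```python
from dataclasses import dataclass

MAX_TURNS = 10      # "the last 10 user/assistant turns"
MAX_CHARS = 8_000   # "8k characters"; assumed to mean 8,000


@dataclass
class Turn:
    role: str       # "user" or "assistant"
    content: str


def transcript_window(turns: list[Turn]) -> list[Turn]:
    """Return the newest turns that fit both budgets, oldest first."""
    window: list[Turn] = []
    used = 0
    # Walk from newest to oldest so the newest content is what survives the cap.
    for turn in reversed(turns[-MAX_TURNS:]):
        if used + len(turn.content) > MAX_CHARS:
            break  # assumption: older turns are dropped whole, not truncated
        window.append(turn)
        used += len(turn.content)
    window.reverse()
    return window


def render(window: list[Turn]) -> str:
    """Render one line per turn under a '# Conversation so far' heading."""
    if not window:
        return ""
    # Collapse internal whitespace so each turn stays on a single line.
    lines = [f"{t.role}: {' '.join(t.content.split())}" for t in window]
    return "# Conversation so far\n" + "\n".join(lines)
```

An empty window renders as an empty string, which corresponds to the first-turn case where no transcript block is added to the prompt.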

When the transcript window is non-empty, tier-2 response reuse is bypassed — the cache is keyed by query embedding and caller scope, not session, so transcript-dependent replies must neither be served from nor seeded into the cache. First-turn and sessionless calls remain cache-eligible.

Multi-turn chat pins a session across messages. Under scope enforcement, each turn's session is scope-tagged to the caller's write region so later messages in the same session authorise correctly.

The `chat` endpoint

POST /api/v1/{context_id}/sessions/{session_id}/chat
Content-Type: application/json

{
  "content": "What do you know about me?"
}

Response:

{
  "reply": "Based on what you've shared, you're Alice, CTO at Acme, based in Berlin…",
  "memory_updates": {
    "entities": [],
    "attributes": [],
    "relations": [],
    "corrections": []
  },
  "citations": [
    {
      "marker": "[S1]",
      "id": "chunk:…",
      "kind": "chunk",
      "snippet": "Alice is CTO at Acme…",
      "score": 0.82,
      "documentTitle": "Team handbook",
      "positionPercent": 34
    }
  ]
}
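For callers not using an SDK, the request above could be assembled with the Python standard library roughly as follows. The host name and the bearer-token `Authorization` header are assumptions made for illustration; only the path, the `Content-Type` header, and the `content` field come from this page.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://spectron.example.com"  # hypothetical host, replace with your deployment


def build_chat_request(context_id: str, session_id: str, content: str,
                       api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat endpoint."""
    url = (f"{BASE_URL}/api/v1/{quote(context_id, safe='')}"
           f"/sessions/{quote(session_id, safe='')}/chat")
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check your deployment's authentication docs.
            "Authorization": f"Bearer {api_key}",
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `send(build_chat_request(...))` would return a dictionary with the `reply`, `memory_updates`, and `citations` keys shown in the response above.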

`reply`

The plain-text response from the configured response model. This is ready to display directly in your UI. Inline [S1] markers correspond to entries in citations.

`citations`

One entry per cited marker. Each resolves to a source row (id, kind, snippet, score, optional occurredAt, optional role). Document passages also include documentTitle and positionPercent (approximate percentage through the document). When a citation points at a stored assistant reply, role is "assistant" so callers can tell chat is citing its own earlier output. Markers never invent sources — only rows from retrieval (including section-expansion context) are cited.
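One way to turn inline markers into footnotes is sketched below. It works on the JSON response as a plain dictionary; the helper names and the footnote wording are illustrative choices, not part of the API.

```python
import re

MARKER = re.compile(r"\[S\d+\]")  # matches [S1]-style markers


def resolve_citations(reply: str, citations: list[dict]) -> list[tuple[str, dict]]:
    """Pair each marker in the reply with its citation entry, in order of first appearance."""
    by_marker = {c["marker"]: c for c in citations}
    seen: set[str] = set()
    resolved = []
    for marker in MARKER.findall(reply):
        # Skip repeats, and skip markers with no matching citation entry.
        if marker in seen or marker not in by_marker:
            continue
        seen.add(marker)
        resolved.append((marker, by_marker[marker]))
    return resolved


def label(citation: dict) -> str:
    """Build a short footnote line for one citation."""
    if citation.get("role") == "assistant":
        source = "earlier assistant reply"
    elif "documentTitle" in citation:
        source = citation["documentTitle"]
        if "positionPercent" in citation:
            source += f' (~{citation["positionPercent"]}% through)'
    else:
        source = citation["kind"]
    return f'{citation["marker"]} {source}: {citation["snippet"]}'
```

Checking `role` first lets a UI flag citations that point back at the assistant's own earlier output.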

`memory_updates`

The extraction diff from the user turn (the assistant reply is stored with infer: none). The shape matches the extraction result from session.turn(). Use this to update UI state or trigger any downstream logic that depends on changes to the memory graph.
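As a small illustration of reacting to the diff, the sketch below counts the entries in each of the four lists from the JSON response. The helper names are hypothetical, and the entries themselves are treated as opaque because their fields are not described on this page.

```python
DIFF_KEYS = ("entities", "attributes", "relations", "corrections")


def summarise_updates(memory_updates: dict) -> dict[str, int]:
    """Count how many items each part of the memory diff contains."""
    return {key: len(memory_updates.get(key, [])) for key in DIFF_KEYS}


def has_changes(memory_updates: dict) -> bool:
    """True when the user turn changed anything in the memory graph."""
    return any(summarise_updates(memory_updates).values())
```

A UI could call `has_changes` after each reply and refresh a memory panel only when it returns `True`.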

Python SDK

import os
from surrealdb import Spectron

memory = Spectron(context="acme-prod", api_key=os.environ["SPECTRON_API_KEY"])
session = await memory.sessions.create(scope=["org/acme/user/alice"])

reply = await session.chat("What do you know about me?")

print(reply.reply)            # The model's response string
print(reply.memory_updates)   # Entities, attributes, relations, corrections

JavaScript SDK

import { Spectron } from "@surrealdb/spectron";

const memory = new Spectron({ context: "acme-prod",
    apiKey: process.env.SPECTRON_API_KEY });
const session = await memory.sessions.create({ scope: ["org/acme/user/alice"] });

const reply = await session.chat({ message: "What do you know about me?" });

console.log(reply.reply);          // The model's response string
console.log(reply.memoryUpdates);  // Entities, attributes, relations, corrections

Full loop example

The following pattern shows a minimal chat loop using session.chat(), first in Python and then in JavaScript:

import asyncio
import os
from surrealdb import Spectron


async def main():
    memory = Spectron(context="acme-prod", api_key=os.environ["SPECTRON_API_KEY"])
    session = await memory.sessions.create(scope=["org/acme/user/alice"])

    while True:
        user_input = input("You: ")
        if user_input.strip().lower() in {"exit", "quit"}:
            break

        result = await session.chat(user_input)
        print(f"Agent: {result.reply}")

        # Report any facts that this turn's extraction corrected.
        if result.memory_updates.corrections:
            for c in result.memory_updates.corrections:
                print(f"  [memory corrected] {c.previous.value!r} → {c.current.value!r}")

    await session.close()


# await needs a running event loop, so the loop lives in a coroutine.
asyncio.run(main())

import * as readline from "node:readline";
import { Spectron } from "@surrealdb/spectron";

const memory = new Spectron({ context: "acme-prod",
    apiKey: process.env.SPECTRON_API_KEY });
const session = await memory.sessions.create({ scope: ["org/acme/user/alice"] });

const rl = readline.createInterface({ input: process.stdin,
    output: process.stdout });

const ask = () => rl.question("You: ", async (input) => {
    if (["exit", "quit"].includes(input.trim().toLowerCase())) {
        await session.close();
        rl.close();
        return;
    }

    const result = await session.chat({ message: input });
    console.log(`Agent: ${result.reply}`);

    for (const c of result.memoryUpdates.corrections) {
        console.log(`  [memory corrected] ${c.previous.value} → ${c.current.value}`);
    }

    ask();
});

ask();

Configuring the response model

The model used for response generation is configured at the context level in config.models.response. This is separate from the extraction model (config.models.extraction) used to parse turns into structured facts.

# spectron.config.yaml
models:
  extraction: openai/gpt-4o-mini
  response: openai/gpt-4o

You may use any provider supported by your Spectron deployment. The extraction and response models can differ – a faster, cheaper model is often suitable for extraction while a more capable model handles final response generation.

See Models and providers for the full list of supported providers and how to configure API keys.

When to use `session.turn()` instead

Use session.turn() plus your own model call when you need:

Streaming responses – session.chat() waits for the full completion before returning. If you require token-level streaming to your UI, drive the LLM loop yourself.
Tool use – if your agent needs to call external tools mid-conversation, you need control over the turn-taking loop to interleave tool calls and results.
Custom prompting – if you maintain your own system prompt structure, persona configuration, or prompt template that cannot be expressed through Spectron's context injection format.
Multiple models per turn – if your architecture uses a router, ensemble, or chain of models.
Explicit context control – if you want to inspect or modify the retrieved context string before it reaches the model.

The session.context() method retrieves the formatted context string that Spectron would have injected, so you can replicate the retrieval step in the manual pattern. See Retrieving context for details.
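The manual pattern might look roughly like the sketch below. The method signatures are assumptions: this page names `session.context()` and `session.turn()` but does not show their parameters, so the `query` argument, the `role` keyword, and the `call_model` callback are hypothetical stand-ins. The ordering mirrors what `chat` does: retrieve context, call the model, then persist both turns only after a successful reply.

```python
import asyncio
from typing import Awaitable, Callable, Protocol


class SessionLike(Protocol):
    # Hypothetical signatures; check the SDK reference for the real ones.
    async def context(self, query: str) -> str: ...
    async def turn(self, content: str, role: str = "user") -> object: ...


async def manual_turn(
    session: SessionLike,
    call_model: Callable[[str], Awaitable[str]],
    user_input: str,
) -> str:
    """Run one caller-driven turn: retrieve, generate, then persist."""
    # 1. Retrieve the formatted context string for this question.
    context = await session.context(user_input)

    # 2. Build your own prompt and call your own model (stream, use tools, etc.).
    prompt = f"{context}\n\n# Question\n{user_input}"
    reply = await call_model(prompt)

    # 3. Persist both turns only once the model call has succeeded.
    await session.turn(user_input, role="user")
    await session.turn(reply, role="assistant")
    return reply
```

Because `call_model` is an ordinary coroutine, it can wrap any provider client, which is where streaming, tool calls, or a multi-model chain would be added.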
