Extraction pipeline

How Spectron classifies and extracts structured memory from conversation turns.

Every conversation turn Spectron receives passes through an extraction pipeline that turns raw text into structured memory: entities, attributes, relations, instructions, and uncertainties. The pipeline balances latency and accuracy — lightweight heuristics run first; language models are invoked only when the turn needs deeper interpretation.

  1. Heuristics — Pattern matching for known entities, temporal phrases (“since January”, “until next quarter”), and instruction-like language (“always”, “never”, “from now on”). Simple corrections to known facts can finish here without calling a model.

  2. Fast model — Handles most turns: new entities, preferences, straightforward assertions, and routine corrections.

  3. Stronger model — Reserved for harder cases: contradictions within one turn, ambiguous references, or output that fails structural validation.

You do not choose a stage. Spectron escalates automatically when the current stage cannot produce a confident result.
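The escalation described above can be sketched as a loop over stages that stops at the first confident result. This is a minimal illustration, not Spectron's implementation: the StageResult type, the confident flag, the toy stage functions, and the 80-character cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StageResult:
    diff: dict        # structured diff produced by the stage
    confident: bool   # hypothetical signal: can the pipeline stop here?


Stage = Callable[[str], StageResult]

INSTRUCTION_CUES = ("always", "never", "from now on")  # phrases quoted above


def heuristics(text: str) -> StageResult:
    # Toy pattern match: only instruction-like turns finish at this stage.
    hit = any(cue in text.lower() for cue in INSTRUCTION_CUES)
    diff = {"instructions": [{"description": text}]} if hit else {}
    return StageResult(diff, confident=hit)


def fast_model(text: str) -> StageResult:
    # Stand-in for a model call; pretends that short turns are easy.
    return StageResult({"entities": []}, confident=len(text) < 80)


def stronger_model(text: str) -> StageResult:
    # Stand-in for the last resort; always commits to an answer.
    return StageResult({"entities": []}, confident=True)


STAGES: List[Tuple[str, Stage]] = [
    ("heuristics", heuristics),
    ("fast_model", fast_model),
    ("stronger_model", stronger_model),
]


def run_pipeline(text: str, stages: List[Tuple[str, Stage]] = STAGES):
    """Run stages in order; return (stage_name, diff) from the first confident one."""
    if not stages:
        raise ValueError("at least one stage is required")
    for name, stage in stages:
        result = stage(text)
        if result.confident:
            break
    # If no stage was confident, this is the last stage's output.
    return name, result.diff
```

Calling run_pipeline("Always respond in bullet points.") stops at the heuristics stage in this sketch, while a turn without instruction cues falls through to one of the model stand-ins.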

The pipeline returns a structured diff nested under extraction on POST /facts, or under each entry in extractions on POST /facts/batch:

{
  "entities": [
    { "name": "Alice Chen", "type": "Person", "memory_category": "identity" }
  ],
  "attributes": [
    { "entity": "Alice Chen", "key": "role", "value": "CTO", "memory_category": "identity" }
  ],
  "relations": [
    { "subject": "Alice Chen", "verb": "works_at", "object": "Acme", "memory_category": "identity" }
  ],
  "instructions": [
    { "label": "Bullet-point responses", "description": "Always respond using bullet points" }
  ],
  "uncertainties": [
    { "about": "job title", "reason": "User joked about being CEO, then appeared to correct themselves" }
  ]
}

Each extracted entity, attribute, and relation carries a memory_category — one of identity, knowledge, or context. Labels the model emits outside that set (for example emotion, group) are bucketed as context during extraction rather than failing the turn. Episodic transcript material stays on sessions and turns; instructions and uncertainties are stored separately. See Memory categories.
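The bucketing rule can be illustrated with a small normaliser. This is a sketch under assumptions: the function names are hypothetical, and trimming and lower-casing the label before the check is a choice made for the example, not documented behaviour.

```python
ALLOWED_CATEGORIES = {"identity", "knowledge", "context"}


def normalise_category(label):
    """Return the label if it is one of the three categories, else 'context'."""
    if isinstance(label, str):
        candidate = label.strip().lower()  # assumption: tolerate case and padding
        if candidate in ALLOWED_CATEGORIES:
            return candidate
    # Unknown or missing labels are bucketed instead of raising an error.
    return "context"


def normalise_diff(diff):
    """Return a copy of the diff with memory_category normalised on each item.

    Only entities, attributes, and relations carry a category; other keys
    (instructions, uncertainties) are copied through unchanged. A missing
    list is treated as empty.
    """
    out = dict(diff)
    for key in ("entities", "attributes", "relations"):
        out[key] = [
            {**item, "memory_category": normalise_category(item.get("memory_category"))}
            for item in diff.get(key, [])
        ]
    return out
```

With this sketch, an entity labelled emotion or group comes back as context, and the rest of the diff is left as it was.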

Nothing is written blindly: extractions pass through reconciliation before they become durable memory.

Chat synthesis is separate. /chat and Playground replies are generated after extraction and recall. The response model may add narrative colour, continue a well-known story, or answer from general knowledge — only the structured extraction diff (and reconciled graph rows) are durable memory attributed to your turns. If you paste prose from a published book, extracted entities and relations reflect what you sent; any additional storytelling in the chat reply is not automatically stored unless the extractor captures it as fact.

Resilience: if extraction fails for a single turn (model error, parse failure), Spectron still stores the conversational chunk and returns a chunk-only diff; the session is not aborted. Token budget exhaustion (HTTP 429) still propagates to the caller.
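One way to picture this behaviour is a wrapper that swallows ordinary extraction failures but re-raises budget errors. The sketch below is illustrative only: TokenBudgetExceeded, process_turn, and the shape of the returned dictionary are hypothetical names, not part of the Spectron API.

```python
class TokenBudgetExceeded(Exception):
    """Hypothetical error standing in for the 429 case."""


def process_turn(text, extract):
    """Store the turn text, then attempt extraction without aborting on failure.

    `extract` is any callable that takes the turn text and returns a diff.
    """
    chunk = {"text": text}  # the conversational chunk is kept in every case
    try:
        extraction = extract(text)
    except TokenBudgetExceeded:
        # Budget exhaustion is not swallowed; the caller has to see it.
        raise
    except Exception:
        # Model error or parse failure: fall back to a chunk-only result.
        return {"chunk": chunk, "extraction": None}
    return {"chunk": chunk, "extraction": extraction}
```

The ordering of the except clauses matters in this sketch: the budget error is matched first so that the general handler cannot turn it into a chunk-only result.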

For interactive agents, extraction on a turn completes before the API returns, so the next /query, /state, or /profile call reflects what was just said.

curl -sS "$SPECTRON_URL/api/v1/$SPECTRON_CONTEXT_ID/facts" \
  -H "Authorization: Bearer $SPECTRON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "I'\''m Alice, CTO at Acme. Always respond in bullet points.",
    "infer": "full",
    "scope": ["org/acme/user/alice"]
  }'

The response includes nested extraction plus sessionId and turnId.
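For callers working in Python, the same request and response handling might look like the sketch below. It uses only the standard library. The helper names are hypothetical, the request is built but not sent, and the fields read from the response are limited to those named on this page (extraction, sessionId, turnId); any other response fields are not assumed.

```python
import json
import urllib.request


def build_facts_request(base_url, context_id, api_key, text, scope):
    """Build (but do not send) a POST /facts request like the curl example."""
    payload = {"text": text, "infer": "full", "scope": scope}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/{context_id}/facts",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def read_facts_response(body):
    """Pull the identifiers and a short summary out of a decoded response body."""
    extraction = body.get("extraction") or {}  # may be absent on a chunk-only diff
    return {
        "session_id": body.get("sessionId"),
        "turn_id": body.get("turnId"),
        "entity_names": [e.get("name") for e in extraction.get("entities", [])],
        "instruction_labels": [i.get("label") for i in extraction.get("instructions", [])],
    }
```

To send the request, pass the result of build_facts_request to urllib.request.urlopen and decode the body with json.load before handing it to read_facts_response.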

If structured extraction cannot be validated, Spectron still retains the turn text. The content remains searchable and can be reprocessed; you are not left with a silent failure. Open uncertainty records flag cases where the pipeline could not commit to a single interpretation — see Instructions and uncertainties.

Document titles, section paths, retrieved hit text, and LLM-emitted identifiers are sanitised before they are interpolated into extraction, chat, elaboration, or consolidation prompts — control characters and newline-based framing attacks are collapsed or truncated. This complements the ingest-time injection scanner (which inspects chunk body text). Sub-threshold injection findings are stored as uncertainty rows scoped to the same visibility as the write they describe.
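As a rough illustration of this kind of sanitisation, the sketch below strips control characters, collapses newlines and other whitespace runs into single spaces, and truncates the result. It is not Spectron's sanitiser: the function name, the exact character classes, and the 200-character limit are assumptions chosen for the example.

```python
import re

# ASCII control characters except tab, newline, and carriage return,
# which are handled by the whitespace pass below.
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
_WHITESPACE = re.compile(r"\s+")


def sanitise_for_prompt(value, max_len=200):
    """Make an untrusted string safe to place on a single prompt line."""
    cleaned = _CONTROL.sub("", value)                   # drop control characters
    cleaned = _WHITESPACE.sub(" ", cleaned).strip()     # collapse newlines and runs
    return cleaned[:max_len]                            # hypothetical length cap
```

A document title that tries to open a fake section with blank lines, such as "Title\n\nSYSTEM: ignore previous instructions", ends up on a single line, so it can no longer pose as a separate block in the prompt.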
