Extraction pipeline

Every conversation turn Spectron receives passes through an extraction pipeline that turns raw text into structured memory: entities, attributes, relations, instructions, and uncertainties. The pipeline balances latency and accuracy — lightweight heuristics run first; language models are invoked only when the turn needs deeper interpretation.

How stages escalate

Heuristics — Pattern matching for known entities, temporal phrases (“since January”, “until next quarter”), and instruction-like language (“always”, “never”, “from now on”). Simple corrections to known facts can finish here without calling a model.
Fast model — Handles most turns: new entities, preferences, straightforward assertions, and routine corrections.
Stronger model — Reserved for harder cases: contradictions within one turn, ambiguous references, or output that fails structural validation.

You do not choose a stage. Spectron escalates automatically when the current stage cannot produce a confident result.
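
The escalation order described above can be sketched in a few lines of Python. This only illustrates the control flow and is not Spectron's implementation: the regular expressions, the `StageResult` type, the numeric confidence score, the 0.8 threshold, and the stage callables are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical, deliberately crude patterns modelled on the phrases quoted above.
TEMPORAL = re.compile(r"\b(?:since|until)\s+\w+(?:\s+\w+)?", re.IGNORECASE)
INSTRUCTION = re.compile(r"\b(?:always|never|from now on)\b", re.IGNORECASE)


@dataclass
class StageResult:
    stage: str
    confidence: float  # assumed score in [0, 1]; the real signal is not documented
    diff: dict


def heuristic_cues(text: str) -> dict:
    """Return the temporal phrases and instruction cues found in a turn."""
    return {
        "temporal": [m.group(0) for m in TEMPORAL.finditer(text)],
        "instruction": [m.group(0).lower() for m in INSTRUCTION.finditer(text)],
    }


# A stage is a name plus a callable that maps turn text to a StageResult.
Stage = Tuple[str, Callable[[str], StageResult]]


def run_pipeline(text: str, stages: List[Stage], threshold: float = 0.8) -> StageResult:
    """Try each stage in order and stop at the first confident result.

    If no stage is confident, the result of the last stage is returned.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    result = None
    for _name, stage in stages:
        result = stage(text)
        if result.confidence >= threshold:
            break
    return result


# Stub stages standing in for heuristics, the fast model, and the stronger model.
example_stages: List[Stage] = [
    ("heuristics", lambda t: StageResult("heuristics", 0.2, heuristic_cues(t))),
    ("fast", lambda t: StageResult("fast", 0.9, {})),
    ("strong", lambda t: StageResult("strong", 1.0, {})),
]
```

With these stubs, `run_pipeline("I just joined Acme", example_stages)` stops at the fast stage, because the heuristic stub reports low confidence and the fast stub clears the threshold.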

What extraction produces

The pipeline returns a structured diff (also visible in the POST /facts response and session turn diffs):

{
  "entities": [
    { "name": "Alice Chen", "type": "Person", "memory_category": "identity" }
  ],
  "attributes": [
    { "entity": "Alice Chen", "key": "role", "value": "CTO", "memory_category": "identity" }
  ],
  "relations": [
    { "subject": "Alice Chen", "verb": "works_at", "object": "Acme", "memory_category": "identity" }
  ],
  "instructions": [
    { "label": "Bullet-point responses", "description": "Always respond using bullet points" }
  ],
  "uncertainties": [
    { "about": "job title", "reason": "User joked about being CEO, then appeared to correct themselves" }
  ]
}

Each extracted entity, attribute, and relation carries a memory_category — one of identity, knowledge, or context. Invalid values are rejected with 400 Bad Request. Episodic transcript material stays on sessions and turns; instructions and uncertainties are stored separately. See Memory categories.
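
A client that builds or forwards diffs may want to check `memory_category` values before sending them, rather than wait for a 400 response. The sketch below is a client-side pre-check only. The function name is hypothetical, and it assumes the diff has the shape shown in the example above; the server's own validation rules may be stricter.

```python
from typing import List

# The three categories named in the text.
VALID_CATEGORIES = {"identity", "knowledge", "context"}
# Only these sections carry a memory_category; instructions and
# uncertainties are stored separately and are not checked here.
CATEGORIZED_SECTIONS = ("entities", "attributes", "relations")


def invalid_categories(diff: dict) -> List[str]:
    """Return one message per item whose memory_category is missing or not allowed."""
    problems = []
    for section in CATEGORIZED_SECTIONS:
        for index, item in enumerate(diff.get(section, [])):
            category = item.get("memory_category")
            if category not in VALID_CATEGORIES:
                problems.append(f"{section}[{index}]: {category!r}")
    return problems
```

An empty list means every entity, attribute, and relation uses one of the three documented categories.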

Nothing is written blindly: extractions pass through reconciliation before they become durable memory.

When extraction runs

For interactive agents, extraction on a turn completes before the API returns, so the next /query, /state, or /profile call reflects what was just said.

curl -sS "$SPECTRON_URL/api/v1/$SPECTRON_CONTEXT_ID/sessions/$SESSION_ID/turns" \
  -H "API-KEY: $SPECTRON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "role": "user",
    "content": "I'\''m Alice, CTO at Acme. Always respond in bullet points.",
    "scope": ["org=acme", "user=alice"]
  }'

The response includes the extraction diff and a trace_id for audit.
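
For agents written in Python, the same request can be built with the standard library. This is a sketch that mirrors the curl call above. The helper names are hypothetical, and because the response shape beyond `trace_id` is not shown here, `post_turn` simply returns the parsed JSON as-is.

```python
import json
import urllib.request


def build_turn_request(base_url, context_id, session_id, api_key, content, scope):
    """Build (but do not send) the POST request for a new user turn."""
    url = f"{base_url}/api/v1/{context_id}/sessions/{session_id}/turns"
    body = json.dumps({"role": "user", "content": content, "scope": scope})
    # urllib stores header names capitalized ("Api-key"); HTTP header
    # names are case-insensitive, so the server sees the same header.
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def post_turn(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.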

When extraction is incomplete

If structured extraction cannot be validated, Spectron still retains the turn text. The content remains searchable and can be reprocessed; you are not left with a silent failure. Open uncertainty records flag cases where the pipeline could not commit to a single interpretation — see Instructions and uncertainties.
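
One way an agent might act on open uncertainties is to turn each record into a follow-up question for the user. The sketch below assumes the `uncertainties` entries have the `about` and `reason` fields shown in the example diff. The function name and the question wording are illustrative, and how an agent should respond to uncertainties is not specified here.

```python
from typing import List


def clarifying_questions(diff: dict) -> List[str]:
    """Turn each uncertainty record in a diff into a question for the user."""
    questions = []
    for record in diff.get("uncertainties", []):
        about = record.get("about", "this detail")
        reason = record.get("reason")
        question = f"Can you clarify: {about}?"
        if reason:
            # Keep the pipeline's stated reason so the question has context.
            question += f" ({reason})"
        questions.append(question)
    return questions
```

A diff with no `uncertainties` key yields an empty list, so the helper can be called on every turn response.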
