Architecture

Coherence, retrieval, and cost tiers

Five coherence dimensions, hybrid structural retrieval, and the four-tier query ladder.

When you hear “cat”, you do not run one search string. You blend what it reminds you of (pets, lions, a team logo), exact words you once read, how things connect, and whether a fact is still true. Retrieval in Spectron is built the same way: several signals fused on purpose, not a single embedding score pretending to be understanding.

Spectron keeps memory coherent along five axes at once. It stores enough metadata to answer questions along each one, so retrieval stays auditable and trustworthy:

| Dimension | What it gives you |
| --- | --- |
| Semantic | Similarity before structure is explicit: embedding-based recall over entities and passages. |
| Lexical | What was actually said or shown, down to character positions in the source. Extracted attributes carry source.span into the originating turn or document passage, so citations are a stored field, not best-effort prose. |
| Relational | Understanding as connections: one entity/relation graph, so “cat” can reach a manual, a prior turn, and a related entity (lion, pet, breed) without treating them as unrelated chunks. |
| Time | What held when, and how beliefs evolved: valid_from / valid_until, as_of, and time-travel queries. See Tri-temporal model. |
| Space | Where a fact was captured or applies: optional geometry. Geo filters compose with semantic and graph signals in the same ranker. |
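To make the five axes concrete, here is a minimal sketch of a stored fact that carries metadata for each one. Apart from the ideas of source.span, valid_from, and valid_until named in the table, everything here is hypothetical: the `Fact` and `Span` types, their field names, the `holds_as_of` and `cite` helpers, and the assumption that validity is a half-open interval are illustrations, not Spectron's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Span:
    # Hypothetical shape for source.span: character offsets into one source.
    source_id: str
    start: int
    end: int


@dataclass
class Fact:
    # Hypothetical record; field names are illustrative, not Spectron's schema.
    entity: str
    attribute: str
    value: str
    embedding: list[float]                               # semantic axis
    span: Span                                           # lexical axis
    related: list[str] = field(default_factory=list)     # relational axis
    valid_from: Optional[datetime] = None                # time axis
    valid_until: Optional[datetime] = None
    geometry: Optional[tuple[float, float]] = None       # space axis (lat, lon)


def holds_as_of(fact: Fact, as_of: datetime) -> bool:
    """True if as_of falls in the assumed half-open interval [valid_from, valid_until)."""
    if fact.valid_from is not None and as_of < fact.valid_from:
        return False
    if fact.valid_until is not None and as_of >= fact.valid_until:
        return False
    return True


def cite(fact: Fact, source_text: str) -> str:
    """Return the exact substring of the source that the fact was extracted from."""
    return source_text[fact.span.start:fact.span.end]


turn = "Alice joined as CTO in March."
role = Fact("alice", "role", "CTO", [0.1, 0.2], Span("turn-1", 16, 19),
            related=["acme"], valid_from=datetime(2024, 3, 1))
print(cite(role, turn))                         # CTO
print(holds_as_of(role, datetime(2024, 1, 1)))  # False
```

Because the span is stored rather than regenerated, a citation is a slice of the original text, and an as_of check is a comparison on stored timestamps rather than a judgement by the model.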

Vector-only approaches cover the semantic axis but tend to miss several of the others at once; unstructured-only stores miss them unless you add structure yourself. Spectron stores the metadata up front.

Retrieval is hybrid by design. Embeddings are one signal; they are fused with other precomputed structure so top‑k is not a black box.

Typical signals in the fused ranker include:

  • Vector recall – dense embeddings on entities, attributes, chunks, and (when enabled) images and audio.

  • Lexical recall – BM25 over chunk text and entity names for exact phrases and rare terms.

  • Graph traversal – limited hops from seed entities when surface forms differ.

  • Keyword bridges – keyword nodes linking distant passages.

  • Section embeddings and document links – related sections, not only the single nearest chunk.

  • Personalised PageRank – graph-walk scoring biased toward query seeds.

  • Geographic recall – radius, polygon, nearest‑k on stored geometry.

  • Trace-derived features – prior retrieval outcomes boost what worked and demote what led to corrections.
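Personalised PageRank is a standard algorithm, so one of the signals above can be illustrated directly. The sketch below computes it by power iteration over an adjacency list, with the teleport distribution concentrated on the query's seed entities. The toy graph, damping value, iteration count, and function name are assumptions for illustration; nothing here describes how Spectron implements or weights this signal.

```python
def personalized_pagerank(graph: dict[str, list[str]], seeds: set[str],
                          damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Power-iteration Personalised PageRank with teleports restricted to the seeds."""
    nodes = set(graph) | set(seeds)
    for targets in graph.values():
        nodes.update(targets)
    # Teleport distribution: uniform over the seed entities, zero elsewhere.
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(teleport)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = graph.get(n, [])
            if out:
                # Spread this node's mass evenly along its outgoing edges.
                share = damping * scores[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: send its mass back to the seeds so scores sum to 1.
                for s in seeds:
                    nxt[s] += damping * scores[n] / len(seeds)
        scores = nxt
    return scores


# Hypothetical toy graph: "manual" is reachable from "cat" via "pet",
# while nothing links to "team-logo", so it scores zero for this query.
graph = {
    "cat": ["lion", "pet"],
    "lion": ["cat"],
    "pet": ["cat", "breed", "manual"],
    "breed": [],
    "manual": [],
    "team-logo": ["cat"],
}
ranked = personalized_pagerank(graph, seeds={"cat"})
print(max(ranked, key=ranked.get))  # cat
```

Biasing the teleport step toward the seeds is what makes the walk query-specific: entities a few hops from “cat” receive mass, while unrelated parts of the graph receive none.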

Each /query emits a retrieval_trace recording candidates, per-signal contributions, and the returned set.
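One way to picture fusion together with its trace is a weighted sum of per-signal scores that records each signal's contribution for every candidate. Weighted linear fusion is a swapped-in technique chosen for illustration, since this page does not say how Spectron combines signals. The signal names, weights, the assumption that scores are pre-normalised to [0, 1], and the trace field names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TraceEntry:
    candidate: str
    contributions: dict[str, float]  # signal name -> weighted contribution
    total: float


def fuse(signal_scores: dict[str, dict[str, float]],
         weights: dict[str, float], k: int) -> tuple[list[str], dict]:
    """Weighted linear fusion of per-signal scores (assumed pre-normalised to [0, 1])."""
    candidates: set[str] = set()
    for per_signal in signal_scores.values():
        candidates.update(per_signal)
    entries = []
    for cand in candidates:
        # A signal that did not surface this candidate contributes 0.
        contrib = {sig: weights.get(sig, 0.0) * per_signal.get(cand, 0.0)
                   for sig, per_signal in signal_scores.items()}
        entries.append(TraceEntry(cand, contrib, sum(contrib.values())))
    # Highest total first; ties broken by id so the output is deterministic.
    entries.sort(key=lambda e: (-e.total, e.candidate))
    returned = [e.candidate for e in entries[:k]]
    trace = {"candidates": entries, "returned": returned}  # hypothetical trace shape
    return returned, trace


signals = {
    "vector": {"chunk-a": 0.9, "chunk-b": 0.7},
    "bm25": {"chunk-b": 1.0, "chunk-c": 0.4},
    "graph": {"chunk-c": 0.8},
}
top, trace = fuse(signals, {"vector": 0.5, "bm25": 0.3, "graph": 0.2}, k=2)
print(top)  # ['chunk-b', 'chunk-a']
```

Here chunk-b wins because two signals agree on it, and the trace shows exactly that: a reader can see its vector and BM25 contributions side by side instead of a single opaque score.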

Hands-on retrieval modes are in Hybrid search.

Spectron does not run the same expensive path on every request. Reads route through a four-tier ladder so simple questions consume as few LLM tokens as possible: structured lookups and cache hits avoid building large prompts or calling the synthesis model when a cheaper path suffices.

| Tier | What happens | Token / cost profile |
| --- | --- | --- |
| 1 – Direct structured lookup | Typed questions are resolved from the entity/attribute graph by key: no embeddings, no LLM, no ranking pass. | Minimal tokens; often nothing is sent to an LLM. |
| 2 – Response reuse | Matches against prior answers in the same Context and scope, with entity-aware invalidation (cited facts must still be current). Returns a prior answer when it is still valid. | No new generation on a hit; reuses prior synthesis. |
| 3 – Hybrid retrieval and synthesis | Fused retrieval, then LLM synthesis over a bounded context block. | Moderate tokens; the default for open questions. |
| 4 – Full-context fallback | Broader sweep when tier 3 is thin: more candidates, deeper graph hops, optional query rewrite, larger context. | Highest token use; an explicit escalation that is still traceable. |
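Tier 2's entity-aware invalidation can be sketched as a cache keyed by Context, scope, and a normalised question, where each entry remembers the version of every fact it cited and is discarded once any of those facts changes. The key shape, the per-fact version counters, the exact-match lookup (a simplification of whatever matching Spectron actually does), and the class names are all assumptions; the table only states that cited facts must still be current.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedAnswer:
    text: str
    cited_versions: dict[str, int]  # fact id -> version when the answer was produced


class ResponseCache:
    """Hypothetical tier-2 cache keyed by (Context, scope, normalised question)."""

    def __init__(self) -> None:
        self.fact_versions: dict[str, int] = {}
        self.entries: dict[tuple[str, str, str], CachedAnswer] = {}

    @staticmethod
    def _key(context: str, scope: str, question: str) -> tuple[str, str, str]:
        # Exact match on a lowercased, whitespace-collapsed question.
        return (context, scope, " ".join(question.lower().split()))

    def bump(self, fact_id: str) -> None:
        """Record that a fact was updated or superseded."""
        self.fact_versions[fact_id] = self.fact_versions.get(fact_id, 0) + 1

    def put(self, context: str, scope: str, question: str,
            text: str, cited: list[str]) -> None:
        versions = {f: self.fact_versions.get(f, 0) for f in cited}
        self.entries[self._key(context, scope, question)] = CachedAnswer(text, versions)

    def get(self, context: str, scope: str, question: str) -> Optional[str]:
        key = self._key(context, scope, question)
        entry = self.entries.get(key)
        if entry is None:
            return None  # miss: the caller falls through to tier 3
        if any(self.fact_versions.get(f, 0) != v for f, v in entry.cited_versions.items()):
            del self.entries[key]  # a cited fact changed, so the answer is stale
            return None
        return entry.text


cache = ResponseCache()
cache.put("acme", "team", "What is Alice's role?", "Alice is the CTO.", ["alice.role"])
print(cache.get("acme", "team", "what is  alice's role?"))  # Alice is the CTO.
cache.bump("alice.role")
print(cache.get("acme", "team", "What is Alice's role?"))   # None
```

Tracking versions of cited facts, rather than expiring entries on a timer, means an unrelated update leaves the cached answer usable while a change to anything it relied on invalidates it immediately.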

Tiers cascade: a miss on tier 2 falls through to tier 3, and a thin tier-3 result escalates to tier 4. Each tier writes retrieval_trace metadata describing which tier ran and why, so you can see where token spend goes and tune it per Context.
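The cascade can be sketched as an ordered list of tier handlers, each returning either an answer or a reason it declined, with the router recording every attempt so the trace shows which tier answered and why earlier ones did not. The handler protocol, the `MIN_CANDIDATES` threshold for a “thin” result, the stand-in handlers, and the trace fields are hypothetical, not Spectron's interfaces.

```python
from typing import Callable, Optional

# Hypothetical handler protocol: return (answer or None, reason for the outcome).
Handler = Callable[[str], tuple[Optional[str], str]]


def route(question: str, tiers: list[tuple[str, Handler]]) -> tuple[Optional[str], list[dict]]:
    """Try tiers cheapest-first; record every attempt so the trace shows why each ran."""
    trace: list[dict] = []
    for name, handler in tiers:
        answer, reason = handler(question)
        trace.append({"tier": name, "answered": answer is not None, "reason": reason})
        if answer is not None:
            return answer, trace
    return None, trace


MIN_CANDIDATES = 3  # hypothetical threshold below which tier 3 counts as "thin"


def hybrid(question: str) -> tuple[Optional[str], str]:
    candidates = ["chunk-b"]  # stand-in for fused retrieval output
    if len(candidates) < MIN_CANDIDATES:
        return None, f"thin: only {len(candidates)} of {MIN_CANDIDATES} candidates"
    return "synthesised answer", "ok"


tiers = [
    ("1-direct", lambda q: (None, "not a typed key lookup")),
    ("2-reuse", lambda q: (None, "cache miss")),
    ("3-hybrid", hybrid),
    ("4-full-context", lambda q: ("full-context answer", "broad sweep")),
]
answer, trace = route("What did Alice say about the migration?", tiers)
print([t["tier"] for t in trace if not t["answered"]])  # ['1-direct', '2-reuse', '3-hybrid']
```

Because the router stops at the first tier that answers, a question resolved by key never pays for retrieval or synthesis, and the trace for an escalated question lists each cheaper tier with its reason for declining.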

In short, most questions like “what is Alice’s role?” should resolve without stuffing the entire memory graph into the model context.
