Coherence, retrieval, and cost tiers

Five coherence dimensions, hybrid structural retrieval, and the four-tier query ladder.

When you hear “cat”, you do not run one search string – you blend what it reminds you of (pets, lions, a team logo), exact words you once read, how things connect, and whether a fact is still true. Retrieval in Spectron is built the same way: several signals fused on purpose, not a single embedding score pretending to be understanding.

Five coherence dimensions

Memory is coherent along five axes at once – Spectron stores enough metadata to answer questions on each, so retrieval stays auditable and trustworthy:

Dimension	What it gives you
Semantic	Similarity even before structure is made explicit: embedding-based recall over entities and passages.
Lexical	What was actually said or shown, down to character positions in the source: extracted attributes carry `source.span` into the originating turn or document passage. Citations are a stored field, not best-effort prose.
Relational	Understanding as connections: one entity/relation graph so “cat” can reach a manual, a prior turn, and a related entity (lion, pet, breed) without treating them as unrelated chunks.
Time	What held when, and how beliefs evolved: `valid_from` / `valid_until`, `as_of`, and time-travel queries. See Tri-temporal model.
Space	Where a fact was captured or applies – optional geometry; geo filters compose with semantic and graph signals in the same ranker.

Vector-only approaches tend to miss several of these at once; unstructured-only stores miss them unless you add structure. Spectron stores the metadata up front.
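
As a rough illustration of what storing the metadata up front can look like, the sketch below models one extracted fact carrying lexical, temporal, and spatial metadata. It is not Spectron's schema: the `Fact` and `SourceSpan` classes, their field layout, and the `holds_as_of` and `cite` helpers are hypothetical names chosen for this example; only `source.span`, `valid_from`, `valid_until`, and `as_of` come from the table above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class SourceSpan:
    # Hypothetical shape for `source.span`: character offsets into a turn or passage.
    source_id: str
    start: int
    end: int


@dataclass(frozen=True)
class Fact:
    entity: str
    attribute: str
    value: str
    span: SourceSpan                                 # lexical: where it was said
    valid_from: datetime                             # time: start of validity
    valid_until: Optional[datetime] = None           # None means still current
    location: Optional[Tuple[float, float]] = None   # space: optional (lat, lon)


def holds_as_of(fact: Fact, as_of: datetime) -> bool:
    """True if the fact was valid at `as_of` (assumes a half-open interval)."""
    if as_of < fact.valid_from:
        return False
    return fact.valid_until is None or as_of < fact.valid_until


def cite(fact: Fact, source_text: str) -> str:
    """Return the exact characters the fact was extracted from."""
    return source_text[fact.span.start:fact.span.end]


turn = "Alice is now the CTO of Acme."
role = Fact(
    entity="Alice",
    attribute="role",
    value="CTO",
    span=SourceSpan("turn-1", 17, 20),
    valid_from=datetime(2024, 1, 1),
    valid_until=datetime(2025, 1, 1),
)
print(cite(role, turn), holds_as_of(role, datetime(2024, 6, 1)))
```

Because the span and validity interval are fields rather than prose, a citation or a time-travel check becomes a plain lookup instead of a second retrieval.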

Structural retrieval (beyond embeddings)

Retrieval is hybrid by design. Embeddings are one signal; they are fused with other precomputed structure so top‑k is not a black box.

Typical signals in the fused ranker include:

Vector recall – dense embeddings on entities, attributes, chunks, and (when enabled) images and audio.
Lexical recall – BM25 over chunk text and entity names for exact phrases and rare terms.
Graph traversal – limited hops from seed entities when surface forms differ.
Keyword bridges – RAKE keyphrases linked via `knowledge_has_keyword` edges from query-matched terms to document passages.
Section embeddings and document links – related sections, not only the single nearest chunk.
Personalised PageRank – graph-walk scoring biased toward query seeds. Relation edges used in graph hops are scope-gated on the edge itself, not only on destination entities.
Geographic recall – radius, polygon, nearest‑k on stored geometry.
Trace-derived features – prior retrieval outcomes boost what worked and demote what led to corrections.
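
The Personalised PageRank signal in the list above, with the scope gate applied to the edge itself, can be sketched as a short power iteration. This is a minimal sketch under stated assumptions, not Spectron's implementation: the `(source, target, scope)` edge tuple, the `allowed_scopes` parameter, the restart probability `alpha`, and the handling of dangling nodes are all choices made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, target, scope); scope is a hypothetical edge label


def personalised_pagerank(
    edges: Iterable[Edge],
    seeds: Set[str],
    allowed_scopes: Set[str],
    alpha: float = 0.15,
    iterations: int = 50,
) -> Dict[str, float]:
    """Score nodes by a random walk that restarts at the query seeds."""
    if not seeds:
        raise ValueError("at least one seed entity is required")

    # Gate on the edge: an out-of-scope edge is never walked, even if its
    # destination entity would be visible through some other edge.
    out: Dict[str, List[str]] = defaultdict(list)
    nodes: Set[str] = set(seeds)
    for src, dst, scope in edges:
        if scope in allowed_scopes:
            out[src].append(dst)
            nodes.update((src, dst))

    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: alpha * restart[n] for n in nodes}
        for n in nodes:
            mass = (1.0 - alpha) * rank[n]
            targets = out.get(n)
            if targets:
                for t in targets:
                    nxt[t] += mass / len(targets)
            else:
                # Dangling node: return its mass to the seeds (an assumption).
                for s in seeds:
                    nxt[s] += mass / len(seeds)
        rank = nxt
    return rank


edges = [
    ("cat", "lion", "public"),
    ("cat", "pet", "public"),
    ("cat", "secret_doc", "private"),
]
scores = personalised_pagerank(edges, seeds={"cat"}, allowed_scopes={"public"})
print(sorted(scores))
```

In this example `secret_doc` never receives a score, because the only edge leading to it is filtered out before the walk starts.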

Each `/query` emits a `retrieval_trace` recording candidates, per-signal contributions, and the returned set.
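
To make the idea of a fused ranker that stays auditable concrete, the sketch below combines several ranked lists and keeps the per-signal contribution of every candidate. The source does not say how Spectron fuses signals; weighted reciprocal rank fusion is swapped in here purely as an illustration, and the `fuse` function, its `rrf_k` constant, the signal names, and the shape of the returned dictionary are assumptions rather than the actual `retrieval_trace` format.

```python
from collections import defaultdict
from typing import Dict, List


def fuse(
    rankings: Dict[str, List[str]],
    weights: Dict[str, float],
    k: int = 10,
    rrf_k: int = 60,
) -> dict:
    """Weighted reciprocal rank fusion that records who contributed what."""
    contributions: Dict[str, Dict[str, float]] = defaultdict(dict)
    for signal, ranked_ids in rankings.items():
        weight = weights.get(signal, 1.0)
        for position, candidate in enumerate(ranked_ids, start=1):
            contributions[candidate][signal] = weight / (rrf_k + position)

    fused = {c: sum(parts.values()) for c, parts in contributions.items()}
    # Sort by fused score, breaking ties by id so the output is deterministic.
    ordered = sorted(fused, key=lambda c: (-fused[c], c))
    return {
        "candidates": ordered,
        "contributions": {c: dict(contributions[c]) for c in ordered},
        "returned": ordered[:k],
    }


trace = fuse(
    rankings={
        "vector": ["a", "b", "c"],
        "bm25": ["b", "d"],
        "graph": ["b", "a"],
    },
    weights={"vector": 1.0, "bm25": 1.0, "graph": 1.0},
    k=2,
)
print(trace["returned"])
```

Keeping the contributions next to the returned set is what lets a reader see why a candidate ranked where it did, for example that `b` was supported by all three signals.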

Hands-on retrieval modes are in Hybrid search.

Tiered query resolution

Spectron does not run the same expensive path on every request. Reads route through a four-tier ladder so simple questions consume as few LLM tokens as possible – structured lookup and cache hits avoid building huge prompts or calling the synthesis model when a cheaper path suffices.

Tier	What happens	Token / cost profile
1 – Direct structured lookup	Typed questions resolved from the entity/attribute graph by key – no embeddings, no LLM, no ranking pass.	Minimal tokens – often nothing sent to an LLM.
2 – Response reuse	Match against prior answers in the same Context and scope, with entity-aware invalidation (cited facts must still be current). Returns a prior answer when still valid. Bypassed for `/chat` when the session already has prior turns (the reply depends on the transcript window; windowed turns neither reuse nor seed the cache).	No new generation on a hit – reuses prior synthesis.
3 – Hybrid retrieval and synthesis	Fused retrieval over a bounded internal pool (default 256 candidates; independent of answer `k`), then LLM synthesis over a bounded context block. Default answer size `k` / `limit` is 10 (max 50).	Moderate tokens – default for open questions.
4 – Full-context fallback	Broader sweep when tier 3 is thin or below the confidence floor (0.40 fused score): more candidates, deeper graph hops, optional query rewrite, larger context. Duplicate hits from tier 3 and tier 4 are merged by id, keeping the highest score.	Highest token use – explicit escalation, still traceable.

Tiers cascade (miss on 2 falls to 3; thin or low-confidence 3 escalates to 4). Each tier writes `retrieval_trace` metadata describing which tier ran and why – so you can see where token spend goes and tune per Context.
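
The cascade can be sketched as a single routing function. This is a sketch under stated assumptions, not Spectron's code: the four callables (`lookup`, `cached`, `retrieve`, `fallback`) and `synthesise` are hypothetical stand-ins for the real tiers, `thin_below` is an invented threshold for what counts as a thin result, and the trace is a simple list of `(tier, reason)` pairs. The 0.40 confidence floor, the default and maximum `k`, and the merge-by-id rule come from the table above.

```python
from typing import Callable, Dict, List, Optional, Tuple

Hit = Tuple[str, float]  # (id, fused score)

CONFIDENCE_FLOOR = 0.40  # tier 3 escalates below this fused score
DEFAULT_K, MAX_K = 10, 50


def merge_hits(*hit_lists: List[Hit]) -> List[Hit]:
    """Merge duplicates by id, keeping the highest score."""
    best: Dict[str, float] = {}
    for hits in hit_lists:
        for hit_id, score in hits:
            if score > best.get(hit_id, float("-inf")):
                best[hit_id] = score
    return sorted(best.items(), key=lambda h: (-h[1], h[0]))


def resolve(
    question: str,
    lookup: Callable[[str], Optional[str]],
    cached: Callable[[str], Optional[str]],
    retrieve: Callable[[str], List[Hit]],
    fallback: Callable[[str], List[Hit]],
    synthesise: Callable[[str, List[Hit]], str],
    k: int = DEFAULT_K,
    thin_below: int = 3,
) -> Tuple[str, List[Tuple[int, str]]]:
    k = max(1, min(k, MAX_K))
    trace: List[Tuple[int, str]] = []

    answer = lookup(question)  # tier 1: no embeddings, no LLM
    if answer is not None:
        trace.append((1, "structured lookup hit"))
        return answer, trace
    trace.append((1, "miss"))

    # Tier 2: `cached` is assumed to return None when cited facts are stale.
    answer = cached(question)
    if answer is not None:
        trace.append((2, "valid prior answer"))
        return answer, trace
    trace.append((2, "miss"))

    hits = merge_hits(retrieve(question))  # tier 3
    top = max((score for _, score in hits), default=0.0)
    if len(hits) >= thin_below and top >= CONFIDENCE_FLOOR:
        trace.append((3, "confident"))
        return synthesise(question, hits[:k]), trace
    trace.append((3, "thin" if len(hits) < thin_below else "below confidence floor"))

    hits = merge_hits(hits, fallback(question))  # tier 4
    trace.append((4, "full-context fallback"))
    return synthesise(question, hits[:k]), trace


answer, trace = resolve(
    "what changed in the rollout plan?",
    lookup=lambda q: None,
    cached=lambda q: None,
    retrieve=lambda q: [("a", 0.30), ("b", 0.20)],
    fallback=lambda q: [("a", 0.50), ("c", 0.45)],
    synthesise=lambda q, hits: ",".join(hit_id for hit_id, _ in hits),
    thin_below=2,
)
print(answer, trace)
```

Here tier 3 returns enough hits but its best score is under the floor, so the request escalates; the merged tier 4 set keeps `a` once, at its higher score.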

In short: most “what is Alice’s role?”-style questions should resolve without stuffing the entire memory graph into the model context.