Hybrid search

Using vector, BM25, and graph-density retrieval modes in Spectron.

Spectron exposes four retrieval modes for document passages (POST .../documents/query) and for the unified read path (POST .../query, which also ranks experiential facts). Each mode trades coverage, precision, and computational cost differently. See Recalling memories for unified recall; this page focuses on mode selection and graph-density signals.

k (and limit on /query) controls how many hits are returned after fusion — the answer size. Default 10, maximum 50 (SPECTRON_MAX_QUERY_K, clamp-down only).

The candidate pool that vector, BM25, and graph signals search over is sized separately (SPECTRON_RETRIEVAL_POOL_SIZE, default 256). Raising k returns more fused results but does not widen the internal search. Tier escalation (when confidence is thin) doubles the pool rather than re-running an identical pass.
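The split between answer size and pool size can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the reading of "clamp-down only" (a configured maximum may lower the ceiling of 50 but not raise it) is an assumption rather than documented behaviour.

```python
DEFAULT_K = 10
HARD_MAX_K = 50          # documented ceiling for k
DEFAULT_POOL_SIZE = 256  # documented SPECTRON_RETRIEVAL_POOL_SIZE default


def effective_k(requested=None, configured_max=HARD_MAX_K):
    """Clamp the requested answer size into the allowed range."""
    # Assumption: the configured maximum can lower the ceiling, never raise it.
    ceiling = min(configured_max, HARD_MAX_K)
    k = DEFAULT_K if requested is None else requested
    return max(1, min(k, ceiling))


def escalated_pool(pool_size=DEFAULT_POOL_SIZE):
    """Tier escalation doubles the candidate pool; k itself is unchanged."""
    return pool_size * 2
```

Under these assumptions, asking for k=200 still returns at most 50 hits, and only escalation (not a larger k) widens the search from 256 to 512 candidates.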

Structured memories, not hundreds of raw chunks, should answer most queries — see Coherence, retrieval, and cost tiers.

The vector mode runs pure HNSW (hierarchical navigable small world) approximate nearest-neighbour search over dense embeddings. The query is embedded with the same model used at ingestion, and the top-k nearest chunk embeddings are returned.

Vector search excels at paraphrase and semantic similarity – finding chunks that express the same idea in different words. It is weak on exact strings, product codes, proper names, and rare terms that are poorly represented in the embedding model's training data.

The vector leg also searches transcript segments (audio_chunk rows) in the same 1536-dim embedding space as text chunks, so spoken content from audio and video documents can surface as passage hits with time-coded provenance — not only the parent chunk spine.
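To make the ranking criterion concrete, the sketch below performs an exact cosine-similarity search over a handful of toy vectors. It is a brute-force stand-in, not HNSW: HNSW approximates the same ordering without comparing the query against every chunk. The chunk IDs and three-dimensional vectors are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_exact(query_vec, chunk_vecs, k=10):
    """Exact nearest neighbours by cosine similarity, best first."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical chunk embeddings (real ones would have 1536 dimensions).
chunks = {
    "chunk:a": [1.0, 0.0, 0.0],
    "chunk:b": [0.9, 0.1, 0.0],
    "chunk:c": [0.0, 1.0, 0.0],
}
nearest = top_k_exact([1.0, 0.0, 0.0], chunks, k=2)
```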

The bm25 mode runs BM25 full-text search over the chunk corpus. The query is tokenised and matched against the inverted index, and results are ranked by term frequency and inverse document frequency, with term frequency normalised by document length.

BM25 excels at exact terms, product identifiers, model numbers, and specific technical phrases. It is weak on synonyms and paraphrase – if the query uses a term not present in the chunk, BM25 will not find it.
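The following is a minimal sketch of Okapi BM25 scoring, included to show why exact identifiers score well and absent terms score zero. It assumes whitespace tokenisation and the common defaults k1=1.5 and b=0.75; Spectron's actual tokeniser and parameters are not stated on this page.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    avg_len = sum(len(tokens) for tokens in tokenised) / n_docs
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # a term missing from the document contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "return window for model mx-4410 is 30 days",
    "refunds are issued to the original payment method",
]
scores = bm25_scores("mx-4410", docs)
```

The first document contains the identifier and receives a positive score; the second does not and scores zero, even though it is topically related.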

The hybrid mode applies reciprocal rank fusion (RRF) to vector and BM25 results. Both retrieval passes run independently, and their ranked lists are merged into a single ranking using the RRF formula:

score(d) = Σ_i 1 / (k + rank_i(d))

where the sum runs over the retrieval passes i, k is the RRF smoothing constant (the rrf_k parameter, default 60; unrelated to the result-count k above), and rank_i(d) is the rank of document d in pass i. A document that appears in both lists receives the sum of both terms; a document that appears in only one list is still included, with a lower score because only one term contributes.
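The formula can be worked through in a short sketch. It assumes 1-based ranks, as in the usual statement of RRF, and ignores per-pass weighting such as vector_weight; the chunk IDs are invented.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, rrf_k=60):
    """Merge ranked lists of IDs with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based within each list (an assumption of this sketch).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rrf_k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


vector_hits = ["c1", "c2", "c3"]
bm25_hits = ["c2", "c4"]
fused = rrf_fuse([vector_hits, bm25_hits])
```

Here c2 is second in the vector list and first in the BM25 list, so it scores 1/62 + 1/61 and moves to the top, ahead of c1, which appears only in the vector list with 1/61.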

Hybrid mode is the default. It reliably outperforms either mode in isolation across a wide range of query types and is appropriate for most production deployments.

The hybrid_graph mode runs hybrid retrieval plus a graph-density reranking pass. After the initial hybrid retrieval, each candidate chunk is rescored based on its connectivity in the knowledge graph. Chunks that are more densely connected to other relevant content (via keyword co-occurrence, document-level links, typed-knowledge edges, or semantic section similarity) receive a higher rerank score.

The graph-density reranker draws on:

  • Keyword graph: chunks linked to high-scoring keywords that match query terms

  • Typed-knowledge graph: chunks adjacent to knowledge nodes relevant to the query

  • Section vectors: semantic similarity between query and document section headings

  • Document links: outbound links from the retrieved document to related documents

  • Document summaries: summary-level similarity as a signal for topic alignment

  • Personalised PageRank: random-walk authority scores from query-matched seed nodes

hybrid_graph produces the highest-quality results for complex queries requiring multi-hop reasoning, but adds latency proportional to graph depth. For simple factual lookups, hybrid is sufficient.
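One way to picture the rerank is as a blend of the base hybrid score with a per-chunk density score. The sketch below assumes a linear blend controlled by graph_alpha and density values in [0, 1]; the actual combination rule is not documented on this page, and the scores are invented.

```python
def graph_rerank(candidates, density, graph_alpha=0.3):
    """Blend base relevance with a graph-density score, best first."""
    # Assumption: final = (1 - alpha) * base + alpha * density.
    rescored = [
        (chunk_id, (1 - graph_alpha) * base + graph_alpha * density.get(chunk_id, 0.0))
        for chunk_id, base in candidates
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


candidates = [("c1", 0.80), ("c2", 0.78)]  # hybrid scores
density = {"c1": 0.10, "c2": 0.90}         # hypothetical connectivity scores
reranked = graph_rerank(candidates, density)
```

With these numbers the well-connected c2 (0.816) overtakes c1 (0.59) despite a slightly lower base score, which is the effect the reranking pass is meant to produce.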

When tuning graph-density reranking, optional graph_edges selects which structural signals contribute. Each value must be one of the recognised edge kinds — typos or unknown values return 400 Bad Request rather than being silently ignored:

Edge kind               Signal
knowledge_has_keyword   Chunks linked to query-matching keywords
section_match           Section-heading similarity
document_link           Cross-document link density
document_summary        Document-level summary similarity
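Because unknown values are rejected with 400 Bad Request, a client-side check can catch typos before the request is sent. The helper below is a hypothetical sketch, and it assumes the four edge kinds listed above are the complete set.

```python
# Assumption: these four kinds are the full set the server recognises.
KNOWN_EDGE_KINDS = {
    "knowledge_has_keyword",
    "section_match",
    "document_link",
    "document_summary",
}


def check_graph_edges(graph_edges):
    """Return the list unchanged, or raise ValueError naming unknown kinds."""
    unknown = sorted(set(graph_edges) - KNOWN_EDGE_KINDS)
    if unknown:
        raise ValueError(f"unknown graph_edges values: {', '.join(unknown)}")
    return list(graph_edges)
```

Calling check_graph_edges(["section_mach"]) raises immediately, which is cheaper to debug than a failed HTTP round trip.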

Responses may report hybrid_graph on individual hits when several signals combine; that value describes merged evidence in the result, not an input filter.

Python:

import os

from surrealdb import Spectron

memory = Spectron(context="acme-prod", api_key=os.environ["SPECTRON_API_KEY"])

# The await below needs an async context (an async function or an asyncio REPL).
hits = await memory.knowledge.query(
    query="what is the return window for unopened items?",
    mode="hybrid",
    k=10,
)

for hit in hits:
    print(hit.score)
    print(hit.chunk.text)
    print(hit.document.title)

TypeScript:

import { Spectron } from "@surrealdb/spectron";

const memory = new Spectron({ context: "acme-prod", apiKey: process.env.SPECTRON_API_KEY });

const hits = await memory.knowledge.query({
  query: "what is the return window for unopened items?",
  mode: "hybrid",
  k: 10,
});

for (const hit of hits) {
  console.log(hit.score, hit.chunk.text, hit.document.title);
}
The following Python call sets every optional tuning parameter:

hits = await memory.knowledge.query(
    query="what is the return window for unopened items?",
    mode="hybrid_graph",
    k=10,
    threshold=0.5,          # minimum score to include in results
    vector_weight=0.5,      # relative weight of vector vs BM25 in RRF
    rrf_k=60,               # RRF smoothing constant
    graph_alpha=0.3,        # weight of graph-density rerank vs base score
    expand_graph=True,      # include one-hop keyword and typed-knowledge context
    use_hyde=False,         # Hypothetical Document Embeddings query expansion
    decompose_query=False,  # sub-question decomposition
    use_reranker=False,     # cross-encoder reranking
    filter={"mime_type": ["application/pdf"]},
    scope=["org/acme"],
)

Each hit in the result contains:

{
  "chunk": {
    "id": "chunk:01hy2…",
    "text": "Unopened items may be returned within 30 days of the original purchase date.",
    "section": "Eligibility",
    "position": 7
  },
  "score": 0.87,
  "document": {
    "id": "doc:01hx9…",
    "title": "Returns Policy",
    "source": "returns.pdf"
  }
}

  • chunk.position is the chunk's ordinal position within the document, useful for retrieving surrounding context.

  • chunk.section is the heading of the section containing this chunk, or null for documents without section structure.

  • score is the normalised relevance score after all reranking passes.

  • document provides provenance for display, citation, or follow-up retrieval.
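As an illustration of how these fields might be used, the sketch below builds a citation string and computes the positions of neighbouring chunks. It works on a plain dictionary shaped like the JSON above; the helper names and placeholder IDs are hypothetical, and it assumes positions start at 0.

```python
def format_citation(hit):
    """Build a one-line citation from a hit dictionary."""
    chunk, doc = hit["chunk"], hit["document"]
    section = chunk.get("section") or "no section"  # section may be null
    return f'{doc["title"]} ({doc["source"]}), {section}, chunk {chunk["position"]}'


def neighbour_positions(hit, radius=1):
    """Positions of the chunks around this hit, excluding the hit itself."""
    pos = hit["chunk"]["position"]
    # Assumption: positions are zero-based, so nothing below 0 is requested.
    return [p for p in range(max(0, pos - radius), pos + radius + 1) if p != pos]


hit = {
    "chunk": {"id": "chunk:example", "text": "...", "section": "Eligibility", "position": 7},
    "score": 0.87,
    "document": {"id": "doc:example", "title": "Returns Policy", "source": "returns.pdf"},
}
citation = format_citation(hit)
around = neighbour_positions(hit)
```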

The use_hyde and decompose_query flags described below are honoured on POST /documents/query and the MCP recall path when a request-path LLM provider is configured. If the LLM call fails or no LLM is attached, Spectron falls back to single-query retrieval without raising an error.

When use_hyde=True, Spectron generates a hypothetical answer to the query using the configured response model, then embeds that hypothetical answer rather than the raw query string. This improves recall for queries phrased as questions rather than document-like statements, at the cost of one LLM call per query.

hits = await memory.knowledge.query(
    query="what is the return window for unopened items?",
    mode="hybrid",
    use_hyde=True,
)
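Conceptually, HyDE swaps the text that gets embedded. The sketch below shows that flow with the fallback to the raw query; generate and embed are hypothetical callables standing in for the configured response and embedding models, and the prompt wording is invented.

```python
def hyde_query_vector(query, generate, embed):
    """Embed a generated hypothetical answer; fall back to the raw query."""
    try:
        hypothetical = generate(f"Write a short passage that answers: {query}")
    except Exception:
        hypothetical = None  # mirror the documented fallback: no error surfaces
    return embed(hypothetical or query)


# Stand-ins so the sketch runs without a model; both are hypothetical.
def fake_generate(prompt):
    return "Unopened items may be returned within 30 days."


def fake_embed(text):
    return [float(len(text)), float(text.count(" "))]


vec = hyde_query_vector("what is the return window?", fake_generate, fake_embed)
```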

When decompose_query=True, Spectron splits complex queries into a set of simpler sub-questions, executes each independently, and merges the results. This is useful when a single query implicitly asks multiple things:

hits = await memory.knowledge.query(
    query="what are the return and warranty policies for AirPods Pro 2?",
    mode="hybrid",
    decompose_query=True,
)

Decomposition adds latency proportional to the number of sub-questions and consumes additional LLM tokens. Use it for explicit multi-topic queries rather than as a default.
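The split, search, and merge steps can be sketched as follows. The merge rule here (keep the best score seen for each chunk) is an assumption chosen for simplicity, not Spectron's documented behaviour, and decompose and search are hypothetical callables.

```python
def decomposed_search(query, decompose, search, k=10):
    """Run each sub-question and keep the best score seen per chunk."""
    sub_questions = decompose(query) or [query]  # fall back to the single query
    best = {}
    for sub_question in sub_questions:
        for chunk_id, score in search(sub_question):
            if chunk_id not in best or score > best[chunk_id]:
                best[chunk_id] = score
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)[:k]


# Stand-ins so the sketch runs without an LLM or an index.
def fake_decompose(query):
    return ["return policy for AirPods Pro 2", "warranty policy for AirPods Pro 2"]


def fake_search(sub_question):
    if "return" in sub_question:
        return [("chunk:returns", 0.9), ("chunk:shared", 0.4)]
    return [("chunk:warranty", 0.8), ("chunk:shared", 0.6)]


merged = decomposed_search(
    "what are the return and warranty policies for AirPods Pro 2?",
    fake_decompose,
    fake_search,
)
```

Each sub-question triggers its own retrieval pass, which is where the extra latency comes from.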

When use_reranker=True, the top-k results from the initial retrieval pass are reranked using a cross-encoder model that jointly encodes the query and each candidate chunk. Cross-encoder reranking is more accurate than bi-encoder (embedding) similarity but significantly slower.

Requires server configuration: set SPECTRON_RERANKER_URL and SPECTRON_RERANKER_MODEL at startup (see Configuration). Without a reranker provider, use_reranker=true falls through to bi-encoder ordering.

hits = await memory.knowledge.query(
    query="return policy for international purchases",
    mode="hybrid",
    use_reranker=True,
    k=20,  # retrieve more candidates for the reranker to rescore
)
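The reranking step itself is a re-sort of the candidates by a score computed from the query and passage together. In the sketch below a toy word-overlap scorer stands in for the cross-encoder; a real cross-encoder would replace overlap_score, and all names here are hypothetical.

```python
def rerank(query, candidates, score_pair, k=10):
    """Re-order (chunk_id, text) candidates by a joint query-passage score."""
    scored = [(chunk_id, score_pair(query, text)) for chunk_id, text in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def overlap_score(query, text):
    """Toy scorer: fraction of query words that appear in the passage."""
    words = set(query.lower().split())
    return len(words & set(text.lower().split())) / len(words)


candidates = [
    ("c1", "domestic return policy for store purchases"),
    ("c2", "return policy for international purchases and customs fees"),
]
top = rerank("return policy for international purchases", candidates, overlap_score, k=1)
```

Because the scorer runs once per candidate, cost grows with the number of candidates passed in, which is why a larger k makes reranking slower.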

Apply hard filters before retrieval to restrict results to a subset of the document corpus:

hits = await memory.knowledge.query(
    query="Q3 revenue",
    mode="hybrid",
    filter={
        "mime_type": ["application/pdf"],
        "scope": ["org/acme"],
    },
)

Filters are applied before scoring, so they do not affect the ranking of results that pass through.
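The ordering of the two steps can be sketched with toy data: filtering removes chunks outright, and ranking then operates only on what remains. The chunk records and the relevance field are invented for illustration.

```python
def filter_then_rank(chunks, allowed_mime_types, score):
    """Drop chunks that fail the filter, then rank only the survivors."""
    survivors = [c for c in chunks if c["mime_type"] in allowed_mime_types]
    return sorted(survivors, key=score, reverse=True)


chunks = [
    {"id": "c1", "mime_type": "application/pdf", "relevance": 0.6},
    {"id": "c2", "mime_type": "text/html", "relevance": 0.9},
    {"id": "c3", "mime_type": "application/pdf", "relevance": 0.8},
]
ranked = filter_then_rank(chunks, {"application/pdf"}, score=lambda c: c["relevance"])
```

The HTML chunk is excluded even though it has the highest relevance, and the two PDF chunks keep the same relative order they would have had without the filter.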

Query pattern                                  Recommended mode
General natural-language questions             hybrid
Product codes, model numbers, exact phrases    bm25
Paraphrase, synonym-heavy queries              vector
Complex multi-hop reasoning, graph context     hybrid_graph
High-recall requirements (academic, legal)     hybrid + use_reranker=True
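For applications that choose a mode per request, the recommendations above could be encoded as a small heuristic. The sketch below is one hypothetical way to do that; the regular expression for code-like tokens and the multi_hop flag are assumptions of this example, not part of Spectron.

```python
import re

# Assumption: a short letter prefix followed by two or more digits looks like
# a product code or model number (for example "MX-4410").
CODE_LIKE = re.compile(r"\b[A-Za-z]{1,6}[-_]?\d{2,}\b")


def suggest_mode(query, multi_hop=False):
    """Pick a retrieval mode from simple surface features of the query."""
    if multi_hop:
        return "hybrid_graph"
    if '"' in query or CODE_LIKE.search(query):
        return "bm25"  # quoted phrases and identifiers favour exact matching
    return "hybrid"    # the documented default for general questions
```

A heuristic like this is only a starting point; since hybrid is the default and covers most query types, falling back to it is the safe choice.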
