Hybrid search | Spectron

Spectron exposes four retrieval modes for document passages (POST .../documents/query) and for the unified read path (POST .../query, which also ranks experiential facts). Each mode trades coverage, precision, and computational cost differently. See Recalling memories for unified recall; this page focuses on mode selection and graph-density signals.

Answer size vs search breadth

k (and limit on /query) controls how many hits are returned after fusion — the answer size. Default 10, maximum 50 (SPECTRON_MAX_QUERY_K, clamp-down only).

The candidate pool that vector, BM25, and graph signals search over is sized separately (SPECTRON_RETRIEVAL_POOL_SIZE, default 256). Raising k returns more fused results but does not widen the internal search. Tier escalation (when confidence is thin) doubles the pool rather than re-running an identical pass.
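
The split between answer size and search breadth can be sketched as follows. This is an illustration rather than Spectron's implementation: the function names are hypothetical, the constants mirror the defaults quoted above, and it assumes that oversized k values are clamped (not rejected) and that each escalation tier doubles the pool once more.

```python
MAX_QUERY_K = 50          # cap quoted above (SPECTRON_MAX_QUERY_K)
DEFAULT_POOL_SIZE = 256   # default quoted above (SPECTRON_RETRIEVAL_POOL_SIZE)


def clamp_k(requested_k: int = 10) -> int:
    """Assumed behaviour: an oversized k is clamped down to the cap."""
    return min(requested_k, MAX_QUERY_K)


def pool_size_for_tier(tier: int, base: int = DEFAULT_POOL_SIZE) -> int:
    """Assumed behaviour: every escalation tier doubles the candidate pool."""
    return base * (2 ** tier)


def answer(fused_ranking: list[str], requested_k: int = 10) -> list[str]:
    """k only truncates the fused ranking; it never changes the pool size."""
    return fused_ranking[:clamp_k(requested_k)]
```

In this sketch, asking for k=200 still returns at most 50 hits, and the pool stays at 256 candidates unless an escalation tier is reached.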

Structured memories, not hundreds of raw chunks, should answer most queries — see Coherence, retrieval, and cost tiers.

Section expansion

After ranking, Spectron can follow chunk.section_ref (and related section pointers) and append same-section siblings into a separate contextHits channel on /query. Those passages are not counted against k and do not displace ranked hits. /chat and /reflect synthesise over both lists; citations can resolve either. Expansion is on by default — set SPECTRON_RETRIEVAL_SECTION_EXPANSION=0 to disable. See Recalling memories.
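
A minimal sketch of the expansion step, assuming each chunk carries a single section_ref. The Chunk type and expand_sections function are hypothetical names for illustration. The ranked list is left untouched and siblings go into a separate list, mirroring the contextHits channel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    id: str
    section_ref: Optional[str]  # None for chunks without section structure
    text: str


def expand_sections(ranked: list[Chunk], corpus: list[Chunk]) -> list[Chunk]:
    """Return same-section siblings of the ranked chunks, excluding the ranked chunks."""
    ranked_ids = {c.id for c in ranked}
    sections = {c.section_ref for c in ranked if c.section_ref is not None}
    return [
        c for c in corpus
        if c.section_ref in sections and c.id not in ranked_ids
    ]
```

Because the siblings are returned separately, the ranked list keeps its length of k regardless of how many context passages are attached.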

Query modes

`vector`

Pure HNSW (hierarchical navigable small world) approximate nearest-neighbour search over dense embeddings. The query is embedded with the same model used at ingestion, and the top-k nearest chunk embeddings are returned.

Vector search excels at paraphrase and semantic similarity – finding chunks that express the same idea in different words. It is weak on exact strings, product codes, proper names, and rare terms that are poorly represented in the embedding model's training data.

The vector leg also searches transcript segments (audio_chunk rows) in the same 1536-dim embedding space as text chunks, so spoken content from audio and video documents can surface as passage hits with time-coded provenance — not only the parent chunk spine.
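
To make the ranking criterion concrete, the sketch below performs an exact brute-force cosine search. It is a stand-in for HNSW, which approximates the same nearest-neighbour ordering without scanning every vector. The function names and the toy two-dimensional vectors are illustrative assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Exact nearest neighbours by cosine similarity, best first."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```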

`bm25`

BM25 full-text search over the chunk corpus. The query is tokenized and matched against the inverted index. Results are ranked by the BM25 score, which combines inverse document frequency with a saturating term-frequency component and normalises for chunk length.

BM25 excels at exact terms, product identifiers, model numbers, and specific technical phrases. It is weak on synonyms and paraphrase – if the query uses a term not present in the chunk, BM25 will not find it.
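
The scoring behaviour can be illustrated with a small self-contained BM25 function. This is a textbook formulation with commonly used parameters (k1=1.2, b=0.75) and whitespace tokenisation, all of which are assumptions here. Spectron's tokenizer and parameters are not specified on this page.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: dict[str, str], k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Score every document against the query with textbook BM25."""
    if not docs:
        return {}
    tokenised = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenised)
    avg_len = sum(len(tokens) for tokens in tokenised.values()) / n_docs
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenised.values() for term in set(tokens))

    scores: dict[str, float] = {}
    for doc_id, tokens in tokenised.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # a term absent from the chunk contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores[doc_id] = score
    return scores
```

With this sketch, a query for "x200" scores only the chunk that contains that exact token, while a query for "return window" scores zero against a chunk that says "refund period", which is the synonym weakness described above.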

`hybrid`

Reciprocal rank fusion (RRF) of vector and BM25 results. Both retrieval passes run independently, and their ranked lists are merged into a single ranking using the RRF formula:

score(d) = Σ 1 / (k + rank_i(d))

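where k is a smoothing constant (default 60; exposed as rrf_k on the query and distinct from the answer-size k described earlier) and rank_i(d) is the rank of document d in retrieval pass i. A pass that does not return d contributes nothing to the sum. Documents appearing in both lists accumulate two terms; documents appearing in only one list are still ranked, with a lower fused score than a document holding the same rank in both.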
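
A worked example of the fusion formula, assuming 1-based ranks and equal weighting of the two passes. The rrf function name is illustrative, and the vector_weight option is deliberately left out of this sketch.

```python
def rrf(rankings: list[list[str]], rrf_k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document ids with reciprocal rank fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rrf_k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "d"]
fused = rrf([vector_hits, bm25_hits])
# Fused order is b, a, d, c: "b" is ranked by both passes (1/62 + 1/61),
# so it overtakes "a", which is first in only one list (1/61).
```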

Hybrid mode is the default. It typically outperforms vector or bm25 used alone across a wide range of query types and is appropriate for most production deployments.

`hybrid_graph`

Hybrid retrieval plus a graph-density reranking pass. After the initial hybrid retrieval, each candidate chunk is rescored based on its connectivity in the knowledge graph. Chunks that are more densely connected to other relevant content – via keyword co-occurrence, document-level links, typed-knowledge edges, or semantic section similarity – receive a higher rerank score.

The graph-density reranker draws on:

Keyword graph: chunks linked to high-scoring keywords that match query terms
Typed-knowledge graph: chunks adjacent to knowledge nodes relevant to the query
Section vectors: semantic similarity between query and document section headings
Document links: outbound links from the retrieved document to related documents
Document summaries: summary-level similarity as a signal for topic alignment
Personalised PageRank: random-walk authority scores from query-matched seed nodes

hybrid_graph produces the highest-quality results for complex queries requiring multi-hop reasoning, but adds latency proportional to graph depth. For simple factual lookups, hybrid is sufficient.
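
One plausible way to picture the rerank is a linear blend of the base hybrid score and a graph-density score, controlled by a weight such as the graph_alpha option. The blend formula and the pre-computed density values below are assumptions made for illustration. This page does not specify how Spectron combines the signals.

```python
def graph_rerank(
    base_scores: dict[str, float],
    density: dict[str, float],
    graph_alpha: float = 0.3,
) -> list[tuple[str, float]]:
    """Blend base relevance with graph density (assumed linear interpolation)."""
    blended = {
        chunk_id: (1 - graph_alpha) * base + graph_alpha * density.get(chunk_id, 0.0)
        for chunk_id, base in base_scores.items()
    }
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)
```

Under this assumption, a chunk with a slightly lower base score can overtake a higher-scoring but poorly connected chunk, and setting graph_alpha to 0 reproduces the plain hybrid ordering.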

When tuning graph-density reranking, optional graph_edges selects which structural signals contribute. Each value must be one of the recognised edge kinds — typos or unknown values return 400 Bad Request rather than being silently ignored:

Edge kind	Signal
`knowledge_has_keyword`	Chunks linked to query-matching keywords
`section_match`	Section-heading similarity
`document_link`	Cross-document link density
`document_summary`	Document-level summary similarity
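
Because unknown values are rejected by the server, a client may want to check graph_edges before sending a request. The helper below is a hypothetical client-side sketch, not part of the SDK. It uses only the four edge kinds listed in the table.

```python
RECOGNISED_EDGE_KINDS = frozenset({
    "knowledge_has_keyword",
    "section_match",
    "document_link",
    "document_summary",
})


def validate_graph_edges(graph_edges: list[str]) -> list[str]:
    """Raise ValueError for any value the server would reject with 400."""
    unknown = sorted(set(graph_edges) - RECOGNISED_EDGE_KINDS)
    if unknown:
        raise ValueError(f"unknown graph_edges values: {unknown}")
    return list(graph_edges)
```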

Responses may report hybrid_graph on individual hits when several signals combine; that value describes merged evidence in the result, not an input filter.

Basic query

import os

from surrealdb import Spectron

memory = Spectron(context="acme-prod", api_key=os.environ["SPECTRON_API_KEY"])

# Top-level await: run this inside an async function or an asyncio-aware REPL.
hits = await memory.knowledge.query(
    query="what is the return window for unopened items?",
    mode="hybrid",
    k=10,
)

for hit in hits:
    print(hit.score)
    print(hit.chunk.text)
    print(hit.document.title)

import { Spectron } from "@surrealdb/spectron";

const memory = new Spectron({ context: "acme-prod", apiKey: process.env.SPECTRON_API_KEY });

const hits = await memory.knowledge.query({
    query: "what is the return window for unopened items?",
    mode: "hybrid",
    k: 10,
});

for (const hit of hits) {
    console.log(hit.score, hit.chunk.text, hit.document.title);
}

Full query with all options

hits = await memory.knowledge.query(
    query="what is the return window for unopened items?",
    mode="hybrid_graph",
    k=10,
    threshold=0.5,          # minimum score to include in results
    vector_weight=0.5,      # relative weight of vector vs BM25 in RRF
    rrf_k=60,               # RRF smoothing constant
    graph_alpha=0.3,        # weight of graph-density rerank vs base score
    expand_graph=True,      # include one-hop keyword and typed-knowledge context
    use_hyde=False,         # Hypothetical Document Embeddings query expansion
    decompose_query=False,  # sub-question decomposition
    use_reranker=False,     # cross-encoder reranking
    filter={"mime_type": ["application/pdf"]},
    scope=["org/acme"],
)

Query result structure

Each hit in the result contains:

{
  "chunk": {
    "id": "chunk:01hy2…",
    "text": "Unopened items may be returned within 30 days of the original purchase date.",
    "section": "Eligibility",
    "position": 7
  },
  "score": 0.87,
  "document": {
    "id": "doc:01hx9…",
    "title": "Returns Policy",
    "source": "returns.pdf"
  }
}

chunk.position is the chunk's ordinal position within the document, useful for retrieving surrounding context.
chunk.section is the heading of the section containing this chunk, or null for documents without section structure.
score is the normalised relevance score after all reranking passes.
document provides provenance for display, citation, or follow-up retrieval.
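
When working with the raw JSON instead of SDK objects, the hit shape above can be mapped onto typed records. The class and function names below are hypothetical. Only the fields shown in the example payload are modelled.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChunkRef:
    id: str
    text: str
    section: Optional[str]  # null for documents without section structure
    position: int


@dataclass
class DocumentRef:
    id: str
    title: str
    source: str


@dataclass
class Hit:
    chunk: ChunkRef
    score: float
    document: DocumentRef


def parse_hit(raw: dict[str, Any]) -> Hit:
    """Convert one decoded JSON hit into typed records."""
    chunk = raw["chunk"]
    doc = raw["document"]
    return Hit(
        chunk=ChunkRef(chunk["id"], chunk["text"], chunk.get("section"), chunk["position"]),
        score=float(raw["score"]),
        document=DocumentRef(doc["id"], doc["title"], doc["source"]),
    )
```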

Advanced options

These flags are honoured on POST .../documents/query and on the MCP recall path when a request-path LLM provider is configured. If the LLM call fails or no LLM is attached, Spectron falls back to single-query retrieval without returning an error.

HyDE (Hypothetical Document Embeddings)

When use_hyde=True, Spectron generates a hypothetical answer to the query using the configured response model, then embeds that hypothetical answer rather than the raw query string. This improves recall for queries phrased as questions rather than document-like statements, at the cost of one LLM call per query.

hits = await memory.knowledge.query(
    query="what is the return window for unopened items?",
    mode="hybrid",
    use_hyde=True,
)
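
Conceptually, HyDE swaps the text that gets embedded. The sketch below shows that substitution with caller-supplied generate and embed callables. Both are hypothetical stand-ins for the configured response model and embedding model, and the prompt wording is an assumption. The fallback to the raw query mirrors the behaviour described under Advanced options.

```python
from typing import Callable, Sequence


def hyde_query_vector(
    query: str,
    generate: Callable[[str], str],
    embed: Callable[[str], Sequence[float]],
) -> Sequence[float]:
    """Embed a hypothetical answer instead of the raw query."""
    try:
        hypothetical = generate(f"Write a short passage that answers: {query}")
    except Exception:
        # No LLM or a failed call: fall back to embedding the query itself.
        hypothetical = query
    return embed(hypothetical)
```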

Sub-question decomposition

When decompose_query=True, Spectron splits complex queries into a set of simpler sub-questions, executes each independently, and merges the results. This is useful when a single query implicitly asks multiple things:

hits = await memory.knowledge.query(
    query="what are the return and warranty policies for AirPods Pro 2?",
    mode="hybrid",
    decompose_query=True,
)

Decomposition adds latency proportional to the number of sub-questions and consumes additional LLM tokens. Use it for explicit multi-topic queries rather than as a default.
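
The split, search, and merge flow can be sketched as below. The decompose and search callables are hypothetical stand-ins, and merging by best score per chunk is an assumption. This page does not state how Spectron merges sub-question results.

```python
from typing import Callable


def decomposed_search(
    query: str,
    decompose: Callable[[str], list[str]],
    search: Callable[[str], list[tuple[str, float]]],
    k: int = 10,
) -> list[tuple[str, float]]:
    """Run each sub-question independently and keep the best score per chunk."""
    sub_questions = decompose(query) or [query]  # fall back to the original query
    best: dict[str, float] = {}
    for sub_question in sub_questions:
        for chunk_id, score in search(sub_question):
            if score > best.get(chunk_id, float("-inf")):
                best[chunk_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]
```

The loop makes the cost visible: one retrieval pass per sub-question, on top of the LLM call that produces the sub-questions.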

Cross-encoder reranking

When use_reranker=True, the top-k results from the initial retrieval pass are reranked using a cross-encoder model that jointly encodes the query and each candidate chunk. Cross-encoder reranking is more accurate than bi-encoder (embedding) similarity but significantly slower.

Requires server configuration: set SPECTRON_RERANKER_URL and SPECTRON_RERANKER_MODEL at startup (see Configuration). Without a reranker provider, use_reranker=True has no effect and results keep the ordering from the initial retrieval pass.

hits = await memory.knowledge.query(
    query="return policy for international purchases",
    mode="hybrid",
    use_reranker=True,
    k=20,   # retrieve more candidates for the reranker to rescore
)
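
The rerank step itself is a rescoring of (query, chunk) pairs. In the sketch below, score_pair is a hypothetical stand-in for a call to the configured cross-encoder. The token-overlap scorer in the usage lines is only a toy placeholder so the example runs without a model.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[tuple[str, str]],        # (chunk_id, chunk_text) pairs
    score_pair: Callable[[str, str], float],  # jointly scores query and chunk
    k: int,
) -> list[str]:
    """Rescore every candidate against the query and return the best k ids."""
    scored = [(score_pair(query, text), chunk_id) for chunk_id, text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


def toy_overlap(query: str, text: str) -> float:
    """Toy placeholder scorer: fraction of query tokens present in the chunk."""
    query_tokens = set(query.lower().split())
    return len(query_tokens & set(text.lower().split())) / max(len(query_tokens), 1)


candidates = [
    ("c1", "shipping times for domestic orders"),
    ("c2", "return policy for international purchases"),
]
ordered = rerank("return policy for international purchases", candidates, toy_overlap, k=1)
```

Each candidate costs one model evaluation, which is why the reranker is applied to a bounded candidate list and not to the whole corpus.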

Metadata filtering

Apply hard filters before retrieval to restrict results to a subset of the document corpus:

hits = await memory.knowledge.query(
    query="Q3 revenue",
    mode="hybrid",
    filter={
        "mime_type": ["application/pdf"],
        "scope": ["org/acme"],
    },
)

Filters are applied before scoring, so they do not affect the ranking of results that pass through.
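
A sketch of pre-retrieval filtering, assuming that values within one field are alternatives (any-of) and that all fields must match. Those semantics, the apply_filter name, and the meta dictionary layout are assumptions for illustration.

```python
def apply_filter(chunks: list[dict], filter_: dict[str, list[str]]) -> list[dict]:
    """Keep chunks whose metadata matches every field in the filter."""
    return [
        chunk for chunk in chunks
        if all(chunk["meta"].get(field) in allowed for field, allowed in filter_.items())
    ]


corpus = [
    {"id": "c1", "meta": {"mime_type": "application/pdf", "scope": "org/acme"}},
    {"id": "c2", "meta": {"mime_type": "text/html", "scope": "org/acme"}},
    {"id": "c3", "meta": {"mime_type": "application/pdf", "scope": "org/other"}},
]
eligible = apply_filter(corpus, {"mime_type": ["application/pdf"], "scope": ["org/acme"]})
# Only the eligible chunks would then be scored and ranked.
```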

Choosing a mode

Query pattern	Recommended mode
General natural-language questions	`hybrid`
Product codes, model numbers, exact phrases	`bm25`
Paraphrase, synonym-heavy queries	`vector`
Complex multi-hop reasoning, graph context	`hybrid_graph`
High-recall requirements (academic, legal)	`hybrid` + `use_reranker=True`