Retrieval

Hybrid search

Using vector, BM25, and graph-density retrieval modes in Spectron.

Spectron exposes four retrieval modes for querying the knowledge layer: vector, bm25, hybrid, and hybrid_graph. Each mode trades coverage, precision, and computational cost differently. Choosing the right mode for your query pattern is the most direct lever for improving retrieval quality.

The vector mode runs a pure HNSW (hierarchical navigable small world) approximate nearest-neighbour search over dense embeddings. The query is embedded with the same model used at ingestion, and the k nearest chunk embeddings are returned.

Vector search excels at paraphrase and semantic similarity – finding chunks that express the same idea in different words. It is weak on exact strings, product codes, proper names, and rare terms that are poorly represented in the embedding model's training data.
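To make the ranking step concrete, here is a minimal sketch of top-k retrieval by cosine similarity. It scans every vector exhaustively rather than using an HNSW index, and the three-dimensional vectors are made-up stand-ins for real embeddings. None of the function names come from the Spectron SDK.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Exhaustive nearest-neighbour search; an HNSW index approximates this ranking."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings (real ones have hundreds of dimensions).
chunks = {
    "returns": [0.9, 0.1, 0.0],
    "warranty": [0.6, 0.6, 0.1],
    "shipping": [0.0, 0.2, 0.9],
}
nearest = top_k([1.0, 0.0, 0.0], chunks, k=2)
print(nearest)
```

The chunk whose vector points in nearly the same direction as the query ranks first, regardless of which words either text contains.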

The bm25 mode runs BM25 full-text search over the chunk corpus. The query is tokenised and matched against the inverted index. Results are ranked by a TF-IDF-style score: term frequency weighted by inverse document frequency, with term-frequency saturation and document-length normalisation.

BM25 excels at exact terms, product identifiers, model numbers, and specific technical phrases. It is weak on synonyms and paraphrase – if the query uses a term not present in the chunk, BM25 will not find it.
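The behaviour described above can be seen in a small sketch of Okapi BM25 scoring. This is a textbook formulation with whitespace tokenisation and the common defaults k1 = 1.2 and b = 0.75, all of which are assumptions here; Spectron's tokeniser and parameters are not documented in this section.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    avg_len = sum(len(toks) for toks in tokenised) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenised for term in set(toks))

    scores = []
    for toks in tokenised:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # a term absent from the chunk contributes nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = 1.0 - b + b * len(toks) / avg_len
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical three-chunk corpus.
docs = [
    "model x200 battery replacement guide",
    "how to swap the power cell in your device",
    "x200 x200 firmware release notes",
]
exact = bm25_scores("x200", docs)
synonym = bm25_scores("accumulator", docs)
print(exact, synonym)
```

The exact identifier x200 scores only the chunks that contain it, while the query "accumulator" scores zero everywhere, even though the second chunk discusses the same topic in other words.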

The hybrid mode applies reciprocal rank fusion (RRF) to vector and BM25 results. Both retrieval passes run independently, and their ranked lists are merged into a single ranking using the RRF formula:

score(d) = Σ_i 1 / (k + rank_i(d))

where the sum runs over the retrieval passes i, k is a smoothing constant (default 60), and rank_i(d) is the rank of document d in retrieval pass i. Documents appearing in both lists receive a contribution from each; documents appearing in only one list receive a single contribution and therefore a lower combined score than they would if both passes had retrieved them.
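A minimal sketch of the formula follows. It assumes ranks start at 1 and weights both passes equally; the function name is illustrative, and any per-pass weighting Spectron applies is not modelled.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge ranked lists of document ids with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # assumed 1-based ranks
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ranked lists from the two retrieval passes.
vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
fused = rrf_fuse([vector_hits, bm25_hits])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Document b comes out on top because it ranks well in both lists, followed by a; d and c, each found by only one pass, trail behind.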

Hybrid mode is the default. It reliably outperforms vector or BM25 search in isolation across a wide range of query types and is appropriate for most production deployments.

The hybrid_graph mode runs hybrid retrieval plus a graph-density reranking pass. After the initial hybrid retrieval, each candidate chunk is rescored based on its connectivity in the knowledge graph. Chunks that are more densely connected to other relevant content – via keyword co-occurrence, document-level links, typed-knowledge edges, or semantic section similarity – receive a higher rerank score.

The graph-density reranker draws on:

  • Keyword graph: chunks linked to high-scoring keywords that match query terms

  • Typed-knowledge graph: chunks adjacent to knowledge nodes relevant to the query

  • Section vectors: semantic similarity between query and document section headings

  • Document links: outbound links from the retrieved document to related documents

  • Document summaries: summary-level similarity as a signal for topic alignment

  • Keyword co-occurrence: keywords that frequently appear together with the matched terms

  • Personalised PageRank: random-walk authority scores from query-matched seed nodes

hybrid_graph produces the highest-quality results for complex queries requiring multi-hop reasoning, but adds latency proportional to graph depth. For simple factual lookups, hybrid is sufficient.
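One way to picture the rerank is as a blend of each chunk's base score with a graph-density score. The sketch below assumes a simple linear interpolation controlled by a graph_alpha weight and assumes both scores are normalised to [0, 1]. The source does not state the actual formula, so treat this purely as an illustration.

```python
def blend_scores(base: dict[str, float], graph_density: dict[str, float],
                 graph_alpha: float = 0.3) -> list[tuple[str, float]]:
    """Linearly interpolate base and graph-density scores (assumed formula)."""
    if not 0.0 <= graph_alpha <= 1.0:
        raise ValueError("graph_alpha must be between 0 and 1")
    blended = {
        chunk_id: (1.0 - graph_alpha) * score
        + graph_alpha * graph_density.get(chunk_id, 0.0)  # unconnected chunks get 0
        for chunk_id, score in base.items()
    }
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores, both already normalised to [0, 1].
base = {"chunk-a": 0.80, "chunk-b": 0.75, "chunk-c": 0.40}
density = {"chunk-a": 0.10, "chunk-b": 0.90}
reranked = blend_scores(base, density, graph_alpha=0.3)
print(reranked)
```

Here chunk-b overtakes chunk-a after the blend: its base score is slightly lower, but it is far better connected in the hypothetical graph.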

When tuning graph-density reranking, the optional graph_edges parameter selects which structural signals contribute. Each value must be one of the recognised edge kinds; typos or unknown values return 400 Bad Request rather than being silently ignored:

Edge kind              Signal
knowledge_has_keyword  Chunks linked to query-matching keywords
section_match          Section-heading similarity
document_link          Cross-document link density
document_summary       Document-level summary similarity
keyword_cooccurrence   Keyword PMI co-occurrence
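Because unknown values are rejected with 400 Bad Request, it can be convenient to check graph_edges on the client before sending a query. The helper below is a hypothetical sketch, not part of the SDK; the set of valid values is copied from the table above.

```python
VALID_EDGE_KINDS = frozenset({
    "knowledge_has_keyword",
    "section_match",
    "document_link",
    "document_summary",
    "keyword_cooccurrence",
})


def check_graph_edges(graph_edges: list[str]) -> list[str]:
    """Return the list unchanged, or raise if any value is not a known edge kind."""
    unknown = sorted(set(graph_edges) - VALID_EDGE_KINDS)
    if unknown:
        raise ValueError(f"unknown graph_edges values: {unknown}")
    return graph_edges


check_graph_edges(["section_match", "document_link"])  # passes silently
try:
    check_graph_edges(["section_match", "document_links"])  # trailing "s" is a typo
except ValueError as err:
    print(err)
```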

Responses may report hybrid_graph on individual hits when several signals combine; that value describes merged evidence in the result, not an input filter.

A basic hybrid query in Python:

import os

from spectron import Spectron

memory = Spectron(context="acme-prod", api_key=os.environ["SPECTRON_API_KEY"])

# The query call is awaited, so run it inside an async function or an async REPL.
hits = await memory.knowledge.query(
    query="what is the return window for unopened items?",
    mode="hybrid",
    k=10,
)

for hit in hits:
    print(hit.score)
    print(hit.chunk.text)
    print(hit.document.title)

The same query in TypeScript:

import { Spectron } from "spectron";

const memory = new Spectron({ context: "acme-prod", apiKey: process.env.SPECTRON_API_KEY });

const hits = await memory.knowledge.query({
  query: "what is the return window for unopened items?",
  mode: "hybrid",
  k: 10,
});

for (const hit of hits) {
  console.log(hit.score, hit.chunk.text, hit.document.title);
}

A hybrid_graph query with the tuning parameters set explicitly:

hits = await memory.knowledge.query(
    query="what is the return window for unopened items?",
    mode="hybrid_graph",
    k=10,
    threshold=0.5,          # minimum score to include in results
    vector_weight=0.5,      # relative weight of vector vs BM25 in RRF
    rrf_k=60,               # RRF smoothing constant
    graph_alpha=0.3,        # weight of graph-density rerank vs base score
    expand_graph=True,      # include one-hop keyword and typed-knowledge context
    use_hyde=False,         # Hypothetical Document Embeddings query expansion
    decompose_query=False,  # sub-question decomposition
    use_reranker=False,     # cross-encoder reranking
    filter={"mime_type": ["application/pdf"]},
    scope={"org": "acme"},
)

Each hit in the result contains:

{
  "chunk": {
    "id": "chunk:01hy2…",
    "text": "Unopened items may be returned within 30 days of the original purchase date.",
    "section": "Eligibility",
    "position": 7
  },
  "score": 0.87,
  "document": {
    "id": "doc:01hx9…",
    "title": "Returns Policy",
    "source": "returns.pdf"
  }
}

  • chunk.position is the chunk's ordinal position within the document, useful for retrieving surrounding context.

  • chunk.section is the heading of the section containing this chunk, or null for documents without section structure.

  • score is the normalised relevance score after all reranking passes.

  • document provides provenance for display, citation, or follow-up retrieval.
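As an illustration of how these fields might be used, the sketch below formats a citation from a hit and computes which neighbouring positions to fetch for surrounding context. The ids are placeholders, the helper names are hypothetical, and zero-based positions are an assumption that the source does not confirm.

```python
import json

hit_json = """
{
  "chunk": {
    "id": "chunk:example",
    "text": "Unopened items may be returned within 30 days of the original purchase date.",
    "section": "Eligibility",
    "position": 7
  },
  "score": 0.87,
  "document": {"id": "doc:example", "title": "Returns Policy", "source": "returns.pdf"}
}
"""


def neighbour_positions(position: int, window: int, total_chunks: int) -> list[int]:
    """Positions within `window` of `position`, clamped to the document bounds."""
    start = max(0, position - window)  # assumes positions start at 0
    end = min(total_chunks - 1, position + window)
    return [p for p in range(start, end + 1) if p != position]


def citation(hit: dict) -> str:
    """Format provenance for display; section is null (None) for unstructured documents."""
    section = hit["chunk"]["section"] or "no section"
    return f'{hit["document"]["title"]} ({hit["document"]["source"]}), {section}'


hit = json.loads(hit_json)
print(citation(hit))
print(neighbour_positions(hit["chunk"]["position"], window=1, total_chunks=20))
```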

When use_hyde=True, Spectron generates a hypothetical answer to the query using the configured response model, then embeds that hypothetical answer rather than the raw query string. This improves recall for queries phrased as questions rather than document-like statements, at the cost of one LLM call per query.

hits = await memory.knowledge.query(
    query="what is the return window for unopened items?",
    mode="hybrid",
    use_hyde=True,
)
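Conceptually, HyDE swaps the text that gets embedded. The sketch below shows that control flow with stub functions standing in for the LLM and the embedding model. Both stubs and the prompt wording are hypothetical and do not reflect Spectron internals.

```python
from typing import Callable


def hyde_embedding(query: str,
                   generate: Callable[[str], str],
                   embed: Callable[[str], list[float]]) -> list[float]:
    """Embed a generated hypothetical answer instead of the raw query."""
    hypothetical_answer = generate(f"Write a short passage answering: {query}")
    return embed(hypothetical_answer)


# Stub standing in for the LLM call (one call per query).
def fake_generate(prompt: str) -> str:
    return "Unopened items can be returned within the stated return window."


embedded_texts: list[str] = []


# Stub standing in for the embedding model; records what it was asked to embed.
def fake_embed(text: str) -> list[float]:
    embedded_texts.append(text)
    return [float(len(text))]


vector = hyde_embedding("what is the return window for unopened items?", fake_generate, fake_embed)
print(embedded_texts)
```

The recorded text is the document-like hypothetical answer, not the question, which is why the resulting vector tends to sit closer to answer-bearing chunks.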

When decompose_query=True, Spectron splits complex queries into a set of simpler sub-questions, executes each independently, and merges the results. This is useful when a single query implicitly asks multiple things:

hits = await memory.knowledge.query(
    query="what are the return and warranty policies for AirPods Pro 2?",
    mode="hybrid",
    decompose_query=True,
)

Decomposition adds latency proportional to the number of sub-questions and consumes additional LLM tokens. Use it for explicit multi-topic queries rather than as a default.
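The merge step can be sketched as a union of the per-sub-question hit lists, deduplicated by chunk. Keeping the highest score per chunk is an assumption made for illustration; the source does not specify how Spectron combines sub-question results.

```python
def merge_sub_results(sub_results: list[list[dict]], k: int) -> list[dict]:
    """Union hits from each sub-question, keeping the best score per chunk id."""
    best: dict[str, dict] = {}
    for hits in sub_results:
        for hit in hits:
            chunk_id = hit["chunk_id"]
            if chunk_id not in best or hit["score"] > best[chunk_id]["score"]:
                best[chunk_id] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:k]


# Hypothetical hits for two sub-questions ("return policy" and "warranty policy").
returns_hits = [{"chunk_id": "c1", "score": 0.91}, {"chunk_id": "c2", "score": 0.55}]
warranty_hits = [{"chunk_id": "c3", "score": 0.84}, {"chunk_id": "c2", "score": 0.62}]
merged = merge_sub_results([returns_hits, warranty_hits], k=3)
print([(h["chunk_id"], h["score"]) for h in merged])
```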

When use_reranker=True, the top-k results from the initial retrieval pass are reranked using a cross-encoder model that jointly encodes the query and each candidate chunk. Cross-encoder reranking is more accurate than bi-encoder (embedding) similarity but significantly slower.

hits = await memory.knowledge.query(
    query="return policy for international purchases",
    mode="hybrid",
    use_reranker=True,
    k=20,  # retrieve more candidates for the reranker to rescore
)
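The two-stage shape of reranking, a cheap candidate retrieval followed by a more expensive pairwise rescoring, can be sketched as follows. The scorer here is a deliberately trivial token-overlap stand-in, not a cross-encoder, and the candidate texts are invented for the example.

```python
from typing import Callable


def rerank(query: str, candidates: list[str],
           pair_score: Callable[[str, str], float], top_n: int) -> list[str]:
    """Rescore each (query, candidate) pair jointly and keep the top_n."""
    rescored = sorted(candidates, key=lambda text: pair_score(query, text), reverse=True)
    return rescored[:top_n]


def overlap_score(query: str, text: str) -> float:
    """Stand-in scorer (NOT a cross-encoder): fraction of query tokens found in the text."""
    query_tokens = set(query.lower().split())
    return len(query_tokens & set(text.lower().split())) / len(query_tokens)


# Hypothetical first-pass candidates.
candidates = [
    "Domestic returns are accepted within 30 days.",
    "The return policy for international purchases allows 45 days.",
    "International shipping rates vary by region.",
]
top = rerank("return policy for international purchases", candidates, overlap_score, top_n=2)
print(top)
```

A real cross-encoder would replace overlap_score with a model forward pass per pair, which is where the extra latency comes from.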

Apply hard filters before retrieval to restrict results to a subset of the document corpus:

hits = await memory.knowledge.query(
    query="Q3 revenue",
    mode="hybrid",
    filter={
        "mime_type": ["application/pdf"],
        "scope": {"org": "acme"},
    },
)

Filters are applied before scoring, so they do not affect the ranking of results that pass through.
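The ordering of the two steps can be shown with a toy example: chunks failing the filter are dropped first, and the survivors keep their relative order. The corpus and its precomputed scores are hypothetical, and the helper is not an SDK function.

```python
def filter_then_rank(chunks: list[dict], mime_types: set[str], k: int) -> list[dict]:
    """Drop chunks that fail the hard filter, then rank the survivors by score."""
    candidates = [c for c in chunks if c["mime_type"] in mime_types]
    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:k]


# Hypothetical corpus with precomputed relevance scores.
corpus = [
    {"id": "c1", "mime_type": "application/pdf", "score": 0.72},
    {"id": "c2", "mime_type": "text/html", "score": 0.95},
    {"id": "c3", "mime_type": "application/pdf", "score": 0.88},
]
pdf_only = filter_then_rank(corpus, {"application/pdf"}, k=10)
print([c["id"] for c in pdf_only])
```

The HTML chunk never reaches the ranking step, even though it has the highest score in the corpus.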

Query pattern                                Recommended mode
General natural-language questions           hybrid
Product codes, model numbers, exact phrases  bm25
Paraphrase, synonym-heavy queries            vector
Complex multi-hop reasoning, graph context   hybrid_graph
High-recall requirements (academic, legal)   hybrid + use_reranker=True
