Spectron exposes four retrieval modes for querying the knowledge layer. Each mode trades coverage, precision, and computational cost differently. Choosing the right mode for your query pattern is the most direct lever for improving retrieval quality.
Query modes
vector
Pure HNSW (hierarchical navigable small world) approximate nearest-neighbour search over dense embeddings. The query is embedded with the same model used at ingestion, and the top-k nearest chunk embeddings are returned.
Vector search excels at paraphrase and semantic similarity – finding chunks that express the same idea in different words. It is weak on exact strings, product codes, proper names, and rare terms that are poorly represented in the embedding model's training data.
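To make the ranking step concrete, the sketch below ranks chunks by cosine similarity using an exhaustive scan. This is a deliberate stand-in: an HNSW index approximates the same ordering without comparing the query against every chunk. The helper names and the toy three-dimensional vectors are assumptions for illustration, not part of Spectron's API, and the distance metric Spectron uses is not stated on this page.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_cosine(query_vec, chunk_vecs, k=2):
    """Exact (brute-force) top-k; an HNSW index approximates this ordering."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings (hypothetical values, not produced by a real model).
chunks = {
    "c1": [1.0, 0.0, 0.0],
    "c2": [0.9, 0.1, 0.0],
    "c3": [0.0, 0.0, 1.0],
}
nearest = top_k_cosine([1.0, 0.0, 0.0], chunks, k=2)
```

Here `nearest` contains `c1` followed by `c2`; `c3` points in an unrelated direction and is dropped.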
bm25
BM25 full-text search over the chunk corpus. The query is tokenised and matched against the inverted index. Results are ranked by the BM25 score, a TF-IDF-style weighting that saturates term frequency and normalises for document length.
BM25 excels at exact terms, product identifiers, model numbers, and specific technical phrases. It is weak on synonyms and paraphrase – if the query uses a term not present in the chunk, BM25 will not find it.
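The behaviour described above can be seen in a small scoring sketch. It uses the standard Okapi BM25 formula with the commonly cited parameters k1=1.5 and b=0.75 and whitespace tokenisation; Spectron's actual tokeniser and parameters are not documented here, so treat these as assumptions.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    avg_len = sum(len(tokens) for tokens in tokenised) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no lexical overlap, so no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores

docs = [
    "replace the filter on model XR-200",
    "how to swap out a purifier cartridge",
]
exact = bm25_scores("XR-200", docs)    # identifier present in the first chunk only
synonym = bm25_scores("change", docs)  # neither chunk contains the word
```

The exact identifier scores only against the first chunk, while the query "change" scores zero everywhere even though both chunks describe changing a part.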
hybrid
Reciprocal rank fusion (RRF) of vector and BM25 results. Both retrieval passes run independently, and their ranked lists are merged into a single ranking using the RRF formula:

$$\mathrm{RRF}(d) = \sum_{i} \frac{1}{k + \mathrm{rank}_i(d)}$$

where k is a smoothing constant (default 60) and rank_i(d) is the rank of document d in retrieval pass i; the sum runs over the passes in which d appears. Documents appearing in both lists accumulate a term from each pass; documents appearing in only one list are still represented, with a lower combined score.
Hybrid mode is the default. It reliably outperforms either mode in isolation across a wide range of query types and is appropriate for most production deployments.
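The fusion step described above can be sketched in a few lines. The function name `rrf_merge` and the sample IDs are hypothetical, and ranks are assumed to start at 1; only the constant k=60 comes from this page.

```python
from collections import defaultdict

def rrf_merge(ranked_lists, k=60):
    """Fuse ranked lists of document IDs with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # 1-based ranks (assumed)
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
fused = rrf_merge([vector_hits, bm25_hits])
```

In this example `b` and `a` appear in both lists and rise to the top; `d` and `c` each appear once and keep a smaller single-term score.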
hybrid_graph
Hybrid retrieval plus a graph-density reranking pass. After the initial hybrid retrieval, each candidate chunk is rescored based on its connectivity in the knowledge graph. Chunks that are more densely connected to other relevant content – via keyword co-occurrence, document-level links, typed-knowledge edges, or semantic section similarity – receive a higher rerank score.
The graph-density reranker draws on:
Keyword graph: chunks linked to high-scoring keywords that match query terms
Typed-knowledge graph: chunks adjacent to knowledge nodes relevant to the query
Section vectors: semantic similarity between query and document section headings
Document links: outbound links from the retrieved document to related documents
Document summaries: summary-level similarity as a signal for topic alignment
Keyword co-occurrence: keywords that frequently appear together with the matched terms
Personalised PageRank: random-walk authority scores from query-matched seed nodes
hybrid_graph produces the highest-quality results for complex queries requiring multi-hop reasoning, but adds latency proportional to graph depth. For simple factual lookups, hybrid is sufficient.
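As an illustration of one of the signals listed above, the sketch below computes personalised PageRank by power iteration on a toy graph. The graph, node names, damping factor (0.85), and iteration count are assumptions; this page does not specify how Spectron computes or weights the signal.

```python
def personalised_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration for PageRank that restarts at the seed nodes.

    graph maps every node to a list of its out-neighbours.
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                # Dangling node: hand its mass back to the seeds.
                for n in nodes:
                    nxt[n] += damping * rank[node] * restart[n]
        rank = nxt
    return rank

# Hypothetical graph: a query-matched keyword node linked to chunks.
graph = {
    "q_kw": ["c1", "c2"],
    "c1": ["c2"],
    "c2": ["q_kw"],
    "c3": ["c1"],  # nothing reachable from the seed links to c3
}
scores = personalised_pagerank(graph, seeds={"q_kw"})
```

Chunks reachable from the seed accumulate authority (`c2` more than `c1`, since it has two incoming paths), while `c3`, which no walk from the seed can reach, scores zero.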
When tuning graph-density reranking, the optional graph_edges parameter selects which structural signals contribute. Each value must be one of the recognised edge kinds; typos or unknown values return 400 Bad Request rather than being silently ignored:
| Edge kind | Signal |
|---|---|
| knowledge_has_keyword | Chunks linked to query-matching keywords |
| section_match | Section-heading similarity |
| document_link | Cross-document link density |
| document_summary | Document-level summary similarity |
| keyword_cooccurrence | Keyword PMI co-occurrence |
Responses may report hybrid_graph on individual hits when several signals combine; that value describes merged evidence in the result, not an input filter.
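Because unknown edge kinds are rejected with 400 Bad Request, it can help to check values on the client before sending a request. The following is a minimal sketch of such a pre-check; `validate_graph_edges` is a hypothetical helper, not a Spectron function, and the allowed set is copied from the table above.

```python
ALLOWED_GRAPH_EDGES = frozenset({
    "knowledge_has_keyword",
    "section_match",
    "document_link",
    "document_summary",
    "keyword_cooccurrence",
})

def validate_graph_edges(values):
    """Return the values as a list, or raise ValueError naming unknown kinds."""
    unknown = sorted(set(values) - ALLOWED_GRAPH_EDGES)
    if unknown:
        raise ValueError(f"unknown graph_edges values: {unknown}")
    return list(values)

ok = validate_graph_edges(["section_match", "document_link"])

try:
    validate_graph_edges(["section_macth"])  # typo caught locally
    rejected = False
except ValueError:
    rejected = True
```

Note that hybrid_graph is deliberately absent from the allowed set, since it appears only in responses.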
Basic query
Full query with all options
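The sketch below shows one way a client might assemble a query, from a basic request that sets only the query text to one that sets every option. It is not Spectron's client API: the `QueryRequest` class, the `query`, `mode`, `top_k`, and `filters` field names, and all default values are assumptions. Only the mode values and the use_hyde, decompose_query, use_reranker, and graph_edges option names come from this page.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional

MODES = ("vector", "bm25", "hybrid", "hybrid_graph")

@dataclass
class QueryRequest:
    """Hypothetical request shape for illustration only."""
    query: str                            # assumed field name
    mode: str = "hybrid"                  # assumed field name; hybrid is the documented default
    top_k: int = 10                       # assumed field name and default
    use_hyde: bool = False                # documented option; default assumed
    decompose_query: bool = False         # documented option; default assumed
    use_reranker: bool = False            # documented option; default assumed
    graph_edges: Optional[list] = None    # documented option
    filters: dict = field(default_factory=dict)  # assumed field name

    def __post_init__(self):
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {MODES}")

    def to_payload(self):
        """Drop options left at their defaults so a basic query stays minimal."""
        defaults = QueryRequest(query=self.query)
        return {key: value for key, value in asdict(self).items()
                if key == "query" or value != getattr(defaults, key)}

# Basic query: only the text is set, everything else uses defaults.
basic = QueryRequest(query="reset password").to_payload()

# Full query: every option set explicitly.
full = QueryRequest(
    query="How do the XR-200 and XR-300 differ in filter maintenance?",
    mode="hybrid_graph",
    top_k=20,
    use_hyde=True,
    decompose_query=True,
    use_reranker=True,
    graph_edges=["section_match", "document_link"],
    filters={"lang": "en"},
).to_payload()
```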
Query result structure
Each hit in the result contains:
chunk.position is the chunk's ordinal position within the document, useful for retrieving surrounding context. chunk.section is the heading of the section containing this chunk, or null for documents without section structure. score is the normalised relevance score after all reranking passes. document provides provenance for display, citation, or follow-up retrieval.
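The fields described above can be modelled on the client roughly as follows. This is a sketch under assumptions: the class names, the `text` field, the keys inside `document`, and zero-based positions are all hypothetical. The `neighbours` helper illustrates how chunk.position could be used to fetch surrounding context.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Chunk:
    position: int            # ordinal position within the document
    section: Optional[str]   # section heading, or None (null) if unstructured
    text: str = ""           # assumed field, not named on this page

@dataclass
class Hit:
    chunk: Chunk
    score: float             # normalised relevance after all reranking passes
    document: dict           # provenance; the inner keys are assumptions

def neighbours(hit, window=1):
    """Positions of the chunks around a hit, assuming zero-based positions."""
    start = max(0, hit.chunk.position - window)
    end = hit.chunk.position + window
    return [p for p in range(start, end + 1) if p != hit.chunk.position]

hit = Hit(
    chunk=Chunk(position=4, section=None),
    score=0.82,
    document={"id": "doc-1", "title": "Install guide"},
)
context_positions = neighbours(hit)
```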
Advanced options
HyDE (Hypothetical Document Embeddings)
When use_hyde=True, Spectron generates a hypothetical answer to the query using the configured response model, then embeds that hypothetical answer rather than the raw query string. This improves recall for queries phrased as questions rather than document-like statements, at the cost of one LLM call per query.
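The flow can be sketched with two injected callables, one for generation and one for embedding. Both stubs, the prompt wording, and the toy word-count embedding are assumptions made so the example runs without a model; they do not reflect Spectron's internals.

```python
def hyde_query_vector(query, generate, embed):
    """Embed a generated hypothetical answer instead of the raw query."""
    hypothetical = generate(f"Write a short passage that answers: {query}")
    return embed(hypothetical)

# Stub callables standing in for the response model and the embedding model.
def fake_generate(prompt):
    return "Rotate the key from the settings page, then redeploy."

def fake_embed(text):
    # Toy embedding: counts of a few marker words.
    words = text.lower().split()
    return [words.count("rotate"), words.count("key"), words.count("how")]

question = "How do I get new credentials?"
raw_vec = fake_embed(question)
hyde_vec = hyde_query_vector(question, fake_generate, fake_embed)
```

The raw question shares no vocabulary with an answer-style chunk, whereas the hypothetical answer lands on the "rotate" and "key" dimensions where such a chunk would sit.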
Sub-question decomposition
When decompose_query=True, Spectron splits complex queries into a set of simpler sub-questions, executes each independently, and merges the results. This is useful when a single query implicitly asks multiple things, for example a question that asks both how two modes differ and when to use each.
Decomposition adds latency proportional to the number of sub-questions and consumes additional LLM tokens. Use it for explicit multi-topic queries rather than as a default.
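A minimal sketch of the split, run, and merge flow follows. The splitter and retriever are stubs, and merging by keeping each chunk's best score is an assumption: this page does not say how Spectron merges sub-question results.

```python
def decomposed_search(query, split, search, top_k=5):
    """Run each sub-question separately and merge hits by best score."""
    best = {}
    for sub_question in split(query):
        for chunk_id, score in search(sub_question):
            # Keep the highest score seen for each chunk across sub-questions.
            if score > best.get(chunk_id, float("-inf")):
                best[chunk_id] = score
    merged = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return merged[:top_k]

# Stubs standing in for the LLM-based splitter and the retriever.
def fake_split(query):
    return [part.strip() + "?" for part in query.rstrip("?").split(" and ")]

INDEX = {
    "What is RRF?": [("c1", 0.9), ("c2", 0.4)],
    "when should I use BM25?": [("c3", 0.8), ("c2", 0.6)],
}

def fake_search(sub_question):
    return INDEX.get(sub_question, [])

results = decomposed_search(
    "What is RRF and when should I use BM25?", fake_split, fake_search
)
```

Each sub-question costs one retrieval call here, which is where the added latency comes from.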
Cross-encoder reranking
When use_reranker=True, the top-k results from the initial retrieval pass are reranked using a cross-encoder model that jointly encodes the query and each candidate chunk. Cross-encoder reranking is more accurate than bi-encoder (embedding) similarity but significantly slower, because every query-chunk pair requires its own model forward pass and nothing can be precomputed at ingestion.
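The reranking step can be sketched as rescoring each (query, chunk) pair and re-sorting. The scorer below is a word-overlap stub so the example runs without a model; a real cross-encoder would run a model over the paired text instead. The function names are hypothetical.

```python
def rerank(query, candidates, score_pair, top_n=None):
    """Re-order candidates by a joint (query, text) relevance score."""
    rescored = [(chunk_id, score_pair(query, text)) for chunk_id, text in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored if top_n is None else rescored[:top_n]

# Stub scorer: fraction of query words present in the chunk text.
def overlap_score(query, text):
    query_words = set(query.lower().split())
    return len(query_words & set(text.lower().split())) / len(query_words)

# Candidates in their initial retrieval order.
candidates = [
    ("c1", "billing settings overview"),
    ("c2", "reset your password from the login page"),
]
reranked = rerank("reset password", candidates, overlap_score)
```

The scorer is called once per candidate, so cost grows linearly with the number of chunks passed to the reranker.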
Metadata filtering
Apply hard filters before retrieval to restrict results to a subset of the document corpus.
Filters are applied before scoring, so they only decide which chunks are eligible; they do not affect the ranking of results that pass through.
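As a rough illustration of filtering before scoring, the sketch below keeps only chunks whose metadata matches every filter key exactly. The metadata keys, the exact-match semantics, and the `apply_filters` helper are assumptions; Spectron's filter syntax is not shown on this page.

```python
def apply_filters(chunks, filters):
    """Keep chunks whose metadata matches every filter key exactly."""
    return [
        chunk for chunk in chunks
        if all(chunk["metadata"].get(key) == value for key, value in filters.items())
    ]

# Hypothetical corpus with hypothetical metadata keys.
corpus = [
    {"id": "c1", "metadata": {"product": "XR-200", "lang": "en"}},
    {"id": "c2", "metadata": {"product": "XR-300", "lang": "en"}},
    {"id": "c3", "metadata": {"product": "XR-200", "lang": "de"}},
]
eligible = apply_filters(corpus, {"product": "XR-200", "lang": "en"})
```

Only the eligible chunks would then be passed to the retrieval and scoring steps.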
Choosing a mode
| Query pattern | Recommended mode |
|---|---|
| General natural-language questions | hybrid |
| Product codes, model numbers, exact phrases | bm25 |
| Paraphrase, synonym-heavy queries | vector |
| Complex multi-hop reasoning, graph context | hybrid_graph |
| High-recall requirements (academic, legal) | hybrid + use_reranker=True |