Retrieval-Augmented Generation (RAG) has become the go-to architecture for grounding AI agents in real knowledge. But while most RAG systems treat documents as isolated chunks of text, knowledge graph RAG adds a critical layer: understanding the relationships between concepts, chunks, and documents. This is sometimes also called a context graph (see AI’s trillion-dollar opportunity: Context graphs).
In this post, we’ll walk through two powerful SurrealQL query patterns that demonstrate how to retrieve context from a knowledge graph to feed AI agents. Whether you’re building a chatbot, a research assistant, or an autonomous agent, these patterns show how to move beyond basic vector search to semantic, graph-aware retrieval. SurrealQL is a query language that lets you combine multiple models - vector, graph, relational, full-text BM25 and more - in a single query.
A typical knowledge graph RAG system has three main entities:

- `document` - the source file, with metadata such as filename and content type
- `chunk` - a passage of a document’s text, stored with its position (`index`), its content, and an embedding
- `concept` - an entity or topic extracted from the chunks, also stored with an embedding
The magic happens in the relationships. This graph structure lets us ask richer questions than pure vector search allows. Instead of just “which chunks match my query?”, we can ask “which documents contain chunks about concepts similar to my query?” - a subtle but powerful difference.
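To make the data model concrete, here is a minimal sketch of how these records and relationships could be created in SurrealQL. The record IDs (`document:one`, `chunk:abc`) and embedding values are hypothetical; field names are inferred from the queries later in this post.

```surql
-- A document with basic metadata.
CREATE document:one SET filename = 'foo.pdf', content_type = 'application/pdf';

-- A chunk that belongs to the document, with its position and embedding.
CREATE chunk:abc SET
    doc = document:one,
    index = 0,
    content = 'Some passage of text...',
    embedding = [0.12, 0.85, 0.44]; -- real embeddings are much longer

-- A concept extracted from the corpus, with its own embedding.
CREATE concept:MINISTER SET embedding = [0.21, 0.67, 0.09];

-- Link the chunk to a concept it mentions.
RELATE chunk:abc->MENTIONS_CONCEPT->concept:MINISTER;
```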
Chunks are linked to the concepts they mention through the `MENTIONS_CONCEPT` edge (`chunk->MENTIONS_CONCEPT->concept`).

The first query pattern uses concept similarity as a proxy for document relevance. Here’s the logic: find the concepts most similar to the user’s query, walk the graph backwards to the chunks that mention them, group those chunks by document, and rank documents by their best-scoring concept.
This is a big query, but don’t worry - it’s broken down and explained just below.
```surql
LET $sub_query = SELECT VALUE chunks.map(|$chunk| {
    index: $chunk.index,
    concept: id,
    content: $chunk.content,
    chunk: $chunk.id,
    doc: $chunk.doc,
    score: score
}) AS x
FROM (
    SELECT *, <-MENTIONS_CONCEPT<-chunk.{id, doc, index} AS chunks
    FROM (
        SELECT *, (1 - vector::distance::knn()) AS score OMIT embedding
        FROM concept
        WHERE embedding <|5,40|> $embedding
    )
    WHERE score >= $threshold
);

SELECT
    doc,
    array::group({
        chunk: chunk,
        index: index,
        max_score: max_score,
        concepts: concepts
    }) AS chunks,
    math::max(max_score) AS best_concept_score
FROM (
    SELECT
        doc,
        chunk,
        index,
        content,
        math::max(score) AS max_score,
        array::group(concept) AS concepts
    FROM array::flatten($sub_query)
    GROUP BY chunk
)
GROUP BY doc
ORDER BY best_concept_score DESC;
```
Example response:
```
[
    {
        best_concept_score: 0.9999999403953552f,
        chunks: [
            {
                chunk: chunk:92355c0fe4d03cae4c591361b38f85e1,
                concepts: [ concept:⟨SECRETARY OF STATE⟩ ],
                index: 11,
                max_score: 0.9999999403953552f,
                content: ...
            },
            ...
        ],
        doc: document:1adeceeb156bcf4de5324feeb0321c0d
    },
    {
        best_concept_score: 0.9999999403953552f,
        chunks: [
            {
                chunk: chunk:092dbfb54865adf2e6ff2a5dd68afed8,
                concepts: [ concept:SECRETARY ],
                index: 17,
                max_score: 0.821898481619539f,
                content: ...
            },
            {
                chunk: chunk:2867a09694a33dbbc1daa769ef048612,
                concepts: [ concept:MINISTER ],
                index: 8,
                max_score: 0.5018606779741432f,
                content: ...
            },
            {
                chunk: chunk:28cc4ee0cc2fc236c779dff09485a5b5,
                concepts: [ concept:⟨SECRETARY OF STATE⟩, concept:MINISTER ],
                index: 13,
                max_score: 0.9999999403953552f,
                content: ...
            },
            ...
        ],
        doc: document:1b22a97499fb98242e5bf5f4e0315508
    },
    ...
]
```
```surql
SELECT *, (1 - vector::distance::knn()) AS score OMIT embedding
FROM concept
WHERE embedding <|5,40|> $embedding
```

This performs a k-nearest neighbours (k-NN) search, finding the top 5 concepts from a pool of 40 candidates that are most similar to the user’s query embedding (`$embedding`). The `<|5,40|>` syntax is SurrealDB’s vector search operator, and `(1 - vector::distance::knn())` converts the raw distance into a similarity score.
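Note that the k-NN operator relies on a vector index being defined on the field. A minimal sketch of such an index definition - the index name and dimension here are assumptions, so match the dimension to your embedding model:

```surql
-- Hypothetical HNSW index enabling approximate k-NN on concept embeddings.
DEFINE INDEX concept_embedding_idx ON concept
    FIELDS embedding HNSW DIMENSION 384 DIST COSINE;
```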
```surql
<-MENTIONS_CONCEPT<-chunk.{id, doc, index} AS chunks
```

Here’s where the graph magic happens. The arrow syntax (`<-MENTIONS_CONCEPT<-`) walks the graph backward from concepts to chunks, following the `MENTIONS_CONCEPT` edge relationship. This is SurrealDB’s graph traversal syntax - no JOINs needed.
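For intuition, the same edge can be walked in either direction. A quick sketch, using hypothetical record IDs:

```surql
-- Forward: which concepts does this chunk mention?
SELECT ->MENTIONS_CONCEPT->concept FROM chunk:abc;

-- Backward: which chunks mention this concept?
SELECT <-MENTIONS_CONCEPT<-chunk FROM concept:MINISTER;
```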
The outer query groups results first by chunk (to find the best concept per chunk), then by document (to collect all relevant chunks), and finally ranks documents by their best-scoring concept.
Concept-based retrieval shines when:
The second pattern takes a more direct approach: find chunks by vector similarity, then group them by document and return contextual metadata.
The query:
```surql
SELECT
    best_chunk_score,
    summary,
    doc.{id, filename, content_type},
    array::transpose([ contents, scores, chunks, chunk_indexes ]).map(|$arr| {
        content: $arr[0],
        score: $arr[1],
        id: $arr[2],
        chunk_index: $arr[3]
    }) AS chunks
FROM (
    SELECT
        doc,
        summary,
        math::max(score) AS best_chunk_score,
        array::group(content) AS contents,
        array::group(score) AS scores,
        array::group(id) AS chunks,
        array::group(index) AS chunk_indexes
    FROM (
        SELECT *, (1 - vector::distance::knn()) AS score OMIT embedding
        FROM chunk
        WHERE embedding <|5,40|> $embedding
        ORDER BY index ASC
    )
    WHERE score >= $threshold
    GROUP BY doc
    ORDER BY best_chunk_score DESC
);
```
Example response:
```
[
    {
        best_chunk_score: 0.3430139030822458f,
        chunks: [
            {
                chunk_index: 33,
                content: "...",
                id: chunk:dce5670f0f27810100a9f66777398f2c,
                score: 0.30309090110388426f
            },
            {
                chunk_index: 34,
                content: "...",
                id: chunk:8c6e7d0f19f0cd111416535066d06a7b,
                score: 0.3430139030822458f
            },
            {
                chunk_index: 35,
                content: "...",
                id: chunk:4bb7da79c7dc8daaf266ac4c0b1173c0,
                score: 0.31088319745159065f
            }
        ],
        doc: {
            content_type: 'application/pdf',
            filename: 'foo.pdf',
            id: document:1b22a97499fb98242e5bf5f4e0315508
        },
        summary: "..."
    },
    {
        best_chunk_score: 0.25366124467119633f,
        chunks: [
            {
                chunk_index: 0,
                content: "...",
                id: chunk:03526f27c67ce54efb98d6863482310d,
                score: 0.24738225372049172f
            },
            {
                chunk_index: 2,
                content: "...",
                id: chunk:bdfd9463293b47a9f235c540e75c8ffc,
                score: 0.25366124467119633f
            }
        ],
        doc: {
            content_type: 'application/pdf',
            filename: 'bar.pdf',
            id: document:f3f15298ffda45019418a865dfb8f7e9
        },
        summary: "..."
    }
]
```
```surql
SELECT *, (1 - vector::distance::knn()) AS score OMIT embedding
FROM chunk
WHERE embedding <|5,40|> $embedding
ORDER BY index ASC
```

This searches directly against chunk embeddings - the most straightforward RAG approach. The `ORDER BY index ASC` ensures chunks maintain their original document order.
```surql
GROUP BY doc
```

All matching chunks are grouped by their parent document, with arrays collecting the content, scores, IDs, and indexes.
Step 3: array transformation
```surql
array::transpose([ contents, scores, chunks, chunk_indexes ]).map(|$arr| {
    content: $arr[0],
    score: $arr[1],
    id: $arr[2],
    chunk_index: $arr[3]
}) AS chunks
```
This is a clever data reshaping trick. The `array::transpose` function converts four parallel arrays into an array of tuples, which the map then transforms into structured chunk objects. The result is a clean, nested structure, perfect for AI agent consumption.
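To see what `array::transpose` does in isolation, consider a toy example (values chosen purely for illustration):

```surql
RETURN array::transpose([ [1, 2, 3], ['a', 'b', 'c'] ]);
-- [ [1, 'a'], [2, 'b'], [3, 'c'] ]
```

Each tuple in the output pairs up the elements that sat at the same position in the input arrays - exactly what the query needs to reunite each chunk’s content, score, ID, and index.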
Direct chunk retrieval works best when:
Both queries showcase SurrealDB’s unique strengths for RAG systems:

- Vector search (`WHERE embedding <|5,40|> $embedding`), graph traversal (`<-MENTIONS_CONCEPT<-`), relational grouping, and document operations all happen in one query. No context switching between databases.
- `array::group`, `array::flatten`, and `array::transpose` let you reshape data without application-side post-processing. The map operator applies transformations inline.
- Graph syntax such as `<-MENTIONS_CONCEPT<-chunk` reads like plain English and eliminates complex JOIN logic.

Let’s see some examples of how you can customise the previous patterns.
```surql
WHERE embedding <|5,40|> $embedding
```

- The first number (`5`): how many results to return
- The second number (`40`): candidate pool size for the approximate search

You can also combine the operator with ordinary filters:

```surql
WHERE embedding <|5,40|> $embedding
  AND doc.created_at > time::now() - 7d
  AND doc.access_level = $user_permissions
```

Add temporal, security, or categorical filters to constrain retrieval.
Instead of `math::max(score)`, try:

- `math::mean()` - average relevance across chunks
- `count()` - number of matching chunks (vote-based)

After retrieval, add a second pass:
```surql
SELECT *, rerank_score($query, summary, chunks) AS final_score
FROM previous_results
ORDER BY final_score DESC
LIMIT 5;
```

Here `rerank_score` and `previous_results` are placeholders: in practice the second pass would apply your reranking function or model to the results of one of the queries above.
Want to implement knowledge graph RAG yourself? SurrealDB’s tooling makes it surprisingly accessible.
SurrealDB Cloud is the fully-managed DBaaS that handles all infrastructure concerns:
Create an instance in minutes, connect via CLI or SDK, and start loading your knowledge graph. The same queries you develop locally work identically in production. Visit the SurrealDB Cloud website to learn more.
Surrealist is the graphical IDE that makes working with knowledge graphs visual and intuitive:
The Personal Sandbox feature auto-generates API documentation for your schema with code snippets in multiple languages - perfect for team onboarding or building client applications.
```surql
SELECT docs.filename AS docs, id AS concept
FROM (
    SELECT id, array::distinct(<-MENTIONS_CONCEPT<-chunk.doc) AS docs
    FROM concept
)
WHERE array::len(docs) > 1;
```

This query shows concepts that appear in more than one document. If you change the `WHERE` clause to `WHERE array::len(docs) <= 1`, you’ll get concepts that appear in only one document. Queries like this give you insight into your graph, and can help you evaluate whether “concepts” are the right type of node for your use case.
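In the same spirit, you could rank concepts by how many documents they span - a hypothetical follow-up query, not one from the patterns above:

```surql
-- Top 10 concepts by document coverage (sketch).
SELECT
    id AS concept,
    array::len(array::distinct(<-MENTIONS_CONCEPT<-chunk.doc)) AS doc_count
FROM concept
ORDER BY doc_count DESC
LIMIT 10;
```

Concepts that dominate this list may be too generic to be useful retrieval anchors, while a long tail of single-document concepts can suggest your extraction step is too fine-grained.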
The beauty of SurrealDB’s unified approach is that you can start simple and evolve:
Whether you’re building a customer support bot, a research assistant, or an autonomous agent, knowledge graph RAG gives your AI the context it needs to be genuinely helpful.
Ready to build?
Want to dive deeper? Check out our AI labs or explore the full SurrealQL reference to unlock more powerful query patterns.