Jul 30, 2025
While this implementation uses Python, the architectural patterns and SurrealDB integration strategies apply to any language with SurrealDB SDK support.
Standard Retrieval-Augmented Generation (RAG) systems are powerful but operate on a single-shot principle: retrieve, then generate. This approach often fails on complex, multi-faceted queries, producing incomplete answers that lack depth. The core limitation is the absence of a feedback loop.
This article introduces the Reflexion RAG Engine, a production-ready system that overcomes these limitations through a multi-cycle, self-correcting architecture powered by SurrealDB. It employs a multi-LLM strategy for generation, evaluation, and synthesis to iteratively refine answers, ensuring higher accuracy and comprehensiveness. SurrealDB serves as a unified data store that houses both vector embeddings and document metadata.
Traditional RAG architectures are often fragile combinations of multiple technologies: a vector database (e.g. Pinecone, ChromaDB), a separate document store for metadata (e.g. MongoDB, Postgres), and a caching layer (e.g. Redis). This fragmentation introduces complexity in data synchronisation and security.

Our engine leverages SurrealDB to consolidate these functions into a single, ACID-compliant database. This unified approach provides specific advantages, starting with high-dimensional vector search.
The 3072-dimensional vectors from OpenAI's text-embedding-3-large model, served via Azure, provide superior semantic understanding compared to standard 1536-dimensional embeddings. SurrealDB's HNSW (Hierarchical Navigable Small World) indexing enables millisecond-level similarity search across millions of vectors.
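For reference, here is a minimal sketch of generating such an embedding with the Azure OpenAI Python SDK; the endpoint, key, API version, and deployment name are placeholders, not values from the repository:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",  # placeholder
    api_key="your-api-key",                                   # placeholder
    api_version="2024-02-01",
)

response = client.embeddings.create(
    model="text-embedding-3-large",  # your Azure deployment name
    input="What makes HNSW indexes fast?",
)
vector = response.data[0].embedding
assert len(vector) == 3072  # matches the index dimension defined below
```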
We define a single index, `hnsw_embedding`, with a dimension of 3072 to match OpenAI's text-embedding-3-large model. Cosine distance is the standard choice for normalised embeddings. The `EFC` (candidate-list size used while building the graph) and `M` (maximum connections per node) parameters are tuned to balance recall against query speed, making the index suitable for production environments.
```sql
-- Define a schema-full table named 'documents'
DEFINE TABLE documents SCHEMAFULL;

-- Define a 'content' field of type string
DEFINE FIELD content ON documents TYPE string;

-- Define a 'metadata' field of flexible type object
-- This allows for storing schemaless data
DEFINE FIELD metadata ON documents FLEXIBLE TYPE object;

-- Define an 'embedding' field of type array of floats
-- This is used for storing vector embeddings
DEFINE FIELD embedding ON documents TYPE array<float>;

-- Create an HNSW index on the 'embedding' field
-- This index is used for efficient vector similarity searches
DEFINE INDEX hnsw_embedding ON documents FIELDS embedding
    HNSW
    DIMENSION 3072  -- The number of dimensions in the embedding vectors
    DIST COSINE     -- Use cosine distance for measuring similarity
    TYPE F32        -- Use 32-bit floating point numbers
    EFC 500         -- Candidate-list size during index construction
    M 16;           -- Maximum number of connections per node in the graph
```
The core architecture is based on a loop that mimics critical thinking. For each query, a series of steps occurs, guided by specialised LLMs to ensure a detailed and accurate response.
Each evaluation step produces a `ReflexionDecision` enum value:

- `COMPLETE`: if confidence is above the threshold (e.g., 0.90), the loop terminates.
- `REFINE_QUERY`: if specific gaps are found, a new, more targeted query is generated to fill them.
- `CONTINUE`: if the answer is incomplete but on the right track, the loop proceeds.
- `INSUFFICIENT_DATA`: if the knowledge base cannot answer, the loop terminates gracefully.
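A minimal sketch of this enum; the string values mirror the evaluator's JSON response format shown later, though the exact definition in the repository may differ:

```python
from enum import Enum

class ReflexionDecision(str, Enum):
    """Outcome of each self-evaluation step."""
    COMPLETE = "complete"                    # confidence above threshold; stop
    REFINE_QUERY = "refine_query"            # generate a targeted follow-up query
    CONTINUE = "continue"                    # on track but incomplete; run another cycle
    INSUFFICIENT_DATA = "insufficient_data"  # knowledge base cannot answer; stop gracefully
```

The system strategically assigns different models to specialised tasks: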
```python
# From src/rag/reflexion_engine.py
class ReflexionRAGEngine:
    """Advanced RAG engine with dynamic reflexion loop architecture
    and web search integration"""

    def __init__(self, ...):
        # Initialise different LLMs for different purposes
        self.generation_llm = generation_llm or GitHubLLM(
            model_override=settings.llm_model,
            temperature_override=settings.llm_temperature,
            max_tokens_override=settings.llm_max_tokens,
        )
        self.evaluation_llm = evaluation_llm or GitHubLLM(
            model_override=settings.evaluation_model,
            ...
        )
        self.summary_llm = summary_llm or GitHubLLM(
            model_override=settings.summary_model,
            ...
        )
```
The reflexion cycle can be seen in the following diagram.
It works as follows:
- Cache check: looks for a cached answer to the question and streams it if found.
- Reflexion loop: for a set number of cycles, it retrieves context, generates an answer, evaluates it, and acts on the resulting decision (see the control-flow sketch after this list).
- Finalisation: if no confident answer is found after all cycles, it synthesises a final answer from all cycles.
- Streaming: streams the final comprehensive answer with metadata.
- Error handling: if any step fails, it falls back to a simpler RAG approach and streams that answer.
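Here is a minimal sketch of that control flow, using the `ReflexionDecision` enum from above; the helper names (`cache`, `evaluate`, `refine_query`, `synthesise`) are illustrative, not the engine's actual API, and the real implementation also streams tokens and records per-cycle metadata:

```python
class ReflexionRAGEngine:  # abridged: only the decision loop is shown
    async def answer(self, query: str) -> str:
        # Cache check: return a previously computed answer if available
        if (cached := await self.cache.get(query)) is not None:
            return cached

        answers: list[str] = []
        current_query = query
        for cycle in range(self.max_cycles):
            # Retrieve combined context and generate a candidate answer
            context = await self.vector_store.similarity_search_combined(current_query)
            answer = await self.generation_llm.generate(current_query, context)
            answers.append(answer)

            # Self-evaluate and act on the resulting decision
            confidence, decision = await self.evaluate(query, answer)
            if decision is ReflexionDecision.COMPLETE:
                return answer  # confident enough; stop early
            if decision is ReflexionDecision.INSUFFICIENT_DATA:
                break  # knowledge base cannot answer; exit gracefully
            if decision is ReflexionDecision.REFINE_QUERY:
                current_query = await self.refine_query(query, answer)
            # CONTINUE: run the next cycle with the same query

        # Finalisation: synthesise a comprehensive answer from all cycles
        return await self.summary_llm.synthesise(query, answers)
```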
Maintaining, updating, and tuning prompts as requirements change is a must. To maintain flexibility and enable rapid iteration, all prompts are managed as versioned YAML files in the ./prompts directory. This separates prompt engineering from the application code, allowing easier updates.
```yaml
---
metadata:
  name: "initial_generation"
  version: "1.0.0"
  ...
config:
  ...
variables:
  - name: "query"
    type: "string"
    required: true
    description: "User's question"
  - name: "context"
    ...
  - name: "cycle_number"
    ...
prompt_template: |
  You are an expert AI assistant providing detailed, accurate answers
  with proper source citations.
  ...
  Question: {{query}}

  Available Documents:
  {{context}}
  ...
  RESPONSE STRUCTURE:
  ...
  IMPORTANT GUIDELINES:
  ...
  Answer:
```
```yaml
# A snippet from prompts/evaluation/response_evaluation.yaml
RESPONSE FORMAT (JSON):
{
  "confidence_score": 0.35,
  "decision": "continue|refine_query|complete|insufficient_data",
  "reasoning": "Detailed explanation of the assessment",
  ...
}
```
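Inside the engine, this JSON verdict maps directly onto the `ReflexionDecision` enum sketched earlier; a minimal, illustrative parsing helper (not the repository's actual parser):

```python
import json

def parse_evaluation(raw: str) -> tuple[float, ReflexionDecision]:
    """Parse the evaluator LLM's JSON verdict into a score and a decision."""
    data = json.loads(raw)
    return float(data["confidence_score"]), ReflexionDecision(data["decision"])
```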
The Prompt Manager is responsible for loading prompts from the YAML files and injecting variables, such as the user query, refined queries, previous answers, and confidence scores, into their dedicated placeholders.
```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List, Optional

# --- Data Structures ---

@dataclass
class PromptMetadata:
    name: str
    version: str
    # ... other metadata fields

@dataclass
class PromptConfig:
    temperature: Optional[float] = None
    # ... other config fields

@dataclass
class PromptVariable:
    name: str
    type: str
    # ... other variable fields

@dataclass
class PromptTemplate:
    """A complete, renderable prompt object."""
    metadata: PromptMetadata
    config: PromptConfig
    variables: List[PromptVariable]
    prompt_template: str

    def render(self, **kwargs) -> str:
        """Renders the prompt template with the given variables."""
        ...

# --- Core Manager ---

class PromptManager:
    """Manages loading, caching, and rendering of YAML-based prompt templates."""

    def __init__(self, prompts_dir: Optional[str] = "prompts"):
        """Initialises the manager and loads all prompts from disk."""
        self._prompt_cache: Dict[str, PromptTemplate] = {}
        # ... implementation for loading all prompts on startup ...

    def get_prompt(self, name: str) -> Optional[PromptTemplate]:
        """Retrieves a parsed prompt template from the cache by its name."""
        ...

    def render_prompt(self, name: str, **kwargs) -> str:
        """Finds a prompt by name and renders it with provided variables."""
        ...

    def list_prompts(self) -> List[str]:
        """Returns a list of all loaded prompt names."""
        ...

    def reload_prompts(self):
        """Clears the cache and reloads all prompts from the directory."""
        ...

# --- Global Singleton Instance ---
# Provides a single, shared instance of the prompt manager across the application.
prompt_manager = PromptManager()
```
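Rendering a prompt is then a single call; the variable values below are illustrative:

```python
prompt = prompt_manager.render_prompt(
    "initial_generation",
    query="How does SurrealDB's HNSW index work?",
    context="<concatenated document chunks>",  # placeholder context
    cycle_number=1,
)
```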
A simple RAG system is limited by the freshness of its data. The Reflexion Engine implements a hybrid retrieval strategy to enhance its knowledge base with real-time information from the web.
The `SurrealDBVectorStore` implements a `similarity_search_combined` method. Instead of a single vector search, it executes two parallel queries against the `documents` and `web_search` tables, each with its own `LIMIT` (`k_docs` and `k_web`). The results are merged after the vector search and re-ranked by their similarity score. This ensures that the most relevant information, whether from a local PDF or a recent news article, is available to the generation model.
Web search is configurable via the `WEB_SEARCH_MODE` setting, which can be `off`, `initial_only`, or `every_cycle`. For complex queries, running web search in every reflexion cycle allows the engine to dynamically seek external information, filling the knowledge gaps identified during the evaluation phase.
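For example, in your `.env` (the exact value spellings are assumptions based on the modes described above; check `.env.example` for the canonical names):

```bash
# Run web search in every reflexion cycle
WEB_SEARCH_MODE=every_cycle   # off | initial_only | every_cycle
```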
The engine’s hybrid search combines local documents with real-time web results:
```python
async def similarity_search_combined(
    self, query: str, k_docs: int = 3, k_web: int = 2
) -> List[Document]:
    """
    Perform a combined similarity search over local documents and web search results.

    - Embeds the query.
    - Retrieves top-k similar documents and web results (with limits).
    - Combines results, enforcing a max token budget.
    - Sorts all results by similarity score (descending).
    - Returns a list of Document objects.
    """
    ...
    try:
        query_embedding = await self.embedding_function.embed_text(query)

        # Query local documents table
        docs_query = f"""
            SELECT id, content, metadata,
                vector::similarity::cosine(embedding, {query_embedding}) AS score
            FROM documents
            WHERE embedding <|300,COSINE|> {query_embedding}
            ORDER BY score DESC
            LIMIT {k_docs};
        """

        # Search web results with limit
        web_query = f"""
            SELECT id, content, metadata,
                vector::similarity::cosine(embedding, {query_embedding}) AS score
            FROM web_search
            WHERE embedding <|300,COSINE|> {query_embedding}
            ORDER BY score DESC
            LIMIT {k_web};
        """
        ...
        # Add document results first
        for result in docs_results or []:
            ...
        # Add web results if token budget allows
        for result in web_results or []:
            ...
        # Sort by similarity score
        all_documents.sort(
            key=lambda x: x.metadata.get("similarity_score", 0), reverse=True
        )
        ...
    except Exception as e:
        logger.error(f"Combined similarity search error: {e}")
        return []
```
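Calling it from application code looks like this; `vector_store` stands in for an initialised `SurrealDBVectorStore`, and the `source` metadata key is assumed for illustration:

```python
docs = await vector_store.similarity_search_combined(
    "SurrealDB HNSW indexing performance", k_docs=3, k_web=2
)
for doc in docs:
    # similarity_score is attached during the combined search
    print(doc.metadata.get("similarity_score"), doc.metadata.get("source"))
```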
This demonstrates SurrealDB’s ability to handle complex, multi-table vector operations efficiently.
This section provides a step-by-step guide to getting the Reflexion RAG Engine running.
You will need:

- The `uv` package manager (recommended). If it is already installed, skip the "Install UV Package Manager" step.
- A GitHub personal access token with the `repo` and `read:org` scopes.
scopes.UV is a lightning-fast Python package manager written in Rust that significantly outperforms traditional pip:
```bash
# Linux/macOS
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell as Administrator)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Alternative: via Homebrew
brew install uv

# Verify installation
uv --version
```
```bash
# 1. Clone the repository
git clone https://github.com/cloaky233/multi-cycle-rag.git
cd multi-cycle-rag

# 2. Create virtual environment and install dependencies
uv venv && source .venv/bin/activate  # macOS/Linux
# .venv\Scripts\activate               # Windows
uv sync

# 3. Configure your environment
cp .env.example .env
# Edit the .env file with your credentials for SurrealDB, GitHub, etc.
```
`uv sync`: this single command installs all production dependencies, including the SurrealDB Python SDK, Azure AI Inference, Crawl4AI for web scraping, and all LLM-related libraries.
```bash
# Create environment file
# (.env.example is included in the repository!)
cp .env.example .env
```
Create a GitHub personal access token with the following scopes and add it to your `.env` file:

- `repo` (full control of private repositories)
- `read:org` (read org and team membership)

The engine uses two specialised tables, each optimised for different data types while sharing the same vector search capabilities:
```sql
-- Define a schema-full table named 'documents'
DEFINE TABLE documents SCHEMAFULL;

-- Define a 'content' field of type string
DEFINE FIELD content ON documents TYPE string;

-- Define a 'metadata' field of flexible type object
-- This allows for storing schemaless data
DEFINE FIELD metadata ON documents FLEXIBLE TYPE object;

-- Define an 'embedding' field of type array of floats
-- This is used for storing vector embeddings
DEFINE FIELD embedding ON documents TYPE array<float>;

-- Create an HNSW index on the 'embedding' field
-- This index is used for efficient vector similarity searches
DEFINE INDEX hnsw_embedding ON documents FIELDS embedding
    HNSW
    DIMENSION 3072  -- The number of dimensions in the embedding vectors
    DIST COSINE     -- Use cosine distance for measuring similarity
    TYPE F32        -- Use 32-bit floating point numbers
    EFC 500         -- Candidate-list size during index construction
    M 16;           -- Maximum number of connections per node in the graph
```
```sql
-- Define a schema-full table named 'web_search'
DEFINE TABLE web_search SCHEMAFULL;

-- Define a 'content' field of type string
-- This field will store the textual content of the web search results
DEFINE FIELD content ON web_search TYPE string;

-- Define a 'metadata' field of flexible type object
-- This allows for storing additional information in a schemaless manner
DEFINE FIELD metadata ON web_search FLEXIBLE TYPE object;

-- Define an 'embedding' field of type array of floats
-- This field is used for storing vector embeddings of the content
DEFINE FIELD embedding ON web_search TYPE array<float>;

-- Create an HNSW index on the 'embedding' field
-- This index is used for efficient vector similarity searches
DEFINE INDEX hnsw_embedding ON web_search FIELDS embedding
    HNSW
    DIMENSION 3072  -- The number of dimensions in the embedding vectors
    DIST COSINE     -- Use cosine distance for measuring similarity
    TYPE F32        -- Use 32-bit floating point numbers
    EFC 500         -- Candidate-list size during index construction
    M 16;           -- Maximum number of connections per node in the graph
```
The system includes specialised SurrealQL functions that demonstrate SurrealDB’s programmability:
```sql
-- A function to count the number of user documents stored
DEFINE FUNCTION OVERWRITE fn::countdocs() -> int {
    count(SELECT * FROM documents);
} PERMISSIONS FULL;
```

```sql
-- A function to count the number of web documents stored
DEFINE FUNCTION OVERWRITE fn::count_web() -> int {
    count(SELECT * FROM web_search);
} PERMISSIONS FULL;
```
```sql
-- A function to delete user documents, with confirmation
DEFINE FUNCTION OVERWRITE fn::deldocs($confirm: string) -> string {
    LET $word = 'CONFIRM';
    RETURN IF $confirm == $word {
        DELETE documents;
        'DELETED'
    } ELSE {
        'NOT DELETED'
    }
} PERMISSIONS FULL;
```

```sql
-- A function to delete web search documents, with confirmation
DEFINE FUNCTION OVERWRITE fn::delweb($confirm: string) -> string {
    LET $word = 'CONFIRM';
    RETURN IF $confirm == $word {
        DELETE web_search;
        'DELETED'
    } ELSE {
        'NOT DELETED'
    }
} PERMISSIONS FULL;
```
```sql
-- Define a function to perform similarity search on user documents
DEFINE FUNCTION OVERWRITE fn::similarity_search($query_embedding: array<float>, $k: int) -> any {
    -- Set a default limit for the number of results if $k is not provided
    LET $limit = $k ?? 5;
    -- Perform a similarity search using cosine similarity
    RETURN (
        SELECT id, content, metadata,
            vector::similarity::cosine(embedding, $query_embedding) AS score
        FROM documents
        WHERE embedding <|300, COSINE|> $query_embedding  -- KNN operator with cosine distance
        ORDER BY score DESC  -- Order results by similarity score, descending
        LIMIT $limit         -- Limit the number of results
    );
} PERMISSIONS FULL;  -- Grant full permissions for this function
```

```sql
-- Define a function to perform similarity search on web search results
DEFINE FUNCTION OVERWRITE fn::similarity_search_web($query_embedding: array<float>, $k: int) -> any {
    -- Set a default limit for the number of results if $k is not provided
    LET $limit = $k ?? 5;
    -- Perform a similarity search using cosine similarity
    RETURN (
        SELECT id, content, metadata,
            vector::similarity::cosine(embedding, $query_embedding) AS score
        FROM web_search
        WHERE embedding <|300, COSINE|> $query_embedding  -- KNN operator with cosine distance
        ORDER BY score DESC  -- Order results by similarity score, descending
        LIMIT $limit         -- Limit the number of results
    );
} PERMISSIONS FULL;  -- Grant full permissions for this function
```
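Once defined, these functions can be called like built-ins from any client. A minimal sketch using the SurrealDB Python SDK; the connection URL, credentials, and namespace/database names are placeholders, and the exact client class and sign-in keys vary across SDK versions:

```python
from surrealdb import Surreal

with Surreal("wss://your-instance.surreal.cloud") as db:
    db.signin({"username": "root", "password": "secret"})  # placeholder credentials
    db.use("rag", "rag")  # placeholder namespace and database

    total = db.query("RETURN fn::countdocs();")

    # Bind the query vector and result limit as parameters
    hits = db.query(
        "RETURN fn::similarity_search($embedding, $k);",
        {"embedding": [0.0] * 3072, "k": 5},  # placeholder embedding
    )
    print(total, hits)
```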
These functions showcase SurrealDB’s ability to handle complex vector operations with custom logic directly in the database layer, reducing network overhead and improving performance and security.
1. Obtain a Google Custom Search API key
The API key authenticates your project’s requests to Google’s services.
2. Create a Programmable Search Engine and get the CSE ID

The CSE ID (also called the Search Engine ID or `cx`) tells Google what to search (e.g., the entire web or specific sites you define).
3. Update your project configuration

Finally, take the two values you have obtained and place them in your project's `.env` file:
```bash
# .env file
...
GOOGLE_API_KEY=your_google_api_key_here
GOOGLE_CSE_ID=your_google_cse_id_here
...
```
Web search requires the Google API key and CSE ID above; crawling the pages it returns additionally requires Crawl4AI and its browser dependencies:
```bash
# Install Crawl4AI with browser dependencies
uv run crawl4ai-setup

# Verify installation
uv run crawl4ai-doctor

# Manual browser setup if needed
python -m playwright install chromium
```
```bash
# 1. Place your documents (.pdf, .md, .docx, .txt) into the /docs directory.

# 2. Ingest the documents into SurrealDB.
uv run rag.py ingest

# 3. Start the interactive chat interface.
uv run rag.py chat
```
Before the CLI can tackle a complex query, we first need to give it some documents to ingest.
*The CLI provides streamlined document ingestion with progress tracking and comprehensive file format support.*
```bash
# Ingest all supported documents from a directory
uv run rag.py ingest --docs_path=./docs

# Expected output:
# 📚 Loading documents from ./docs
# 🔄 Processing 15 documents...
# ⚡ Generating embeddings...
# 💾 Storing in SurrealDB...
# ✅ Successfully ingested 15 documents (127 chunks)
```
The ingestion pipeline handles multiple formats: PDF (`.pdf`), plain text (`.txt`), Markdown (`.md`), and Word (`.docx`). These are handled by the system's `DocumentLoader`, which is configured to recognise and parse these file extensions when you ingest documents from a directory.
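A simplified sketch of that extension-based filtering (the repository's `DocumentLoader` is more involved; the names here are illustrative):

```python
from pathlib import Path
from typing import Iterator

SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md", ".docx"}

def iter_supported_files(docs_path: str) -> Iterator[Path]:
    """Yield ingestible files under docs_path, filtered by extension."""
    for path in sorted(Path(docs_path).rglob("*")):
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            yield path
```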
The system provides a rich command-line interface for all operations:
```bash
# Launch interactive chat with reflexion engine
uv run rag.py chat
```
```bash
# View current configuration and status
uv run rag.py config

# Sample output:
# RAG Engine Configuration
# ========================
# LLM Model: meta/Meta-Llama-3.1-405B-Instruct
# Evaluation Model: cohere/Cohere-command-r
# SurrealDB URL: wss://your-instance.surreal.cloud
# Documents in store: 127
# Web results in store: 45
# Memory cache: Enabled (hit rate: 73%)

# Delete all documents (requires confirmation)
uv run rag.py delete

# Ingest new documents with progress tracking
uv run rag.py ingest --docs_path=./new_docs
```
Here we see some real output from the reflexion engine processing a complex query about SurrealDB’s vector capabilities.
```
Query: compare SurrealDB's vector search with traditional vector databases

🔄 Cycle 1 - initial generation
📚 Retrieved: 3 documents (similarity: 0.85-0.91)
💭 SurrealDB offers several advantages over traditional vector databases:

1. **Unified data model**: unlike Pinecone or Weaviate, SurrealDB combines
   document storage, graph relationships, and vector search in one system
   [Source: architecture.md]
2. **Native SQL integration**: SurrealQL enables complex queries that join
   vector similarity with traditional filters [Source: surrealdb_guide.md]
...

🔍 Self-evaluation - confidence: 0.78 (Below threshold: 0.85)
❓ Gap identified: missing performance benchmarks and indexing details

🔄 Cycle 2 - reflexion query: "SurrealDB HNSW indexing performance benchmarks"
📚 Retrieved: 5 documents (including web search)
💭 Enhanced response with performance data:

**Performance characteristics**:
- HNSW indexing provides ~10x faster queries than brute force [Source: benchmarks.md]
- Memory usage: ~4 bytes per dimension per vector for F32 type
- EFC parameter (500) balances accuracy vs speed...

🔍 Self-evaluation - confidence: 0.89
✅ Confidence threshold exceeded!

📊 Final metrics:
- Total cycles: 2
- Processing time: 6.8s
- Documents used: 8
- Final confidence: 0.89
- Cache hits: 3/8
```
The Reflexion RAG Engine demonstrates SurrealDB’s power as a unified platform for modern AI applications. By combining vector search, document storage, and real-time web integration in a single database, it eliminates architectural complexity while delivering superior performance.
Ready to build your own self-correcting AI system? Clone the repository, follow this setup guide, and start experimenting with SurrealDB’s vector capabilities. The future of AI is reflexive, multimodal, and built on unified data platforms.
Any questions about implementing reflexion loops with SurrealDB? Join the SurrealDB Discord, star the project repository, or share your own SurrealDB vector search experiments with the community!