Introduction: LLMs and the Need for RAG
We are seeing large language models (LLMs) being used in all sorts of areas, such as information retrieval, data conversion, code generation, and even conversational interfaces.
LLMs often struggle with accuracy and hallucinate information. Depending on when and on what dataset they were trained, they can also become outdated if they lack access to the most current data.
While LLMs hallucinate due to their statistical nature, limited training data, and lack of real-world grounding, adding an external source of up-to-date domain-specific information can help LLMs provide more contextually relevant and factually grounded answers.
This method of adding an external structured source of data for LLMs to reason with is called Retrieval-Augmented Generation (RAG). RAG combines the generative capabilities of language models with information retrieval from external knowledge sources and helps anchor the LLM's responses to relevant, factual information.
Implementing RAG with SurrealDB
A RAG application typically converts both the user's query and the documents in the external knowledge base into vector embeddings and stores them in a database. There are databases that cater exclusively to vector operations, but in most cases they are overkill if your existing database already supports vectors, vector functions, and semantic search.
SurrealDB is a multi-model database. It supports relational, document, graph, and time series data models within a single system. It can also store vector data.
This means you don't need separate systems for structured or unstructured data and vector operations: it's all in one place. You also don't need adjacent libraries to score and rank your vector search results, since SurrealDB supports vector operations natively. Your application ends up with fewer moving parts, which reduces overall complexity.
Going forward, a growing share of information will be stored as numerical embeddings, but you shouldn't have to use different query languages to interact with your hybrid data. In SurrealQL, you can include vector functions and algorithms in the same query, which gives you finer-grained control over your data.
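For example, a single query can filter on ordinary fields and rank by a vector similarity score at the same time. The sketch below is purely illustrative: the table and field names are invented, and vector::similarity::cosine is SurrealDB's built-in cosine similarity function.

```python
# Illustrative only: one SurrealQL query mixing a normal WHERE filter with a
# vector similarity score. The `document` table and its fields are invented.
HYBRID_QUERY = """
SELECT
    title,
    vector::similarity::cosine(content_vector, $query_vector) AS similarity
FROM document
WHERE category = 'science'
ORDER BY similarity DESC
LIMIT 5;
"""
```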
Let’s Build: A RAG assistant with OpenAI and SurrealDB
This tutorial is based on the talk Unlocking the Future of AI: Secure and Intelligent Retrieval with OpenAI and SurrealDB Vector Search, given by machine learning engineer Cellan Hall at SurrealDB's Futures Forum event.
We'll build an assistant that answers questions based on Wikipedia information, using the GPT-3.5 Turbo model from OpenAI. Our goal is to create an assistant that generates answers to questions it's aware of and explicitly states when it doesn't have enough information, avoiding hallucination.
We use SurrealDB to handle the retrieval process and store the embeddings.
Prerequisites
System Requirements: macOS Sonoma 14.4, SurrealDB, Python 3.11+
Software Installation: SurrealDB, Rust compiler.
OpenAI API Key: Obtain one from the OpenAI Developer Quickstart.
GitHub Repo: Clone the application and switch to the surrealdb-openai folder to follow along.
Step 1: Understanding the Application Architecture
User Query: The user sends a query to the application.
LLM API (OpenAI GPT-3.5 Turbo): The query is processed by the OpenAI API, which generates a vector embedding representing the query’s semantic meaning.
SurrealDB Vector Search: The query embedding is sent to SurrealDB, where a vector search is performed to identify the most relevant document stored in the database.
Document Retrieval: The most relevant document, selected by similarity score, is returned; it contains the context needed to enhance the LLM's response.
Prompt Augmentation: The retrieved document augments the original query and is sent back to the LLM API.
Response Generation: The LLM processes the augmented prompt and generates an accurate response grounded in the up-to-date information from the external dataset. The sketch below illustrates this flow end to end.
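Here is a minimal end-to-end sketch of that flow in Python. It is not the repo's code: it assumes a local SurrealDB instance, the wiki_embedding table described later in this post, OpenAI's text-embedding-ada-002 model, and the surrealdb Python SDK (whose exact method names and result shapes vary slightly between versions).

```python
import asyncio

from openai import OpenAI        # pip install openai
from surrealdb import Surreal    # pip install surrealdb

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment


async def answer(question: str) -> str:
    # 1. Turn the user's query into a vector embedding.
    query_vector = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=question
    ).data[0].embedding

    # 2. Vector search in SurrealDB: nearest neighbour over the vector index
    #    defined on wiki_embedding.content_vector.
    async with Surreal("ws://localhost:8000/rpc") as db:
        await db.signin({"user": "root", "pass": "root"})
        await db.use("rag", "rag")  # placeholder namespace and database names
        result = await db.query(
            "SELECT title, text FROM wiki_embedding WHERE content_vector <|1|> $vector;",
            {"vector": query_vector},
        )
    # Result shape differs between SDK versions; we assume one statement result.
    document = result[0]["result"][0]

    # 3. Augment the prompt with the retrieved document and generate a response.
    prompt = (
        "Answer using only the context below. If the context is not enough, "
        "say you don't have enough information.\n\n"
        f"Context: {document['text']}\n\nQuestion: {question}"
    )
    completion = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content


if __name__ == "__main__":
    print(asyncio.run(answer("Who wrote 'On the Origin of Species'?")))
```

In the application itself, most of this logic lives inside SurrealQL functions rather than Python, as we'll see below.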
Step 2: Running and Testing the App
Before we get into the details of this application, let's run it to see if our RAG app works as expected.
Follow the README step-by-step to run SurrealDB and test the application.
Here’s what the app will look like after successfully running it in your Python environment.
Step 3: Key Components of the Application
Now that our RAG app is running smoothly, let's take a tour through the key components of the application.
Schema Overview
We start our schema definition by defining namespaces and databases, which in SurrealDB help you scope and limit access to your data. You can find the definition in the define_ns_db.surql file.
The main tables in our SurrealDB schema are:
wiki_embedding: Stores the Wikipedia article URLs, titles, content, and their vector embeddings. We also index our knowledge store using a vector index.
chat: Manages chat sessions with timestamps and titles.
message: Stores individual messages within chats.
sent: Handles the relationship between chats and messages.
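The exact definitions live in the repo's .surql files; what follows is an illustrative sketch of how these tables and the vector index could be defined, applied from Python. Field names, the 1536-dimension setting (matching OpenAI's ada-002 embeddings), and the namespace/database names are assumptions.

```python
import asyncio

from surrealdb import Surreal

# Illustrative schema only; the real definitions are in the repo's .surql files.
SCHEMA = """
DEFINE TABLE wiki_embedding SCHEMAFULL;
DEFINE FIELD url ON wiki_embedding TYPE string;
DEFINE FIELD title ON wiki_embedding TYPE string;
DEFINE FIELD text ON wiki_embedding TYPE string;
DEFINE FIELD content_vector ON wiki_embedding TYPE array<float>;

-- MTREE vector index for fast nearest-neighbour search;
-- 1536 dimensions matches OpenAI's text-embedding-ada-002 output.
DEFINE INDEX wiki_embedding_content_vector_index
    ON wiki_embedding FIELDS content_vector
    MTREE DIMENSION 1536 DIST COSINE;

DEFINE TABLE chat SCHEMAFULL;
DEFINE FIELD title ON chat TYPE string;
DEFINE FIELD created_at ON chat TYPE datetime DEFAULT time::now();

DEFINE TABLE message SCHEMAFULL;
DEFINE FIELD role ON message TYPE string;       -- 'user' or 'system'
DEFINE FIELD content ON message TYPE string;
DEFINE FIELD created_at ON message TYPE datetime DEFAULT time::now();

-- Graph edge table relating chats to messages.
DEFINE TABLE sent TYPE RELATION FROM chat TO message;
"""


async def apply_schema() -> None:
    async with Surreal("ws://localhost:8000/rpc") as db:
        await db.signin({"user": "root", "pass": "root"})
        await db.use("rag", "rag")  # placeholder namespace and database names
        await db.query(SCHEMA)


if __name__ == "__main__":
    asyncio.run(apply_schema())
```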
Core SurrealQL Functions
Once all the tables and indexes are defined, we can move on to the key functions that power our RAG application:
Generating embeddings: OpenAI offers multiple models covering a wide range of use cases and price points. Once you have chosen a model and have your input, you can pass them to the embeddings_complete SurrealQL function, where we use the http::post function to interact with the OpenAI API and return the embeddings. We access the API key from the get_openai_token() function. Here, $openai_token is a variable that will be populated with the value from the .env file when the SurrealDB instance is started. Note: remember not to push your .env file to version control.
Searching for relevant documents: For every prompt, we find the most relevant document using the vector index and the cosine similarity between each document's content_vector and the input vector.
The RAG function: This function ties everything together - embedding generation, document retrieval, prompt creation, and AI response generation - making it the core of our RAG application. A sketch of what these functions could look like follows below.
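The real implementations are in the repo's .surql files; the sketch below shows roughly what they could look like. fn::embeddings_complete and fn::get_openai_token are named in the post, but their bodies, the other function names (fn::search_for_documents, fn::chat_complete, fn::surreal_rag), and the OpenAI endpoints and models used here are my assumptions.

```python
import asyncio

from surrealdb import Surreal

# Hedged sketch of the core SurrealQL functions, held as a string so it can be
# registered with a single query. Only the names fn::embeddings_complete and
# fn::get_openai_token come from the post; everything else is illustrative.
FUNCTIONS = """
-- Return the API key as a bearer token; $openai_token is populated from the
-- .env file when the SurrealDB instance is started.
DEFINE FUNCTION fn::get_openai_token() {
    RETURN "Bearer " + $openai_token;
};

-- Call OpenAI's embeddings endpoint and return the embedding vector.
DEFINE FUNCTION fn::embeddings_complete($embedding_model: string, $input: string) {
    RETURN http::post(
        "https://api.openai.com/v1/embeddings",
        { model: $embedding_model, input: $input },
        { "Authorization": fn::get_openai_token() }
    ).data[0].embedding;
};

-- Find the document whose content_vector is closest to the input vector,
-- using the vector index defined on wiki_embedding.content_vector.
DEFINE FUNCTION fn::search_for_documents($input_vector: array) {
    RETURN (
        SELECT title, text FROM wiki_embedding
        WHERE content_vector <|1|> $input_vector
    )[0];
};

-- Ask the chat model to answer using only the retrieved context.
DEFINE FUNCTION fn::chat_complete($llm: string, $prompt: string) {
    RETURN http::post(
        "https://api.openai.com/v1/chat/completions",
        { model: $llm, messages: [{ role: "user", content: $prompt }] },
        { "Authorization": fn::get_openai_token() }
    ).choices[0].message.content;
};

-- The RAG function: embed, retrieve, augment, generate.
DEFINE FUNCTION fn::surreal_rag($llm: string, $question: string) {
    LET $vector = fn::embeddings_complete("text-embedding-ada-002", $question);
    LET $document = fn::search_for_documents($vector);
    LET $prompt = string::concat(
        "Answer using only the context below. ",
        "If the context is not enough, say you don't have enough information. ",
        "Context: ", $document.text, " Question: ", $question
    );
    RETURN fn::chat_complete($llm, $prompt);
};
"""


async def register_functions() -> None:
    async with Surreal("ws://localhost:8000/rpc") as db:
        await db.signin({"user": "root", "pass": "root"})
        await db.use("rag", "rag")  # placeholder namespace and database names
        await db.query(FUNCTIONS)


if __name__ == "__main__":
    asyncio.run(register_functions())
```

With these registered, answering a question becomes a single query, along the lines of RETURN fn::surreal_rag('gpt-3.5-turbo', $question);.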
Our assistant has a chat interface similar to ChatGPT, where every chat includes its messages, and we can also retrieve the conversation history and context.
Let's see how to build it.
Chat and Message Management
In a traditional relational database setup, we would link the chat and message tables using foreign keys. As the chat history grows, the JOINs can get overwhelming.
You could use a separate graph database to connect the two nodes with an edge, but why would you, when you can reduce complexity by using SurrealDB's native graph relations?
We can directly link chats to messages using a sent relationship, creating a chat->sent->message graph structure.
And to fetch all messages in a chat you would run:
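Here's a hedged sketch of both operations from Python; the record contents and query shapes are illustrative, not the repo's exact code.

```python
import asyncio

from surrealdb import Surreal


async def demo_chat_graph() -> None:
    async with Surreal("ws://localhost:8000/rpc") as db:
        await db.signin({"user": "root", "pass": "root"})
        await db.use("rag", "rag")  # placeholder namespace and database names

        # Create a chat and a message, then link them with a `sent` graph edge.
        await db.query("""
            LET $chat = CREATE ONLY chat SET title = "New chat";
            LET $message = CREATE ONLY message SET role = "user", content = "Hello!";
            RELATE $chat->sent->$message SET timestamp = time::now();
        """)

        # Fetch every message in a chat by walking the graph: chat -> sent -> message.
        messages = await db.query("SELECT ->sent->message.* AS messages FROM chat;")
        print(messages)


if __name__ == "__main__":
    asyncio.run(demo_chat_graph())
```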
Functions such as fn::create_message(), fn::create_system_message(), and fn::generate_chat_title() help with creating and managing messages, generating AI responses, organising chats, retrieving conversation history, and generating chat titles.
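As one example, here is a hedged sketch of what fn::generate_chat_title might look like, reusing the hypothetical fn::chat_complete from the earlier sketch; the argument type and query shapes are assumptions rather than the repo's actual definition.

```python
# Hedged sketch only; the real fn::generate_chat_title lives in the repo's .surql files.
GENERATE_CHAT_TITLE = """
DEFINE FUNCTION fn::generate_chat_title($chat: record<chat>) {
    -- Use the first message in the chat as the basis for a short title.
    LET $first = (SELECT ->sent->message.content AS contents FROM ONLY $chat).contents[0];
    LET $title = fn::chat_complete(
        "gpt-3.5-turbo",
        string::concat("Summarise this message as a chat title of at most five words: ", $first)
    );
    RETURN UPDATE $chat SET title = $title;
};
"""
```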
Step 4: Hooking it up to Python
Now, let's bring in FastAPI to get our backend rolling:
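Here's a minimal sketch of that wiring, assuming a FastAPI lifespan handler that opens a single SurrealDB connection; the connection details and credentials are placeholders, and the surrealdb SDK's method names can differ slightly between versions.

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI
from surrealdb import Surreal


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Open one SurrealDB connection for the lifetime of the app.
    db = Surreal("ws://localhost:8000/rpc")
    await db.connect()
    await db.signin({"user": "root", "pass": "root"})
    await db.use("rag", "rag")  # placeholder namespace and database names
    app.state.db = db
    yield
    await db.close()


app = FastAPI(lifespan=lifespan)
```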
We'll need a few key endpoints:
Creating a new chat:
Sending a user message:
Generating an AI response:
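Continuing the setup sketch above, here's roughly what those three handlers could look like. fn::create_message and fn::create_system_message are named earlier in the post, but their signatures, fn::create_chat, and the response handling (plain query results here instead of the HTMX templates the app actually renders) are assumptions.

```python
from fastapi import Request

# Creating a new chat.
@app.post("/chats")
async def create_chat(request: Request):
    return await request.app.state.db.query("RETURN fn::create_chat();")

# Sending a user message: store it and relate it to the chat.
@app.post("/chats/{chat_id}/messages")
async def send_user_message(request: Request, chat_id: str, content: str):
    return await request.app.state.db.query(
        "RETURN fn::create_message($chat, 'user', $content);",
        {"chat": chat_id, "content": content},
    )

# Generating an AI response: run the RAG pipeline and store the assistant's reply.
@app.post("/chats/{chat_id}/response")
async def generate_response(request: Request, chat_id: str):
    return await request.app.state.db.query(
        "RETURN fn::create_system_message($chat, 'gpt-3.5-turbo');",
        {"chat": chat_id},
    )
```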
As mentioned before, you can refer to the repo for the other API endpoints.
These endpoints are our bridge between the front end and our RAG-powered back end.
Step 5: Frontend Overview
For the frontend, we're keeping it simple with a few HTMX templates:
index.html: The main chat interface
chats.html: Shows all existing chats
create_chat.html: For starting a new chat
load_chat.html: Displays messages in a chat
send_user_message.html: Renders user messages
send_system_message.html: Displays AI responses
This setup gives us a smooth, responsive interface that plays nicely with our RAG backend.
Wrapping it up
And there you have it! We've built a RAG application from the ground up with SurrealDB and OpenAI's GPT-3.5 Turbo. Here’s what Cellan had to say on why he chose SurrealDB to build his RAG application.
Using SurrealDB with OpenAI has been an exciting and rewarding experience. SurrealDB’s multi-model nature allowed me to rapidly iterate on my data schema, starting with schema-less tables and transitioning to schema-full tables as my ideas took shape. The extensive feature set of SurrealDB enabled me to write the majority of the application in SurrealQL, which meant I could avoid relying on additional services or packages for vector search and document retrieval. Of course, SurrealDB is flexible enough to integrate seamlessly with other popular LLM frameworks like LangChain, offering developers the freedom to choose how they want to build their applications. This project is just the beginning of what’s possible with SurrealDB and large language models, and I’m eager to explore further enhancements using SurrealML in the future.
The combination of SurrealDB's vector search and OpenAI's language model gives us a powerful tool for smart, context-aware information retrieval and generation. Whether you're building a Q&A system or generating personalised content, you should check out SurrealDB’s vector functions.
So go ahead, give it a spin, and see what you can create!
