Futures Forum event.
We’ll build an assistant that answers questions based on Wikipedia information, using the GPT-3.5 Turbo model from OpenAI. Our goal is to create an assistant that generates answers to questions it’s aware of and explicitly states when it doesn’t have enough information, avoiding hallucination.
We use SurrealDB to store the embeddings and handle the retrieval process.
Before we get into the details of this application, let’s run it to see if our RAG app works as expected.
Follow the README step-by-step to run SurrealDB and test the application.
Here’s what the app will look like after successfully running it in your Python environment.
Now that our RAG app is running smoothly, let’s take a tour through the key components of the application.
We start our schema definition by defining namespaces and databases, which in SurrealDB help you scope and limit access to your data. You can find the definition in the define_ns_db.surql file.
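The project's actual names live in that file; as a rough sketch, such a definition looks like this (the namespace and database names below are placeholders, not the ones used in the repo):

DEFINE NAMESPACE IF NOT EXISTS example_ns;
USE NS example_ns;
DEFINE DATABASE IF NOT EXISTS example_db;
USE DB example_db;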
The main tables in our SurrealDB schema are:
wiki_embedding: Stores the Wikipedia article URLs, titles, content, and their vector embeddings. We also index our knowledge store using a vector index.
DEFINE INDEX IF NOT EXISTS wiki_embedding_content_vector_index ON wiki_embedding FIELDS content_vector MTREE DIMENSION 1536 DIST COSINE;
chat: Manages chat sessions with timestamps and titles.
message: Stores individual messages within chats.
sent: Handles the relationship between chats and messages.

Once all tables and indexes are defined, we can move on to some key functions that power our RAG application:
Generating embeddings: OpenAI offers multiple models covering a wide range of use cases and price points. Once you have settled on a model and have your input, you can pass them to the embeddings_complete SurrealQL function, which uses the http::post function to call the OpenAI API and return the embeddings.
DEFINE FUNCTION IF NOT EXISTS fn::embeddings_complete($embedding_model: string, $input: string) {
    RETURN http::post(
        "https://api.openai.com/v1/embeddings",
        {
            "model": $embedding_model,
            "input": $input
        },
        {
            "Authorization": fn::get_openai_token()
        }
    )["data"][0]["embedding"]
};
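If you want to try the function on its own from a SurrealQL shell, a call like the following should return a 1536-dimensional embedding (the model name matches the one used later in fn::surreal_rag; the input text is just an example):

RETURN fn::embeddings_complete("text-embedding-ada-002", "What is SurrealDB?");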
We access the API key from the get_openai_token() function.
DEFINE FUNCTION IF NOT EXISTS fn::get_openai_token() { RETURN "Bearer " + $openai_token; };
Here, $openai_token is a variable that will be populated with the value from the .env file when the SurrealDB instance is started.
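How that value gets from the .env file into the database is handled by the project's startup steps; one way to make such a value available to SurrealQL functions is a database-wide parameter, sketched here with a placeholder key (the repo may wire this up differently):

-- Placeholder value; in the project the token comes from the .env file
DEFINE PARAM IF NOT EXISTS $openai_token VALUE "sk-your-key-here";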
Note: Remember not to push your .env file to version control.
Searching for relevant documents: For every prompt, we find the most relevant document using the vector index and the cosine similarity between each document’s content_vector and the input vector.
DEFINE FUNCTION IF NOT EXISTS fn::search_for_documents($input_vector: array<float>, $threshold: float) {
    LET $results = (
        SELECT
            url,
            title,
            text,
            vector::similarity::cosine(content_vector, $input_vector) AS similarity
        FROM wiki_embedding
        WHERE content_vector <|1|> $input_vector
        ORDER BY similarity DESC
        LIMIT 5
    );
    RETURN {
        results: $results,
        count: array::len($results),
        threshold: $threshold
    };
};
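To sanity-check retrieval on its own, you can embed a question and pass the vector in directly (the question and threshold here are illustrative values):

LET $question_vector = fn::embeddings_complete("text-embedding-ada-002", "Who was Ada Lovelace?");
RETURN fn::search_for_documents($question_vector, 0.8);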
The RAG function: This function ties everything together—embedding generation, document retrieval, prompt creation, and AI response generation—making it the core of our RAG application.
DEFINE FUNCTION IF NOT EXISTS fn::surreal_rag($llm: string, $input: string, $threshold: float, $temperature: float) {
    LET $input_vector = fn::embeddings_complete("text-embedding-ada-002", $input);
    LET $search_results = fn::search_for_documents($input_vector, $threshold);
    LET $context = array::join($search_results.results[*].text, "\n\n");
    LET $prompt = "Use the following information to answer the question. If the answer cannot be found in the given information, say 'I don't have enough information to answer that question.'\n\nInformation:\n" + $context + "\n\nQuestion: " + $input + "\n\nAnswer:";
    LET $answer = fn::chat_complete($llm, $prompt, "", $temperature);
    RETURN {
        answer: $answer,
        search_results: $search_results,
        input: $input
    };
};
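Calling it end to end then looks like this (the model name, question, threshold, and temperature are illustrative values, not the app’s defaults):

RETURN fn::surreal_rag("gpt-3.5-turbo", "Who was Ada Lovelace?", 0.85, 0.5);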
Our assistant has a chat interface similar to ChatGPT, where every chat includes its messages and we can also retrieve the conversation history and context.
Let’s see how to build it.
In a traditional relational database setup, we would link the chat and message tables using foreign keys. As the chat history grows, the joins can get overwhelming.
You could use a separate graph database to connect the two nodes with an edge, but why would you, when you can reduce complexity by using SurrealDB’s graph relations?
We can directly link chats to messages using a sent relationship, creating the structure:
chat->sent->message
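To make the relation concrete, here is a hand-rolled sketch of creating a chat, a message, and the sent edge (record IDs and field names are illustrative; in the app the helper functions described below do this for you):

CREATE chat:demo SET title = "Untitled chat", created_at = time::now();
CREATE message:first SET role = "user", content = "Hello!", timestamp = time::now();
RELATE chat:demo->sent->message:first SET timestamp = time::now();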
To fetch all messages in a chat, you would then run:
SELECT out.content, out.role FROM $chat_id->sent ORDER BY timestamp FETCH out;
Functions such as fn::create_message(), fn::create_system_message(), fn::generate_chat_title(), and others help with creating and managing messages, generating AI responses, organizing chats, and retrieving conversation history.
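As a rough idea of what one of these helpers could look like, here is a simplified sketch of a user-message function (this is not the repo’s actual implementation; parameter types and field names are assumptions):

DEFINE FUNCTION IF NOT EXISTS fn::create_user_message($chat_id: record<chat>, $content: string) {
    -- Create the message record, then link it to the chat with a sent edge
    LET $message = (CREATE message SET role = "user", content = $content, timestamp = time::now())[0];
    RELATE $chat_id->sent->$message SET timestamp = $message.timestamp;
    RETURN $message;
};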
Now, let’s bring in FastAPI to get our backend rolling:
import fastapi
from fastapi import responses, staticfiles, templating

app = fastapi.FastAPI(lifespan=lifespan)
app.mount("/static", staticfiles.StaticFiles(directory="static"), name="static")
templates = templating.Jinja2Templates(directory="templates")
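The lifespan argument refers to a startup handler that opens the SurrealDB connection and stores it in the life_span dictionary used by the endpoints below. The repo defines this for you; here is a hedged sketch of what it can look like with the Python SDK (it would sit above the app definition; the connection URL, credentials, namespace, and database are placeholders, and the exact API varies slightly between SDK versions):

import contextlib

import fastapi
from surrealdb import Surreal

life_span = {}

@contextlib.asynccontextmanager
async def lifespan(app: fastapi.FastAPI):
    # Open one connection for the lifetime of the app and share it via life_span.
    async with Surreal("ws://localhost:8000/rpc") as db:
        await db.signin({"user": "root", "pass": "root"})  # placeholder credentials
        await db.use("example_ns", "example_db")  # placeholder namespace and database
        life_span["surrealdb"] = db
        yield
    life_span.clear()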
We’ll need a few key endpoints:
Creating a new chat:
app.post("/create-chat", response_class=responses.HTMLResponse) async def create_chat(request: fastapi.Request) -> responses.HTMLResponse: chat_record = await life_span["surrealdb"].query( """RETURN fn::create_chat();""" ) return templates.TemplateResponse( "create_chat.html", { "request": request, "chat_id": chat_record[0]['result']['id'], "chat_title": chat_record[0]['result']['title'], }, )
Sending a user message:
app.post("/send-user-message", response_class=responses.HTMLResponse) async def send_user_message( request: fastapi.Request, chat_id: str = fastapi.Form(...), content: str = fastapi.Form(...), ) -> responses.HTMLResponse: """Send user message.""" message = await life_span["surrealdb"].query( """RETURN fn::create_user_message($chat_id, $content);""", {"chat_id": chat_id, "content": content} ) return templates.TemplateResponse( "send_user_message.html", { "request": request, "chat_id": chat_id, "content": message[0]['result']['content'], "timestamp": message[0]['result']['timestamp'], }, )
Generating an AI response:
app.get("/send-system-message/{chat_id}", response_class=responses.HTMLResponse) async def send_system_message( request: fastapi.Request, chat_id: str ) -> responses.HTMLResponse: message = await life_span["surrealdb"].query( """RETURN fn::create_system_message($chat_id);""", {"chat_id": chat_id} ) title = await life_span["surrealdb"].query( """RETURN fn::generate_chat_title($chat_id);""", {"chat_id": chat_id} ) return templates.TemplateResponse( "send_system_message.html", { "request": request, "content": message[0]['result']['content'], "timestamp": message[0]['result']['timestamp'], "create_title": title == "Untitled chat", "chat_id": chat_id, }, )
As mentioned before, you can refer to the repo for the other API endpoints.
These endpoints are our bridge between the front end and our RAG-powered back end.
For the frontend, we’re keeping it simple with a few HTMX templates:
index.html: The main chat interface
chats.html: Shows all existing chats
create_chat.html: For starting a new chat
load_chat.html: Displays messages in a chat
send_user_message.html: Renders user messages
send_system_message.html: Displays AI responses

This setup gives us a smooth, responsive interface that plays nicely with our RAG backend.
And there you have it! We’ve built a RAG application from the ground up with SurrealDB and OpenAI’s GPT-3.5 Turbo. Here’s what Cellan had to say about why he chose SurrealDB to build his RAG application.
Using SurrealDB with OpenAI has been an exciting and rewarding experience. SurrealDB’s multi-model nature allowed me to rapidly iterate on my data schema, starting with schema-less tables and transitioning to schema-full tables as my ideas took shape. The extensive feature set of SurrealDB enabled me to write the majority of the application in SurrealQL, which meant I could avoid relying on additional services or packages for vector search and document retrieval. Of course, SurrealDB is flexible enough to integrate seamlessly with other popular LLM frameworks like LangChain, offering developers the freedom to choose how they want to build their applications. This project is just the beginning of what’s possible with SurrealDB and large language models, and I’m eager to explore further enhancements using SurrealML in the future.
The combination of SurrealDB’s vector search and OpenAI’s language model gives us a powerful tool for smart, context-aware information retrieval and generation. Whether you’re building a Q&A system or generating personalized content, you should check out SurrealDB’s vector functions.
So go ahead, give it a spin, and see what you can create!
To stay up-to-date with new blog articles, future product releases, and documentation updates, subscribe to our email newsletter below, follow us on Twitter, or follow us on Dev.