Imagine you're trying to put the perfect recipe together, but every time you search, you have to wait for a slow, external chef (API) to prepare your ingredients (embeddings). Wouldn't it be better if you had all your tools in one kitchen (database)? This is the challenge that many developers face when building Retrieval Augmented Generation (RAG) pipelines. RAG combines the power of large language models (LLMs) with the ability to supplement the LLM’s understanding with information from an external source. This allows LLMs to generate more accurate and contextually relevant responses by drawing on a wider range of data, reducing hallucinations and improving factual accuracy.
The problem with external APIs
Traditional RAG pipelines often rely on external APIs to generate vector embeddings, which are numerical representations of text that capture its semantic meaning. However, this approach introduces several challenges:
Latency: Every API call introduces network latency, slowing down the entire process.
Cost: API usage can quickly become expensive.
Complexity: Managing API keys, rate limits, and different service endpoints adds complexity to the architecture.
Data security: Sending data to external APIs raises concerns about data privacy and security, as sensitive information might be exposed.
Dependencies: If you change embedding vendors, your entire dataset needs to be re-encoded with the new model.
Integration issues: Integrating with various APIs and ensuring compatibility can be challenging and time-consuming.
A quick overview of how RAG works
Before diving into how SurrealDB solves these problems, let's quickly review how RAG works:
User query: A user poses a question or request.
Information retrieval: The query is used to search a database for relevant information.
Contextual augmentation: The retrieved information is combined with the original query.
Response generation: The LLM uses this augmented context to generate a more informed answer.
The key to semantic understanding is information retrieval and vector search
Vector search is a crucial component of RAG because it allows for semantic searches based on the meaning of the text, not just keywords. Instead of looking for exact matches, vector search finds the most relevant pieces of information based on the similarity of the embeddings. This is important because queries might not have the same keywords as the stored information, but can still be semantically similar. For example, “pancake” is closer in meaning to “flapjack” than “chocolate cake” despite “cake” being part of “pancake” and “chocolate cake”.
What are embeddings?
Embeddings are numerical representations of text (or other data types) in a high-dimensional vector space. Words, phrases, sentences, or documents with similar meanings are located closer to each other in that space. When you perform a vector search, the search query is also converted into an embedding vector. The database then calculates the distances between this query vector and the embedding vectors of all stored items. The closest vectors in the embedding space correspond to the most semantically similar items. A useful analogy is that of ingredients and flavours: every ingredient in your pantry has a name (the word) and a flavour profile (the embedding). When you taste a lemon, you know it is similar to a lime or to vinegar, and when someone asks what they could use instead of lemon, it is easy to recall other ingredients with a similar taste profile.
A bit about embedding flavours
All embedding models are trained on corpora of data (i.e. the input data of the domain you seek to model). Each algorithm for training an embedding model has different tactics for conceptualising the data and for optimising the model during the training process. Here is a high-level overview of some popular modelling techniques.
Word-level embeddings: These models represent individual words as fixed-length vectors, capturing semantic relationships. Examples include GloVe, Word2Vec, and FastText.
Contextual embeddings: These models generate word representations that depend on the surrounding context, capturing nuanced meanings. Examples include ELMo, BERT, and RoBERTa.
Sentence embeddings: These models are designed to create representations for entire sentences or paragraphs, capturing their overall meaning. Examples include Sentence-BERT, Universal Sentence Encoder, and SimCSE.
Understanding the data behind embedding models
Word-based models: In word-based models, each word is associated with a vector, and these vectors capture the meaning of the word in a high dimensional space.
The word list: At the heart of the model is a vocabulary, the list of words that the model has been trained to understand. These are all the words that the model can translate into a vector. Each word in this list is unique and has a unique vector representation.
The vector: Each word in the vocabulary has a corresponding vector of numbers. The values in these vectors are learned during the model training process and represent the semantic meaning of the word. Words with similar meanings will have vectors that are close together in this high dimensional space.
SurrealDB as your all-in-one kitchen for RAG
SurrealDB lets you generate embeddings directly within the database using SurrealQL functions. This gives you several advantages:
Eliminate external APIs: You no longer have to rely on external APIs or services to generate embeddings; everything is prepared in-house, with no external "chefs".
Reduce latency, complexity, and cost: This eliminates the delays, costs, complexity, and security risks of external APIs.
Multi-model database: SurrealDB is a multi-model database, meaning it can handle different types of data including graph relationships, vector embeddings, and full-text search.
Integration of ML models: SurrealDB, using SurrealML, can store and execute trained ML models, and allows for integration with external training frameworks.
Word embeddings as a simple table structure
Word-based embedding models have a simple structure that allows for straightforward storage in a database. In SurrealDB, we can represent this as a simple table with two fields:
Word (Record ID): The first field holds the word itself, which is a simple text string and also serves as the record ID. As record IDs are direct pointers to data, using them facilitates fast and easy fetching.
Vector (Array of floats): The second field stores the vector associated with that word, represented as an array of floating-point numbers. This array captures the semantic meaning of the word in a high-dimensional space.
This table of word embeddings is like a spice rack. Each word in the vocabulary, such as basil or cayenne, is like a specific spice. Each vector associated with a word is like a jar of that spice with a unique blend of components. The vector represents the semantic "flavour" of the word. The record ID of each word is like the label on the jar, allowing for easy and quick retrieval of a particular spice (or word vector). So in our analogy of the spice rack: basil would be closer to oregano than it would be to cayenne or salt.
Sentence embeddings and word vectors
Now that we have our words and vectors, we need to extend this to sentences. To create a vector representation of a sentence, you average the vectors of all the words in the sentence. This is like taking spices from the rack and making a new spice mix that represents the combination of those flavours. This new flavour can now be compared with other flavours directly, in the same way that lemonade has a similar taste profile to a key-lime pie despite not sharing all of the same ingredients. Extending this notion further, we can also understand which other concepts are connected. For example, fish and whales swim while cats and dogs walk; you mince and dice herbs but squeeze lemons and limes.
Here's how it works:
Sentence tokenisation: The input sentence is first broken down into its constituent words. For example, 'Squeeze a lime into the bowl' can be reduced to
['Squeeze', 'a', 'lime', 'into', 'the', 'bowl']. If the model's vocabulary does not contain every one of these tokens, we reduce the list to just the words in our vocabulary:
['squeeze', 'lime']. Similarly, we can reduce the sentence 'Add the juice of a lemon to the sauce' to
['juice', 'lemon'].
Word vector lookup: For each word, we fetch its corresponding vector from the simple table in the database that stores the embedding model itself. For example, here is a conceptual word model based on a very small vocabulary in a 2d space:
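For instance, a toy two-dimensional vocabulary might look like the following. Only the 'squeeze' and 'lime' vectors are taken from the worked example below; the 'lemon' and 'juice' values are purely illustrative.

```
word      vector
squeeze   [0.0, 3.8]
lime      [4.0, 0.0]
lemon     [3.8, 0.1]   (illustrative)
juice     [0.4, 3.5]   (illustrative)
```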
Averaging word vectors: We take the vectors and average them to return a single vector. For example,
['squeeze', 'lime'] can be reduced to the average of [0.0, 3.8] and [4.0, 0.0], which equals [2.0, 1.9]. Looking at our two sentences, we see that they are more similar to each other in our vector space than sentences like 'Mincing cilantro' or 'Chopping parsley' would be.

How we do this in SurrealDB
The embedding table in SurrealDB
The table definition is very simple. As mentioned earlier, we have a word as a string and an embedding as an array of floats.
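A minimal sketch of such a definition might look like this (the word_embedding table and embedding field names are illustrative, not necessarily those used in the original schema):

```surql
-- Illustrative schema: one record per word in the vocabulary
DEFINE TABLE word_embedding SCHEMAFULL;
DEFINE FIELD word ON word_embedding TYPE string;
DEFINE FIELD embedding ON word_embedding TYPE array<float>;
```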
This would be familiar in many flavours of SQL. The more interesting Surreal twist is how we set the record ID when inserting each embedding:
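Here is a sketch of such an insert, using the toy vectors from the earlier example and the word itself as the record ID:

```surql
-- The word doubles as the record ID, so word_embedding:lime is a direct pointer
CREATE word_embedding:lime SET word = 'lime', embedding = [4.0, 0.0];
CREATE word_embedding:squeeze SET word = 'squeeze', embedding = [0.0, 3.8];
```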
In traditional SQL queries one would have to use the word field in a WHERE statement. In SurQL you can directly fetch the record with a query like:
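```surql
-- Fetch the embedding for 'lime' directly by record ID, no WHERE clause needed
-- (assuming the illustrative word_embedding table above)
SELECT embedding FROM word_embedding:lime;
```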
The impact of this subtle change is that we avoid table scans entirely.
Averaging word vectors in SurrealDB as a simple function
To put this all together, we can define a function that generates the vector for an entire sentence. Here is the entire function:
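What follows is a sketch of how such a function could look, assuming the illustrative word_embedding table above; the exact original may differ, but it implements the three steps broken down below:

```surql
DEFINE FUNCTION fn::sentence_to_vector($sentence: string) {
    -- Step 1: split the sentence into lowercase tokens, dropping empty strings
    LET $words = array::filter(
        string::split(string::lowercase($sentence), ' '),
        |$word| $word != ''
    );

    -- Step 2: look up each token's vector by record ID, dropping words that
    -- are not in the vocabulary
    LET $vectors = array::filter(
        array::map($words, |$word| type::thing('word_embedding', $word).embedding),
        |$vec| $vec != NONE
    );

    -- Step 3: sum each dimension across the vectors, then scale by 1/count
    -- to get the average
    RETURN vector::scale(
        array::map(array::transpose($vectors), |$dim| math::sum($dim)),
        1.0 / array::len($vectors)
    );
};
```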
Let’s break down the individual steps of the function:
Step 1: Split the sentence into tokens
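In the sketch above, this step is:

```surql
LET $words = array::filter(
    string::split(string::lowercase($sentence), ' '),
    |$word| $word != ''
);
```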
The notable functions here are:
string::split, which splits the sentence into an array of words.
array::filter, which filters out any blank entries left by extra whitespace.
Step 2: Retrieve the vectors for each token
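In the sketch above, this step is:

```surql
LET $vectors = array::filter(
    array::map($words, |$word| type::thing('word_embedding', $word).embedding),
    |$vec| $vec != NONE
);
```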
The notable functions here are:
array::map, which fetches the vector for each word; map executes arbitrary logic against each element of the array.
type::thing, which translates each word into a record ID, so we avoid the aforementioned table scans.
array::filter, used again to eliminate any words that are not part of the embedding model's vocabulary.
Step 3: Transpose and generate the average vector
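In the sketch above, this step is:

```surql
RETURN vector::scale(
    array::map(array::transpose($vectors), |$dim| math::sum($dim)),
    1.0 / array::len($vectors)
);
```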
The notable functions here are:
array::transpose, which pivots the arrays so that each resulting array holds the values of one dimension across all the vectors, ready for averaging.
array::map, which we leverage again, this time to sum the elements of each of those arrays and return a single array (i.e. a vector).
vector::scale, which multiplies each summed element by one over the total number of vectors (i.e. divides by the count), giving the average value in each dimension.
With this function in hand we can now return embeddings for any bit of text:
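For example, with the toy two-dimensional vocabulary from earlier:

```surql
RETURN fn::sentence_to_vector('Squeeze a lime into the bowl');
-- Averages the 'squeeze' and 'lime' vectors, returning [2.0, 1.9]
```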
Or on data ingest have an embedding automatically calculated:
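One way to do this, sketched here against a hypothetical document table with a text field (the table, its fields, and the use of a VALUE clause are illustrative assumptions), is to recompute the embedding field on every write:

```surql
DEFINE TABLE document SCHEMAFULL;
DEFINE FIELD text ON document TYPE string;
-- Recompute the embedding from the text field whenever the record is written
DEFINE FIELD embedding ON document TYPE array<float>
    VALUE fn::sentence_to_vector(text);

CREATE document SET text = 'Squeeze a lime into the bowl';
```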
Or leverage this in your semantic searches without calling an external API for your input text:
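For instance, a semantic search over the hypothetical document table above might look something like this, assuming an MTREE vector index on its embedding field (DIMENSION 2 only because our toy vocabulary is two-dimensional):

```surql
DEFINE INDEX document_embedding ON document FIELDS embedding MTREE DIMENSION 2 DIST COSINE;

-- Embed the user's question in-database and find the closest documents
LET $query = fn::sentence_to_vector('What can I use instead of lemon juice?');
SELECT text, vector::similarity::cosine(embedding, $query) AS similarity
FROM document
WHERE embedding <|3|> $query
ORDER BY similarity DESC;
```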
Bringing it back to RAG
With your embedding model stored in the same database as your underlying data, you now have the means to eliminate the points of friction associated with an external embedding call. To update your end-to-end RAG application, replace your embedding API call with a single query that returns the relevant text for enhancing your prompts.
This lets you go from this:
Set up and when you add new data:
Upload your corpus to a database
For each article/document, generate your vectors via an API and update your database
User asks a question:
Calculate the vector for your search query via an API
Query your corpus database
Add the relevant corpus data to the prompt and allow your LLM to answer

To this:
Set up and when you add new data:
Upload your corpus to a database
User asks a question:
Query your corpus database
Add the relevant corpus data to the prompt and allow your LLM to answer

Key takeaways
SurrealDB allows you to perform all of your required database operations in one single place.
You no longer need to rely on external "chefs" (APIs).
The word embeddings are stored in a simple table, similar to a spice rack.
Defining your own function, like the aforementioned fn::sentence_to_vector, is a convenient way to create a sentence or paragraph embedding from word vectors by averaging them at query time, directly within the database.
By utilising SurrealDB's in-database embedding capabilities, you can build faster, more secure, and cost-effective RAG pipelines. This approach eliminates the complexities associated with external APIs and allows you to focus on building innovative applications that can take advantage of modern AI.
