Vector search in SurrealDB


SurrealDB's vector search lets you find records by how close their embeddings are to a query vector, rather than by exact keyword matches. It can be used in place of, or together with, full-text search.
For example, still using the same liquids table, you can store the chemical composition of the liquid samples in a vector format.
Notice that we have added an embedding field to the table. This field will store the vector embeddings of the content field so we can perform vector searches on it.
In the example above you can see that the results are more accurate. The search pulled up only the results in which the word "lead" was used to mean the material, while the final liquidsVector record had the lowest score. This is the advantage of using vector search over full-text search.
Another use case for vector search is facial recognition. For example, to find the actor or actress who looks most like you in an extensive dataset of movie artists, you would first use an embedding model to convert the artists' images and details into vector embeddings, and then use SurrealQL to find the artist whose embedding most closely resembles the embedding of your own face. The more characteristics you include in your embeddings, the higher the dimensionality of the vectors, which can improve the accuracy of the matches but also increases the complexity of the vector search.
Computation on vectors: "vector::" package of functions
SurrealDB provides vector functions for most of the major numerical computations done on vectors. They include functions for element-wise addition, division and even normalisation.
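As a rough illustration of what these element-wise operations compute, here is a minimal plain-Python sketch. The helper names (add, divide, normalise) are hypothetical and chosen only to mirror the operations named above; they are not SurrealDB's API, and the database's own functions may handle edge cases differently.

```python
import math

def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]

def divide(a, b):
    """Element-wise quotient of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x / y for x, y in zip(a, b)]

def normalise(a):
    """Scale a vector so that its Euclidean length becomes 1."""
    norm = math.sqrt(sum(x * x for x in a))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in a]

summed = add([1, 2, 3], [4, 5, 6])
quotient = divide([10, 20, 30], [2, 4, 5])
unit = normalise([3, 4])
```

Normalising keeps a vector's direction and discards its magnitude, which is why it is often applied before comparing embeddings.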
They also include similarity and distance functions, which help in understanding how similar or dissimilar two vectors are. Usually, the vector with the smallest distance or the largest cosine similarity value (closest to 1) is deemed the most similar to the item you are trying to search for.
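To make the ranking rule concrete, the following plain-Python sketch computes cosine similarity and Euclidean distance by hand and picks the best candidate under each measure. The function names, the query, and the candidate vectors are all made up for illustration; this is not SurrealDB code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b (0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical query and candidates, chosen only for this sketch.
query = [1.0, 0.0]
candidates = {
    "a": [0.9, 0.1],   # almost the same direction as the query
    "b": [0.0, 1.0],   # perpendicular to the query
    "c": [-1.0, 0.0],  # opposite direction
}

# Highest similarity wins for cosine; lowest distance wins for Euclidean.
most_similar = max(candidates, key=lambda k: cosine_similarity(query, candidates[k]))
nearest = min(candidates, key=lambda k: euclidean_distance(query, candidates[k]))
```

Note the opposite sort directions: a similarity is maximised, while a distance is minimised.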


The choice of distance or similarity function depends on the nature of your data and the specific requirements of your application.
In the liquids examples, we assumed that the embeddings represented the harmfulness of lead (as a substance). We used the vector::similarity::cosine function because cosine similarity is typically preferred when absolute distances matter less than proportions and direction.
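The difference between direction and absolute distance can be seen in a small plain-Python sketch. It compares a vector with a copy scaled by ten: the proportions are identical, so the cosine similarity is essentially 1, while the Euclidean distance is large. The vectors and helper names are assumptions made for this illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def euclidean_distance(a, b):
    """Straight-line distance between a and b; sensitive to magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

sample = [1.0, 2.0, 3.0]
scaled = [10.0, 20.0, 30.0]  # same proportions, ten times the magnitude

cos = cosine_similarity(sample, scaled)
dist = euclidean_distance(sample, scaled)
```

If magnitude carries meaning in your data (for example, raw concentrations rather than relative composition), a distance function may be the better fit.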
Filtering through vector search
The vector::distance::knn() function returns the distance that the KNN operator has already computed between vectors. Using this function avoids recomputing the distance in every SELECT query.
Consider a scenario where you’re searching for actors who look like you, but only among those who have won an Oscar. You set a flag that is true for actors who’ve won the golden trophy.
Let’s create a dataset of actors and define an HNSW index on the embeddings field.
Among the actors who have also won an Oscar, actor:1 and actor:4 most closely resemble your query vector.
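The logic of a filtered nearest-neighbour search can be sketched in plain Python. This is an exact brute-force scan standing in for the HNSW index, which is approximate and far faster on large datasets. The records, the query vector, and the function name filtered_knn are illustrative values chosen for this sketch, not the dataset or API from this page.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filtered_knn(records, query, k):
    """Return the k flagged records nearest to query as (distance, id) pairs."""
    winners = [r for r in records if r["flag"]]  # keep Oscar winners only
    scored = [(euclidean_distance(r["embedding"], query), r["id"]) for r in winners]
    return sorted(scored)[:k]  # smallest distance first

# Hypothetical records: flag is True for actors who have won an Oscar.
actors = [
    {"id": "actor:1", "embedding": [0.1, 0.2, 0.3, 0.4], "flag": True},
    {"id": "actor:2", "embedding": [0.2, 0.1, 0.4, 0.3], "flag": False},
    {"id": "actor:3", "embedding": [0.4, 0.3, 0.2, 0.1], "flag": True},
    {"id": "actor:4", "embedding": [0.3, 0.4, 0.1, 0.2], "flag": True},
]
query_vector = [0.3, 0.2, 0.1, 0.4]

nearest = filtered_knn(actors, query_vector, k=2)
nearest_ids = {actor_id for _, actor_id in nearest}
```

Each result carries its distance alongside the record id, which mirrors the idea of reusing the distance computed during the search rather than calculating it a second time.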
For HNSW configuration and the KNN cheat sheet, see Vector indexes.