Vector search model

Learn how vector embeddings represent semantic similarity, how SurrealDB stores and queries vectors alongside other models, and where to find guides on pipelines, similarity, indexes, hybrid search, and RAG-style retrieval.

A vector database is specialised for storing high-dimensional vectors and for efficiently performing queries on them. Rather than searching on exact values or text-based queries, vector databases let you search based on semantic similarity. For instance, in a text embedding scenario, you can find documents that are semantically similar to a given query, even if they do not share the same keywords.

This allows for usage in areas such as:

  • Recommendation systems: Suggest items (movies, products, etc.) similar to what a user has liked based on learned embeddings.

  • Image or audio search: Identify images or audio clips semantically similar to a given sample.

  • Clustering and classification: Perform unsupervised clustering of data points or quickly identify which category a vector is close to.

With SurrealDB’s query language you can define vector fields, store numeric arrays as embeddings, create indexes, and perform similarity queries. SurrealDB unifies these features so you can replace multiple data stores (a dedicated vector database, a separate document database, and a graph database to stitch them together) with a single source of truth.
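As a minimal sketch of what this looks like in SurrealQL (the table and field names here are illustrative, not prescribed), you might define an embedding field and a vector index like so:

```surql
-- Store each document's text together with its embedding
DEFINE FIELD text ON TABLE document TYPE string;
DEFINE FIELD embedding ON TABLE document TYPE array<float>;

-- An HNSW index over 3-dimensional embeddings, using cosine distance
-- (real embeddings typically have hundreds of dimensions)
DEFINE INDEX idx_embedding ON TABLE document
    FIELDS embedding HNSW DIMENSION 3 DIST COSINE;
```

The `DIMENSION` must match the length of the vectors produced by your embedding model, and the distance metric should match how that model's vectors are meant to be compared.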

But how do you “think” in a vector database? Unlike relational or document models, where the focus is on well-defined schemas and relationships, vector databases revolve around embeddings: numerical representations of objects (like text, images, audio snippets, etc.) in a continuous vector space. This allows you to design data structures and queries to exploit these embeddings for similarity search or AI-driven retrieval.

Vector search is a search mechanism that goes beyond traditional keyword matching and text-based search methods to capture deeper characteristics and similarities between data.

Vector search isn't new to the world of data science. Gerard Salton, often called the father of information retrieval, pioneered the Vector Space Model, cosine similarity, and term-weighting schemes such as TF-IDF, beginning in the 1960s.

Vector search converts data such as text, images, or sounds into numerical vectors, called vector embeddings. You can think of vector embeddings as cells. In the same way that cells form the basic structural and biological unit of all known living organisms, vector embeddings serve as the basic units of data representation in vector search.

In practice, embeddings are typically dense vectors of real numbers that capture the semantic or contextual meaning of data. For instance, in Natural Language Processing (NLP), a word or sentence can be transformed into a vector of length 128, 256, 768, or even more dimensions. The idea is that similar objects (in meaning) end up having similar vector representations, making it possible to compute how close they are in the vector space.
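Closeness in the vector space can be computed directly. For instance, SurrealDB's vector functions include cosine similarity, which returns a value between -1 (opposite directions) and 1 (identical direction); the short vectors below are made-up examples:

```surql
-- Two similar vectors produce a similarity close to 1
RETURN vector::similarity::cosine([0.12, 0.83, 0.41], [0.10, 0.80, 0.39]);
```

Other metrics, such as `vector::distance::euclidean`, are available when a distance (lower is closer) is more natural than a similarity (higher is closer).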

Embeddings themselves are not generated by the database; they depend on the model used to produce them. Providers such as OpenAI and Mistral offer embedding models, both free and paid, and many open-source models are also available.

Inside the database, embeddings are stored alongside the data they represent, in a form like this:

```surql
[
    {
        embedding: [
            0.0007022718782536685,
            0.004178352188318968,
            0.009888353757560253,
            // and so on for 128, 256, 768, or even more numbers
        ],
        text: "To be, or not to be: that is the question."
    },
    {
        embedding: [
            -0.027426932007074356,
            0.0008020889363251626,
            -0.02949262224137783
        ],
        text: "All the world’s a stage, and all the men and women merely players."
    },
    {
        embedding: [
            -0.05859993398189545,
            -0.011999601498246193,
            -0.06185592710971832
        ],
        text: "The course of true love never did run smooth."
    }
]
```
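Records like these can be created and then queried by similarity. The sketch below uses SurrealQL's KNN operator with a hypothetical `$query_embedding` parameter, which would be produced by the same embedding model used for the stored records:

```surql
-- Store one of the example records
CREATE document SET
    text = "To be, or not to be: that is the question.",
    embedding = [0.0007022718782536685, 0.004178352188318968, 0.009888353757560253];

-- Return the 2 documents whose embeddings are nearest to the query vector,
-- compared by cosine distance
SELECT text FROM document
    WHERE embedding <|2, COSINE|> $query_embedding;
```

When a vector index has been defined on the field, the KNN operator can use it instead of scanning every record, which is what makes similarity queries practical at scale.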

The guides in this section break down how to store embeddings and query them in practice:

For statement-level reference, see Vector functions and DEFINE INDEX (vector search).
