

AI vector search is all the rage these days. Since SurrealDB has vector search built in, it's easy to explain the basics in a single page.
Vector search is done using embeddings, which are very long arrays of floats that represent the semantic space (the meaning) occupied by a string. You can get these arrays by sending strings to services like OpenAI and Mistral. Building models that capture a string's "semantic space" is one of the things these companies charge for.
For example, the following is the result for "A Tale of Two Cities by Charles Dickens" if you send it to Mistral's embedding model.
The more floats there are in the array, the more precisely it represents the string. That also means that for a short demonstration, you can usually cut down the length of the array and the precision of each float and still get a pretty good result. Let's do that with the eight books in our library. We'll also use DEFINE PARAM to put them inside a database-wide parameter.
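To make the idea of "cutting down" an embedding concrete, here is a minimal sketch in plain Python (not SurrealQL). The helper name `shorten`, the chosen length and precision, and the sample values are all assumptions made for illustration; the tour does not state exactly how its shortened embeddings were produced.

```python
def shorten(embedding, length=8, digits=3):
    """Keep the first `length` floats and round each to `digits` decimal places."""
    return [round(value, digits) for value in embedding[:length]]


# Made-up values standing in for a real embedding (hypothetical data)
full = [0.0123456, -0.9876543, 0.5555555, 0.1111111, -0.2222222]
short = shorten(full, length=3, digits=2)
print(short)
```

A shortened array like this loses detail, which is why the results can differ from those of the full-length embedding.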
What we want to do now is go through each book to find two books: one that is most similar to it, and one that is the least similar.
To do this, we can use a vector function. There are a lot of vector functions to choose from that give either the similarity or distance, but vector::similarity::cosine() is probably the most used. The vector reference page has a convenient diagram showing visually how each type of vector search finds the distance between two embeddings.
These functions are very straightforward: each one takes two arrays of numbers and uses them to calculate a similarity or distance.
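For intuition about what a cosine similarity function computes, the following is a sketch of the standard textbook definition in plain Python: the dot product of the two arrays divided by the product of their lengths. It is an illustration only and makes no claim about how SurrealDB implements vector::similarity::cosine() internally; the function name `cosine_similarity` is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors.

    1.0 means same direction, 0.0 means unrelated (orthogonal),
    -1.0 means opposite direction. Zero vectors are not handled
    and would raise ZeroDivisionError.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


same = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

Note that the magnitude of the arrays does not matter, only their direction, which is why `[1, 2, 3]` and `[2, 4, 6]` count as identical here.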
We can use this to calculate the most and least similar books to the books we have in the $BOOKS parameter. To do this, we can grab the $BOOKS parameter and then call the .map() function to do something with each object in the array. We want to pass on the book's title, then the two books that are closest and farthest away.
Getting the least similar match is the easier part. We only need the book's title (the book field) and its similarity as calculated by the vector::similarity::cosine() function. This SELECT is ordered by similarity in ascending order (the default ordering), so we can use [0] to grab the first result, which will be the least similar book.
The most similar book can use almost the same query, except that we turn the ordering around with DESC (descending order) and grab the book at index 1. That's because the most similar book to any book is the book itself, which will be at index 0.
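The same logic can be sketched outside the database in plain Python. This is an illustration of the approach described above, not the tour's SurrealQL query: the two-dimensional embeddings and the titles "Book A" to "Book D" are made up, and the names `cosine_similarity` and `closest_and_farthest` are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical data: real embeddings are far longer than two floats
books = [
    {"book": "Book A", "embedding": [1.0, 0.0]},
    {"book": "Book B", "embedding": [0.9, 0.2]},
    {"book": "Book C", "embedding": [0.1, 1.0]},
    {"book": "Book D", "embedding": [-0.2, 0.9]},
]


def closest_and_farthest(books):
    """For each book, find the most and least similar book in the list."""
    results = []
    for current in books:
        scored = [
            {
                "book": other["book"],
                "similarity": cosine_similarity(
                    current["embedding"], other["embedding"]
                ),
            }
            for other in books
        ]
        ascending = sorted(scored, key=lambda row: row["similarity"])
        descending = sorted(
            scored, key=lambda row: row["similarity"], reverse=True
        )
        results.append(
            {
                "book": current["book"],
                # Index 0 in ascending order is the least similar book
                "farthest": ascending[0]["book"],
                # Index 0 in descending order is the book itself, so take index 1
                "closest": descending[1]["book"],
            }
        )
    return results


results = closest_and_farthest(books)
for row in results:
    print(row)
```

Sorting twice keeps the sketch close to the two queries described in the text; a single sort with the first and second-to-last items would give the same answer.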
We can see in the output that the search seems to be working. For example, the fantasy book Alice's Adventures in Wonderland is closest to The Little Prince, but farthest from A Tale of Two Cities. And Harry Potter is closest to The Hobbit, but farthest from Dream of the Red Chamber.
If you want to try the same query with the full 1024-length embeddings, take a look at this page which has them all. You'll see mostly the same order, especially in very close books like Harry Potter and The Hobbit. But for less obvious similarities or distances, you'll see a different book show up with the longer embeddings.
Those were just the basics, but all the rest of vector search is just a more complex way of doing the same thing: finding the similarities and distances between strings. For much more information on this, including an interesting operator that looks like this: <|2,COSINE|>, take a look at the vector search reference page.
With that, we are now out of time!
We promised a 30-minute tour, and that's probably about how long it has taken to reach this point, give or take a little depending on your learning style. Let's wrap it up!