SurrealDB | Ten more schema tips

It’s no exaggeration to say that AI has taken the world by storm.

While many other databases refer to themselves as AI databases or vector databases, we prefer to use the term “AI-native” for SurrealDB. That’s because SurrealDB is a multi-model general purpose database that includes features such as vector similarity and indexing, graph and record links, and use as an embedded data store.

In our recent blog posts we have showcased many of SurrealDB’s uses in AI applications, such as the official SurrealDB LangChain integration, semantic search with SurrealDB and OpenAI (and another one using Mistral), and building a smart knowledge agent with SurrealDB and Rig.rs.

Each of the examples in those posts is a minimal runnable example that you can build on to develop your own AI solution using SurrealDB as the data store.

This post goes a good ways beyond a minimal example to show just how easy it is to get all this and more. Since SurrealDB is a multi-model database that just happens to be AI-native, AI solutions can be combined or used alongside all of SurrealDB’s features such as graph queries, full-text search, and embedding as an in-memory database.

To demonstrate this, we will create a small UI using Rust and the Iced crate, which is one of the nicest ways to build a UI in Rust (egui is another one, which I tend to use more frequently, but Iced has a bit more of a webpage feel, which is nice).

This UI will have a few buttons to let you do a few things:

Insert documents into the database that are taken from Wikipedia’s summary API. You can see an example of the output here, from which we will use the title and extract fields:

{
  "title": "Calgary",
  "extract": "Calgary is a major city in the Canadian province of Alberta. As of 2021, the city proper had a population of 1,306,784 and a metropolitan population of 1,481,806 making it the third-largest city and fifth-largest metropolitan area in Canada."
}

Manually link two documents together via their titles
Look through the extract for capitalized words, check if there is an article with a matching title and link the articles together if that is the case. In the example above, it would link Calgary to Alberta and Canada if the database had all three of these documents.
Add OpenAI embeddings
Add Mistral embeddings
Perform OpenAI similarity search for a document
Perform Mistral similarity search for a document
Perform full-text search on the documents in the database
See all the links for a document by performing a recursive query to a depth of 3
See all document titles
Run raw queries, since an embedded database won’t have an endpoint for you to connect via Surrealist or the CLI.

I must admit to having had a bit too much fun putting this together, and it was difficult to tell myself to stop building the app and get around to writing about it. I opted instead to leave in the potential for some unused features that you might want to pick up yourself to further develop the app as you see fit.

The first code that is run for the UI is the initialization for the database. The first statements set up the namespace and database, along with the fields for the document table.

DEFINE NAMESPACE ns;
DEFINE DATABASE db;
USE NS ns;
USE DB db;
DEFINE FIELD extract ON document TYPE string;
DEFINE FIELD title ON document TYPE string;
DEFINE FIELD mistral_embedding ON document TYPE option<array<float>> DEFAULT [];
DEFINE FIELD openai_embedding ON document TYPE option<array<float>> DEFAULT [];

After that we have some statements to set up full-text search. The en_analyzer search analyzer will break up any input string by class (e.g. the input "Hi!123" would become ['Hi', '!', '123']), then change each token to lowercase, and then build edge n-grams of 3 to 10 characters in length from each token.

DEFINE ANALYZER en_analyzer TOKENIZERS class FILTERS lowercase,edgengram(3,10);

A search analyzer is easy to test out by using the search::analyze() function. Let’s give it a try!

DEFINE ANALYZER en_analyzer TOKENIZERS class FILTERS lowercase,edgengram(3,10);
search::analyze("en_analyzer", "I run really fast");

-- Output:
[
	'run',
	'rea',
	'real',
	'reall',
	'really',
	'fas',
	'fast'
]

An edgengram works best when an app is watching user keypresses in real time. This is one of the features that was not yet added to the app, but you can continue the feature here if you like.
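To make the three analyzer steps more concrete, here is a minimal Rust sketch that mimics them using only the standard library. It is not SurrealDB's implementation: the function names (tokenize_by_class, edge_ngrams, analyze) are hypothetical, and the choice to drop tokens shorter than the minimum length is an assumption based on the output shown above, where the single-letter token "I" does not appear.

```rust
// Hypothetical stand-in for the character classes used by a class tokenizer.
#[derive(PartialEq, Clone, Copy)]
enum Class {
    Letter,
    Digit,
    Space,
    Other,
}

fn class_of(c: char) -> Class {
    if c.is_alphabetic() {
        Class::Letter
    } else if c.is_numeric() {
        Class::Digit
    } else if c.is_whitespace() {
        Class::Space
    } else {
        Class::Other
    }
}

// Start a new token every time the character class changes.
// Whitespace ends a token but is never part of one.
fn tokenize_by_class(input: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut current_class = None;
    for c in input.chars() {
        let class = class_of(c);
        if Some(class) != current_class && !current.is_empty() {
            tokens.push(std::mem::take(&mut current));
        }
        current_class = Some(class);
        if class != Class::Space {
            current.push(c);
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

// Prefixes of the token from `min` to `max` characters long.
// Assumption: tokens shorter than `min` produce nothing.
fn edge_ngrams(token: &str, min: usize, max: usize) -> Vec<String> {
    let chars: Vec<char> = token.chars().collect();
    if chars.len() < min {
        return Vec::new();
    }
    let upper = max.min(chars.len());
    (min..=upper).map(|n| chars[..n].iter().collect()).collect()
}

// class tokenizer, then lowercase, then edgengram(3,10)
fn analyze(input: &str) -> Vec<String> {
    tokenize_by_class(input)
        .iter()
        .map(|token| token.to_lowercase())
        .flat_map(|token| edge_ngrams(&token, 3, 10))
        .collect()
}
```

Calling analyze("I run really fast") in this sketch gives the same seven tokens as the search::analyze() output above, which is why a partially typed word like "rea" can already match "really".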

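Once a search analyzer is defined, it can be used in a full-text search index. The following two indexes cover the extract and title fields, both using BM25 ranking.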

DEFINE INDEX en_extract ON document FIELDS extract SEARCH ANALYZER en_analyzer BM25;
DEFINE INDEX en_title ON document FIELDS title SEARCH ANALYZER en_analyzer BM25;

Finally, we have two statements that relate to the linking of documents.

The first is a table definition that sets a relation (that we will just call link for lack of imagination) that must be ENFORCED. This clause will make sure that a RELATE statement fails if the two documents to be related don’t exist yet.

(By the way, being able to relate documents that don’t exist yet is a feature that is nice to have. This flexibility allows for patterns such as using graph queries as assertions, as this example shows)

DEFINE TABLE link TYPE RELATION IN document OUT document ENFORCED;

Our app will be mindlessly trying to RELATE one document to another if it finds a matching word in the summary, so we will add a UNIQUE index to ensure that no two documents can be related to one another more than once.

DEFINE INDEX only_one_link ON link FIELDS in,out UNIQUE;
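The combined effect of ENFORCED and the UNIQUE index can be modelled in a few lines of plain Rust. This is only an in-memory sketch to illustrate the rules, not how SurrealDB implements them; the Graph type, the RelateError enum and the relate method are all hypothetical names. It also assumes that the index treats (in, out) as an ordered pair, so a link in the opposite direction counts as a different link.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum RelateError {
    // Mirrors ENFORCED: both records must already exist.
    MissingDocument(String),
    // Mirrors the UNIQUE index on (in, out).
    DuplicateLink,
}

#[derive(Default)]
struct Graph {
    documents: HashSet<String>,
    links: HashSet<(String, String)>,
}

impl Graph {
    fn relate(&mut self, from: &str, to: &str) -> Result<(), RelateError> {
        // Refuse to link to or from a document that has not been inserted.
        for id in [from, to] {
            if !self.documents.contains(id) {
                return Err(RelateError::MissingDocument(id.to_string()));
            }
        }
        // HashSet::insert returns false if the pair was already present.
        if !self.links.insert((from.to_string(), to.to_string())) {
            return Err(RelateError::DuplicateLink);
        }
        Ok(())
    }
}
```

In this model, relating Earth to Moon succeeds once, fails the second time with DuplicateLink, and relating Earth to a Sun document that has not been added yet fails with MissingDocument. The app can therefore attempt every candidate link and simply ignore the errors.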

Following these DEFINE statements are a number of queries used throughout the app to carry out the behaviour triggered by the various buttons. Here they are in the same order that we introduced them above.

Manually linking two documents together via their titles: the logic here is nice and simple since we are going to create document records that have an ID made up of the article name. For example, document:Germany or document:Solar_System. Since record IDs must be unique, no article with the same name can be created twice.

After that, a quick RELATE statement is all that is needed. Here is the part of the SDK code that contains the statement.

let one = RecordId::from_table_key("document", one);
let two = RecordId::from_table_key("document", two);
match self
    .db
    .query("RELATE $one->link->$two;")
    .bind(("one", one))
    .bind(("two", two)) // Then handle the result...

Next is the part in which we can click a single button to look through an article’s extract for capitalized words, and see if there are any matching articles. This part of the query does nothing but grab each article.

SELECT * FROM document

Later on you might want to add another button that only looks at documents without an outbound link via the WHERE !->link->document clause.

SELECT * FROM document WHERE !->link->document

But we don’t want to use this as default behaviour because that would mean that document:Earth linked to document:Moon wouldn’t be linked to a document called document:Sun that we add later, even if the text in the Earth article included a mention of the Sun.
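The capitalized-word scan itself happens on the Rust side once the documents have been fetched. Here is a minimal sketch of one way it could look, using only the standard library. The function names capitalized_words and link_targets are hypothetical and the app's actual code may differ. Note that this simple version only matches single-word titles, so an article such as Solar System would need extra handling.

```rust
use std::collections::HashSet;

// Collect each distinct word in the extract that starts with an uppercase
// letter, in order of first appearance.
fn capitalized_words(extract: &str) -> Vec<String> {
    let mut seen = HashSet::new();
    extract
        .split(|c: char| !c.is_alphanumeric())
        .filter(|word| word.chars().next().map_or(false, |c| c.is_uppercase()))
        .filter(|word| seen.insert(word.to_string()))
        .map(|word| word.to_string())
        .collect()
}

// Keep only the words that match the title of another existing document.
// Each returned title is a candidate for a RELATE statement.
fn link_targets(title: &str, extract: &str, titles: &HashSet<String>) -> Vec<String> {
    capitalized_words(extract)
        .into_iter()
        .filter(|word| word.as_str() != title && titles.contains(word))
        .collect()
}
```

With the Calgary extract from the beginning of this post and a database that holds Calgary, Alberta and Canada, link_targets returns Alberta and Canada. Words like "Canadian" and "As" are capitalized too, but they are dropped because no document has that title.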

OpenAI and Mistral embeddings: once the embeddings are stored, similarity search over them is done via the following query.

(SELECT 
    (extract.slice(0, 50) + '...') AS extract,
    title,
    vector::distance::knn() AS distance
        FROM document
        WHERE {embedding} <|4,COSINE|> $embeds
        ORDER BY distance).filter(|$t| $t.distance > 0.0001);

The extract.slice part is used to only show the first 50 characters of an extract. This can be changed to extract to show the entire field.
The vector::distance::knn() function returns the distance that the KNN operator in the following WHERE clause has already computed for each result, so it does not need to be calculated again. Here, <|4,COSINE|> will use the cosine distance to return the closest four results.
The embedding part in this case is the name of the field with the embeddings, either openai_embedding or mistral_embedding.
.filter at the end is used to filter out any results that have a distance effectively equal to zero. This will filter out the article itself, because we don’t want to see the article for document:Sun as the closest neighbour to the same document:Sun.
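For readers who want to see what the query computes, here is a brute-force Rust sketch of the same steps: score every document by cosine distance, keep the closest k, and then drop anything with a distance of effectively zero. It is an illustration only and does not reflect how SurrealDB executes the KNN operator internally. The function names and the tiny two-dimensional embeddings in the usage note are made up.

```rust
// Cosine distance: 1 minus the cosine similarity of the two vectors.
// Assumes neither vector is all zeros (that would produce NaN).
fn cosine_distance(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

// Brute-force equivalent of `<|k,COSINE|>`, `ORDER BY distance` and the
// trailing `.filter(...)` in the query.
fn nearest<'a>(
    docs: &'a [(&'a str, Vec<f64>)],
    query: &[f64],
    k: usize,
) -> Vec<(&'a str, f64)> {
    let mut scored: Vec<(&str, f64)> = docs
        .iter()
        .map(|(title, embedding)| (*title, cosine_distance(embedding, query)))
        .collect();
    // Closest first, then keep only the k nearest.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    // Remove the document that the query embedding came from.
    scored.retain(|(_, distance)| *distance > 0.0001);
    scored
}
```

As a usage example, take three documents with the made-up embeddings Sun [1.0, 0.0], Moon [0.9, 0.1] and Earth [0.0, 1.0]. Searching with the Sun embedding and k set to 4 returns Moon followed by Earth, while Sun itself is removed by the final filter because its distance is zero.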
