

Time elapsed: 20y2m
The invaders had clearly prepared years in advance. They arrived in large ships and boasted firearms that could strike from a distance. They were well trained, and had not come to talk.
โ
Their invasion failed, spectacularly.
โ
Twenty years ago you would have stood no chance against them, but today is a different story. As the enemy marched past the telegraph poles, thinking them to be religious symbols, their positions were relayed in real time to your commanders. Your troops destroyed their supply lines from behind, and every town they reached was already prepared for their arrival. The troops defending your towns had weapons that fire faster, farther, and more accurately than theirs. And your new financial system made it easy to send funds whenever necessary. After a month it was over.
โ
The invaders even got sick! It turns out that the common flu for your people is a devastating disease for them, as they had never had it before.
โ
Strangely enough, at this point you all began to feel pity for them and set about giving them medical care. You confiscated most of their ships, leaving a few for them to get back home. But some of them were amazed at your society, and you let them stay. All in all, they seemed rather embarrassed by the whole thing.
โ
The incident makes you chuckle and shudder at the same time. What if, twenty years ago, you hadn't noticed that the mysterious devices in the tunnel were computers? Or if nobody had discovered the tunnel the first place?
โ
After all this time spent on financial and military matters, you feel a strong urge to get back to your true love: books. You remember reading that SurrealDB has some impressive ways to search through text, and begin learning how they work.
Full text search is a feature that allows you to customize how to search through text in an intuitive way, instead of just comparing for equality. Full text search is something we interact with every day. When you do a search for a word like "sArrealdb" in a search engine and it asks you if you meant to search for "surrealdb", that's thanks to full text search.
Enabling full text search in SurrealDB is fairly straightforward, but it requires a certain amount of setup. It all starts with a DEFINE ANALYZER statement. An example of the syntax may look intimidating at first:
But the syntax for DEFINE ANALYZER is really just a list of menu items that you choose to modify text in the way you prefer.
Everyone who uses full text search will want to use it in a different way. For example, imagine that you would like to search through the first two sentences from this book:
"You manage the town's library, one of the largest in the known world at over 200 books in size thanks to your hard work. Every month or so you manage to add a new book or two to the collection from a passing traveler or merchant."
Manually searching through the string is a possibility, but will return false if you don't have an exact match:
You could work with this a little bit using functions like string::lowercase, but for anything more complex you will want to use full text search. This allows you to search through text in a number of ways, such as:
Splitting by whitespace, so that you can search by word instead of over the whole string.
Making every letter lowercase or uppercase. Without this, the words You and you will be treated as different words.
Removing punctuation and other tokens. Without this, the last word will be seen as "merchant." and not "merchant" without a period at the end.
Turning the words into their root forms, such as town's to town, passing to pass, and known to know.
Chopping up the words in certain other ways, so that the word book will show up once a user has typed bo, or so that a search for ollectio matches collection.
This can all be done automatically by choosing tokenizers and filters.
Tokenizers come first, splitting the input into individual tokens. Here are the tokenizers that are available to choose from:
blank: Creates a new token each time it encounters a space, tab, or newline character. This tokenizer turns "The invaders have been defeated!" into ["The", "invaders", "have", "been", "defeated!"].
class: Splits based on digit, letter, punctuation, and blank. This tokenizer turns "123abc! XYZ" into ["123", "abc", "!", "XYZ"].
punct: Splits based on punctiation. This tokenizer turns "Hello, World!" into ["Hello", ",", "World", "!"].
camel: This creates a new token each time a capital letter is encountered and is mostly useful for parsing programming language syntax. This tokenizer turns "AeonsFirstRustStruct" into ["Aeons", "First", "Rust", "Struct"].
SurrealDB has a function called search::analyze() that lets you confirm this output for yourself. This function takes two strings: one string that holds the name of the analyzer, followed by a string to process. Here is the output of the first example for the blank tokenizer given above:
Here is another example, this time using three tokenizers.
At this point, the text will have been split up into tokens but has not been modified in any way. This is where filters come in, which modify the tokens. Some filters are pretty self-explanatory:
lowercase: Turns into lowercase, changing ["Hello", "World"] to "hello world".
uppercase: Turns into uppercase, changing ["Hello", "World"] to ["HELLO", "WORLD"].
ascii: Changes any non-ascii characters into ascii, such as "rรฉsumรฉ cafรฉ" to "resume cafe".
After these basic filters come the most interesting ones. They are:
ngram: This keyword is followed by two numbers inside parentheses such as (2,4). An ngram will create a token for every possible part of a word of a length between the first and last number. So if you tokenize "have been found" by blank and then follow up with ngram(2,4), the transformed output will be: have into ["ha", "av", "ve", "hav", "ave", "have"], been into ["be", "ee", "en", "bee", "een", "been"], and found into ["fo" "ou", "un", "nd", "fou", "oun", "und", "foun", "ound"].

edgengram ("edge ngram"): This keyword is also followed by two numbers inside parentheses such as (2,4). An edgengram will create a token for every possible size starting at the beginning of the word of a length between the two numbers. So if you tokenize "have been found" by blank and then follow up with edgengram(2,4), the transformed output will be: have into ["ha", "hav", "have"], been into ["be", "bee", "been"], and found into ["fo", "fou", "foun"].

snowball: This reduces each word to a root form and converts it to lowercase. If you take the sentence "You manage the town's library, one of the largest in the known world at over 200 books in size thanks to your hard work." and use the punct tokenizer to remove the punctuation, the snowballed output will be: ["you", "manag", "the", "town", "library", "one", "of", "the", "largest", "in", "the", "known", "world", "at", "over", "200", "book", "in", "size", "thank", "to", "your", "hard", "work"].
Snowball filtering is implemented differently depending on the language. For example, the root form for "manage" is not "manage" itself, but "manag", in order to match forms like "managing" that do not follow the g with an e. Snowball is currently available for Arabic, Danish, Dutch, English, French, German, Greek, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil, and Turkish.
So as you can see, the syntax for DEFINE ANALYZER simply involves picking from a menu of tokenizers and analyzers depending on how you would like to modify text in order to search through it.
Let's think about how Aeon and the team will want to define an analyzer. There is a good possibility that they have enjoyed reading the stories of Sherlock Holmes, and these books are old enough to be free of copyright in our time too, so let's go with them.
Take a look at this sample from the first book of Sherlock Holmes and think about how you might want to tokenize and filter it.
โHis faceโhis dressโdidnโt you notice them?โ Holmes broke in impatiently.
โ
โI should think I did notice them, seeing that I had to prop him upโme and Murcher between us. He was a long chap, with a red face, the lower part muffled roundโโโ
โ
โThat will do,โ cried Holmes. โWhat became of him?โ
As we just learned, a class tokenizer will split up the input into tokens separated from all of the punctuation around them, while a snowball(english) filter will turn the words into their root forms. This gives us a statement that looks like this:
This analyzer that we have chosen to call english_analyzer is now ready to be used, but SurrealDB doesn't know where we want to use it yet. In order to apply it, we will need to define an index and tell SurrealDB which fields we want to apply it to. Once this index is applied, we will be able to use the @@ operator (the "matches operator"), which is used for full text search instead of the equals sign.
Let's give it a try with just the analyzer and the matches operator so that we aren't hit with too much information at the same time. This will generate an error that informs us that we now need to put an index together.
We've used DEFINE INDEX to define an index in this book before to ensure that fields are unique, but DEFINE INDEX has many other options such as FULLTEXT ANALYZER and its name, in which case it will become a full text search index.
The parts of the DEFINE INDEX that apply to us look like this:
Note the optional clauses called BM25 and HIGHLIGHTS. The BM in BM25 stands for "best matching", and is a function that ranks results by relevance, while HIGHLIGHTS allows you to highlight words that match so that they are easy to see in the search results. Those sound convenient, so let's add them too.
Putting everything together, we have our analyzer, index, two example sentences, and our first query using the matches operator.
The query works! A search for "SURGEONS" still turned up the sentences with "surgeons" and "Surgeon." without needing to directly work with any of the text ourselves.
It makes sense that people would also want to search by the title of a book, so we should build an index for the titles of books as well. This will allow people to find what they are looking for even if they can only remember vague details like "I remember that my favourite part had the word 'night' in it", or "A book that had the word 'hound' in the title".
Because our analyzer is already defined, we can do this by just adding another DEFINE INDEX to this field as well. This time, we probably don't want to add HIGHLIGHTS because titles are always quite short.
We'll also add a few more sentences from some other books to make the search more interesting.
As requested, we get the two sentences that are from "the book with 'hound' in the title" and have the word "night" (or nights, or other forms of the word).
Now what about the highlighting and search relevance that we added with the BM25 HIGHLIGHTS keywords? We haven't seen any of that in the output yet. There is a reason for this: we don't always want to see this output in our queries. However, the index has prepared the data in these records for us to use one of the rest of SurrealDB's search functions when we need to use them. These search functions are as follows:
search::score(): Returns the relevance score. Searching for "Aeon" will return a much higher score in a sentence like "Aeon Aeon Aeon" than in a sentence that is 500 words and only has "Aeon" in it once.
search::highlight(): Highlights matching keywords.
search::offsets(): Returns the position of the matching keywords.
We'll start with search::score(), which is the most commonly used of the three. This function takes a single number. Why a number? The following query - which doesn't work - will make this clear. Imagine that we want to include the relevance score of both the text and the title, and changed our query at the end to this:
It doesn't quite work because SurrealDB doesn't know what search::score() is supposed to apply to in the WHERE clause: the text field, or the title field? To let SurrealDB know which is which, we add one number between the @@ marks of the matches operator, and then pass these numbers into the functions. It will now know that text_score should apply to the search for "night", and title_score to "hound".
Putting it all together, the definitions and query look like this.
And now it works!
With this, we can assign weights to parts that we think are most important, and order the queries based on this score. Let's decide that we want the title of the book to be most important for determining relevance, so we will double its score before adding it to the relevance of the text, and replace the AND with OR so that anything matching either of these fields will show up.
The output shows one sentence that matches on both title and text, followed by one that matches on title but not text, and finally the least relevant sentence: a match on just the text but not the title.
The search::highlight() function is pretty similar, except that you need to tell SurrealDB how to highlight the text on both the left and the right. This is because you might want the word to show up like <b>word</b>, or *word*, or any other way.
Let's give this a try with hand emojis! Don't forget that we have only enabled HIGHLIGHTS in the index for text, not title.
Now it's much clearer where Sherlock Holmes shows up in the text.
And now for a bit of fun, let's put the entire content of one of these books โ The Hound of the Baskervilles โ into our database to do some text searching! Unlike Aeon and the team, we have full access to the internet and can simply pull in the content from websites like Project Gutenberg.
SurrealDB has built-in methods to fetch the content of these books, in a module called http. The function names are not surprising:
Also unsurprisingly, this functionality is disabled by default. You certainly don't want to have your database making remote calls unless you say that it should! We will get into more detail on the subject of security in Chapter 15.
To allow these functions to work, we will have to pass in the --allow-net parameter when starting the database. This command will allow us to restart the database in a mode in which these functions can be used.
With HTTP enabled, the rest of our work is pretty simple: we'll use the string::split() function to split the content by "." as a quick way to divide the book into smaller parts, and create a sentence record if the split value is at least a certain length. The statements below will take a few seconds to process, as SurrealDB needs to fetch the content of the book and then split it up into a huge number of tokens for easy searching.
Once this is done though, the magic begins! Let's see which sentences in the book have the most relevance to the word "Watson London", and highlight them too.
And while we are at it, let's take a look at the search::offsets() function to see what the output looks like for this last of SurrealDB's three search functions.
It looks like there are three results for sentences that include both the word Watson and London. The first one is very relevant, because the sentence includes only eight words besides Watson and London. Also note how the offsets function works: it shows us the start (s) and end (e) index for each word.
The last of the results is much less relevant in terms of "Watson" and "London", being quite a long paragraph with London and Watson somewhere in the middle.
We should be pretty used to full text search by now. Let's get back to the financial service set up by Aeon and the team and think about what else they could use to make it more rigorous.
We learned in the last chapter about how to make a changefeed, either for a certain table or across the whole database.
A changefeed is good to keep a general eye on all the activities of a table or database, but a changefeed is entirely passive. Sometimes we want to do something more active, by triggering some activity when a change in the data takes place. To do this, we can use a DEFINE EVENT statement. A DEFINE event needs a name, a table to be defined on, and the word THEN.
Let's warm up by creating the simplest possible event: an event that creates a record every time some change happens to a table.
This isn't the most useful event, but it's enough to demonstrate that the event does indeed work.
As the output shows, a track_everything_event record is being created every time we do something to any person record.
Let's make this track_everything event a little more useful by adding a timestamp, as well as a parameter called $event that is always set during an event. Let's see what the $event parameter contains.
This is now starting to look a little like a custom changefeed. We can now see that the $event parameter is simply the type of operation performed, such as UPDATE or DELETE.
With this information, we can add a condition to an event such as WHEN $event = "CREATE".
An event also holds other parameters related to the operation itself, such as $before and $after. Let's experiment with that a little bit.
This next event makes it impossible for any record on the person table to be deleted by throwing an error if it sees a DELETE event. Because a DELETE statement also has a $before, we can add that to the error message to taunt the user who is trying to delete the record(s).
Now no person record deletions can ever happen on this database, as the query below shows.
This next event makes it impossible to create a person with the foul names "Dummy" or "Stupidhead", instead changing the name to "Daisy".
If we combine these two, we now have a situation where a user who tried to create a user called "Stupidhead" now has to use the name "Daisy", and cannot even delete the record.
So that was pretty fun. But how might Aeon and Landevin and the rest want to use an event? Well, this part of the world in Aeon's time has some access to top-of-the-line 21st century database technology, but still needs to communicate via telegrams. At the end of each business day, it might be nice to have a lot of automatically generated messages that can be turned into telegrams and sent to all the customers that visited the bank. We can give them the table name bank_record.
In addition to the customer service aspect, it could also be useful to track down imposters. If an imposter managed to withdraw funds from your account one day, at least you would get a telegram the same day letting you know that the funds were withdrawn!
This is a good case for a record that uses an id based on the time when the transaction took place, because then employees at the end of the day will be able to query all of the bank_record records in between midnight of that day and midnight of the nest.
We can implement this with a quick function that puts together a standard message for the customers of the bank. Note the time::format() function, which when followed by %Y-%m-%d will turn a datetime into a nice readable format like 2024-06-28. The $transaction_type passed into the function will be a literal that can be one of two strings: "withrew", or "received".
Whether to pass in "withdrew" or "received" can be determined by $before.talers to $after.talers inside the UPDATE event.
We should also check to see if $before.talers and $after.talers are equal, because if that is the case then the account balance has not changed and we don't need to tell the user. We won't check for a person's pennings either, which are small units of money that nobody cares about.
All together, the event looks like this.
And now we should be able to do the same transaction as in the last chapter. At the very end, we can then query the bank_record records for the whole day. Don't forget to change the range to one that goes from today's date to tomorrow's date.
The output should now show two records, one for each customer. Not bad at all!
The above solution would work pretty well for us in the 21st century. But what about the world in Aeon's time in which the time::now() function doesn't work? It's likely that Aeon's computer attempted to connect to a server to synchronize its time, failed, and defaulted to the date January 1, 1970. And in any case, the date in terms of the 21st century calendar would be meaningless for Aeon's world because they use their own calendar to track time.
Fortunately, precise time to the millisecond or microsecond isn't a necessity for them either. All they need is some sort of general database-wide parameter that can be referenced.
As we've seen in a few parts of the book, Aeon's part of the world uses a calendar based on the number of years since an important date: the founding of Toria. At this part of the book, the current date for Aeon is 444-07-15: the 15th day of the 7th month, 444 years after the founding of Toria.
So how can this be automated for them? One way is to use the DEFINE PARAM statement, which creates a parameter that can be referenced across the entire database. The syntax is quite short: DEFINE PARAM, the parameter name, VALUE, and finally the value of the parameter itself.
And since the people in Aeon's time don't need a precise time, they can use an incrementing number instead.
We learned back in Chapter 3 that an UPSERT statement on a record ID will create a record if it does not yet exist. This makes it a good way to create a counter. Take a look at the following query and see if you notice anything interesting about it.
As we learned back in Chapter 4, instead of writing SET value = 1 (which also would have worked), we were able to write SET value += 1. This is because SurrealDB is able to infer that value is a number, and since it was not yet set, it begins with its default value: 0.
Putting these two together, we can now start to see how a counter will work for Aeon's team and their new banking system. They can use DEFINE PARAM OVERWRITE at the start of each day to set the new day, and follow the same UPDATE logic for each transaction.
As the days go by, this will eventually lead to a nice output like the one below that shows the number of bank transactions that took place over each day. Let's give this a try by overwriting the $DATE parameter a few times and UPSERTing the counter multiple times to simulate a number of bank transactions.
After a few days, the query SELECT * FROM counter will produce an output like this which shows the number of transactions that took place per day.
With this $DATE param and counter, we can now change our sample above a bit to fit Aeon's world. We'll also add a fn::readable_date() function to turn $DATE into something a bit more readable for a customer.
Precise timestamps won't be available anymore, but at least they will be able to see the date at which a transaction was performed, and which transaction of the day it was.
In the next chapter, we are going to take a deeper look at the tool we've been using throughout this book: Surrealist.
You could create an event that simply goes straight to THEN without any checks such as WHEN $event = "CREATE" or WHEN $before.name != $after.name. Inside, you can create a record that keeps track of all these parameters, and maybe a timestamp too.
A few simple operations on the person table will then result in the input below.
The shortest possible analyzer to make this happen would be one with a tokenizer like class to split text into separate words, followed by an ngram filter. This would have to be quite long since the "natio" in "international" goes from the 6th to the 10th letter. To be sure, an ngram(3,15) should work. We'll also put a lowercase filter in to ensure that search queries don't have to worry about capitalization.
You can then use the search::analyze() function to test the output, and add "natio" IN to the front of it to make sure that "natio" is one of the tokens generated.
The output for this query will be true.
We can follow the syntax of the DEFINE INDEX statement by giving it a name and specifying the table and fields it will be applied to. Plus BM25 HIGHLIGHTS at the end which will be useful for the functions that determine relevance and highlight matching text.
Here is some sample text to get started:
A simple query with "natio" as mentioned above will return both text records that have the words "international" and "Nations".
To add relevance and highlighting, we will first change @@ to @0@ so that the functions have a reference point to work from. The search::score(0) function will get its own field name as relevance, and so will the highlighting function as search::highlight('-', '-', 0).
Assuming that the record has a structure like this and that the default values are 2024 for the year and 9 for the month:
SurrealDB has a function that tells you if a record exists or not that might help.
Since this incrementing record doesn't start at a value of 0, we can't use a simple UPSERT like in the other examples in this chapter. However, we can use the function record::exists() to manually check if this incrementing record exists, and then add some simple IF ELSE logic to increment the month by one unless the month is 12, in which case it will reset to 1.