In a recent blog post we went over ten tips that you might find helpful when building and working with your database schema.
A month later, it looks like we've come up with ten more! Let's see what else SurrealDB has to offer your schema that you might not have encountered before.
Did you know that...
11: You can use the output of expressions inside DEFINE FIELD statements?
Take the following schema that includes a customer and a purchase table, followed by a separate invoice that is then linked to the original purchase.
This works fine, but the three lines that create a purchase, then an invoice, and then link one to the other are a bit awkward.
If every purchase comes with a linked invoice, we can just define that field as the output of the CREATE statement that makes the invoice. The $parent parameter can be used here to access the fields of purchase when creating it.
With this set up, a linked invoice is guaranteed to be created with every purchase.
Output:
12: Geohashes can be performant?
This example shows a query on each record in a location table that checks to see which location records are closest to it, in this case within 600 metres. As each random location is being created between a pretty small longitude and latitude (0.1 to 0.3 and 50.0 to 50.2),
The example does work, with an output featuring records that should look something like this.
But using SELECT * FROM event for each and event to find its nearest neighbours feels like overkill. There are other ways to improve this query, but we're only using it as a setup to using geohashes so let's set it aside.
A Geohash is a value that represents a rectangular space on our planet. The more characters in the geohash, the more precise the location.
You can give it a try here. If you enter the character u then you'll see a big rectangle stretching from the Arctic down to Germany - not very precise. Add an extra digit of precision with u3 and now the hash only shows a fraction of it, from Sweden and Denmark to Poland and Germany. Add one more to get u3b and now it is limited to a small part of Denmark. A Geohash at its full length of eight characters will have a precision of 19 metres, in the case of u3btbb1b showing two houses close to a fitness studio and pizza shop.
To get a precision of 600 metres, you can use a geohash with a length of 6 characters. We'll put it into the format location:[string, ulid] to represent an event that takes place inside a certain area. Let's try creating these random event records again with this new format.
Now you can put the first part of the ID (the geohash) into a range query. Anything with the same geohash will show up as a nearby event.
13: Search analyzers can be manually used?
The main way to use a search analyzer for full-text search is to define an index that uses it so that the @@ operator (the matches operator) can be used. However, because the search::analyze() function shows you the output of any given analyzer, you can use them manually too.
Here is an example of a number of analyzers that you can mix and match depending on the type of filtering you want to do. Just don't forget to call .flatten() every time.
14: You can now define a sequence?
Defining a sequence is a recent addition to SurrealQL, available since the 3.0.0-alpha versions. Let's give it a try and see how it works.
It looks like a sequence is an incrementing number. What is the use case for a sequence then?
The first thing to know about a sequence is that it is guaranteed to be unique, even across multiple nodes. It allows lock-free reads, which means that a sequence will never be rolled back in a failed transaction.
A sequence will usually use the BATCH clause which allows a node to request a range of sequences at once, reducing network chatter.
Do you need a sequence in your schema? If the following description from the PR adding sequences matches what you are building, then the answer might be yes.
15: Text can be sorted lexically?
In the following example working with Hungarian text we have a field called unique_tokens that takes the output of a search analyzer to store the unique tokens (using casting into a <set>) from a certain sample text.
This works well, except that the final tokens 'szórakoztatására', 'váljon', 'wikipédia', 'és', 'önkéntesek' show that és and önkéntesek show up at the end instead of together with the other words that start with e and o.
To ensure that these words show up in proper lexical order, just change .sort() to .sort_lexical()!
Now a token like 'és' that begins with an accented character will show up next to the other words that start with e instead of showing up after z.
16: User-submitted HTML can be sanitized for safety?
This tip is a quick but important one: if your app takes raw HTML from users, don't forget to sanitize it to remove malicious parts such as scripting.
As the output shows, the unsafe scripting portion has been removed.
17: You can use field definitions like events?
While DEFINE EVENT is the usual way to create an event, a VALUE clause can hold an entire code block so you can use it for similar behaviour.
The following code builds on the former HTML sanitization to create an alert whenever the sanitized HTML doesn't match the original HTML that a user attempted to insert.
Output:
18: SurrealDB has a lot of constant values?
SurrealDB is fairly rare among databases in having a large selection of mathematical and time-related constant values that you can make use of. Using a constant value is good practice for both accuracy and performance.
19: The question mark operator can be used to avoid errors?
Take the following statements that create a bank and its customers, along with a field that filters out customers except those that have particularly high deposits.
Unfortunately, the CREATE bank:one statement won't work because .filter() can't be called on a value that is NONE, and this bank doesn't have any customers yet.
But no problem! Using the .? operator will allow us to skip out on the filter() call in case the value is NONE, and now the statement will no longer fail.
Once the customers are added, we will see this final output.
20: Creating a mirror table can make range queries more flexible?
SurrealDB's record ranges are extremely performant because they are effectively a query on just a small portion of a table instead of the whole table.
A query on this event table might look like this, in which case all the records with exactly user:one followed by "debug" within the latest 5 hours will show up.
But a query like this won't work as expected, because it will first return all user:ones between "debug" and "production", and only then check the next part of the ID. It's not a filter, but a whittling down of the space to be queried.
If being able to use range queries in different ways is important, you can set up a mirror table that simply stores the same data in a different order. For example, this table is identical except that the middle string is removed.
With that set up, you can now do record range queries on the second table based on just a user and the time of an event.
