Show all posts

Show all posts

January 29, 2024

Beyond SQL Joins: Exploring SurrealDB's Multi-Model Relationships

by Alexander Fridriksson

How many ways can you join your data without using traditional SQL joins? As SurrealDB is a multi-model database, it has different ways "joining" data together.

In this blog post, we'll cover the three primary ways of creating relationships inspired by document and graph data models.

We’ll start off with embedded records, which are a key component of document-style databases.

When we

SELECT * FROM person

We can see the whole person record.

[
    {
        "address": {
            "address_line_1": "767 Culkeeran",
            "address_line_2": null,
            "city": "Matlock",
            "coordinates": [
                -51.195338,
                114.885025
            ],
            "country": "England",
            "post_code": "MU1P 0XX"
        },
        "company_name": null,
        "email": "said1813@example.com",
        "first_name": "Caprice",
        "id": "person:00e1nc508h9f7v63x72o",
        "last_name": "Huber",
        "name": "Caprice Huber",
        "phone": "0115 262 2984"
    }
]

The key thing to notice here is that the address field is an embedded record.

That means that instead of the address field containing the primary key to an address table, like in relational databases, we can de-normalise the data so that we can skip making joins entirely when we want the full information for the person record.

The tradeoff we are making here is wanting faster queries by using more storage space.

Embedded records

Working with embedded records is simple. We can select just the address field to get the address record or in other words the address object.

SELECT 
  address
FROM person
[
    {
        "address": {
            "address_line_1": "767 Culkeeran",
            "address_line_2": null,
            "city": "Matlock",
            "coordinates": [
                -51.195338,
                114.885025
            ],
            "country": "England",
            "post_code": "MU1P 0XX"
        }
    }
]

We use the dot notation to select a field inside the object, for example, selecting the coordinates field.

SELECT 
  address.coordinates
FROM person
[
    {
        "address": {
            "coordinates": [
                -51.195338,
                114.885025
            ]
        }
    }
]

We use the bracket notation to select just the first item in the coordinates array.

SELECT 
  address.coordinates[0]
FROM person
[
    {
        "address": {
            "coordinates": -51.195338
        }
    }
]

There is no limit to the depth of objects and arrays you can embed inside a record, allowing you to model complex data scenarios and navigate them simply with a combination of dot and bracket notation.

Record links

This brings us to record links, which enable you to create normalised data models by linking records together.

When we

SELECT * FROM review

We can see that this table is primarily made up of record IDs pointing to other records.

[
   {
       "artist": "artist:73z6oc419v1c5v34j20x",
       "id": "review:00a0ic854u4j7z02s00v",
       "person": "person:67s8cp304i4p6p83q02d",
       "product": "product:41v2qv923h3o7s55e96l",
       "rating": 3,
       "review_text": "repeat languages.."
   }
]

In SurrealDB we don’t use the primary & foreign key pairs. Instead we use record IDs, which are like primary keys that we just directly embedded in places where a foreign key would otherwise be.

This allows us to traverse from record-to-record without needing to run a table scan query, because the query planner knows exactly what table and record the id belongs to. Therefore, it just directly fetches the record instead of searching for a needle in a haystack.

Working with record links is as simple as working with embedded objects. You simply use the dot and bracket notations as if the data was all embedded in one record.

SELECT 
  rating,
  product.name
FROM review
[
   {
       "product": {
           "name": "ago classifieds"
       },
       "rating": 3
   }
]

You can also select multiple fields and have them grouped together under the product object.

SELECT 
  rating,
  product.name,
  product.category
FROM review
[
   {
       "product": {
           "category": "digital art",
           "name": "ago classifieds"
       },
       "rating": 3
   }
]

Or you can alias them and have them show up as individual fields, there is a lot of flexibility here.

SELECT 
  rating,
  product.name as product_name,
  product.category as product_cat
FROM review
[
   {
       "product_cat": "digital art",
       "product_name": "ago classifieds",
       "rating": 3
   }
]

Notice also that this completely eliminates the typical complexity and errors people make with joins. Such as not knowing the grain of the data and whether you need a left, right, inner, outer, lateral, or cross join.

Making the wrong choice can lead to a lot of headaches and perhaps some angry database administrators.

Now you might be wondering, what is the tradeoff behind the magic of record links?

The key tradeoff is referential integrity, which, while important, is not always needed.

But when it's needed, we have a way to preserve referential integrity using graph traversal, which we will cover next.

Graph relations

SurrealDB introduces a relate statement to SQL for making graph relationships.

A short introduction to working with graph data is that you have nodes connected by edges.

Relate statement - node edge node

Here, we can see that the person is a node, the order is an edge, and the product is also a node.

Another way to think about it is in terms of nouns connected by verbs, such that it forms a sentence.

Relate statement - node edge node

In our example, person order product.

Most graph databases work with these semantic triples .

SurrealDB vs graph-only databases

However, what sets SurrealDB apart from graph-only databases is that our edges are also real tables, enabling you to store information in them and allowing for even more flexible data models.

RELATE person:00e1nc508h9f7v63x72o -> order -> product:09r8bs593p1b3d41v83d
CONTENT {
       "currency": "£",
       "discount": ->product.discount,
       "order_date": time::now(),
       "order_status": "pending",
       "payment_method": "PayPal",
       "price": ->product.price,
       "product_name": ->product.name,
       "quantity": 1,
       "shipping_address": <-person.address
   };

In this instance order is actually an order items table, which has its own information along with information joined from the person and product tables.

Because we have specified the relationship in the relate statement, it becomes easy and efficient to fetch the required fields from the joined tables.

As you can see when we start getting the fields from the product table, the arrow syntax differentiates this from the record links we saw in the earlier example.

Aside from starting with the arrow syntax, you would use the dot and bracket syntax in exactly the same way as record links or embedded objects.

Notice also that the direction of the arrow changes whether we are fetching from the product or person table.

Looking at the relate statement again, we can see that we only specified one direction, going from person to order to product.

RELATE person:00e1nc508h9f7v63x72o -> order -> product:09r8bs593p1b3d41v83d

SurrealDB however, has bi-directional querying by default that therefore enables us to specify it once but use it in both directions.

The two ways of traversing the graph

When we now

SELECT * FROM order

we can see the record we created in the order edge table.

[
  {
      "currency": "£",
      "discount": null,
      "id": "order:['England', '2023-02-21T21:16:06.236861']",
      "in": "person:00e1nc508h9f7v63x72o",
      "order_date": "2023-02-21T21:16:06.236861",
      "order_status": "processing",
      "out": "product:09r8bs593p1b3d41v83d",
      "payment_method": "credit card",
      "price": 24954.01,
      "product_name": "management console",
      "quantity": 1,
      "shipping_address": {...}
  }
]

I want to bring your attention to 3 specific fields, those are in, id and out.

SELECT
  in, 
  id,
  out
FROM order
[
  {
      "id": "order:['England', '2023-02-21T21:16:06.236861']",
      "in": "person:00e1nc508h9f7v63x72o",
      "out": "product:09r8bs593p1b3d41v83d"
  }
]

Now you might be wondering, when were these created since it didn’t seem like we specified them before.

We did specify the in and out field with the relate statement and because we didn’t specify the order ID, it was generated for us.

Relate statement - in id out

Another way of looking at the relate statement is that the first node is called in, the edge is the id, and the second node is the out.

Now I want to bring your attention to the following select statement, which shows how you can select the same fields using the arrow syntax.

SELECT
  <-person,
  id,
  ->product
FROM order

Both of these approaches are valid ways of querying the data.

Complex queries made simple

Graph relations can be both super powerful and elegant once you get more comfortable using them.

In the following example, we create a simple recommendation system in one query.

SELECT 
  ->order->product<-order<-person->order->product
FROM person:00e1nc508h9f7v63x72o
[
   {
       "->order": {
           "->product": {
               "<-order": {
                   "<-person": {
                       "->order": {
                           "->product": [
                           "product:09r8bs593p1b3d41v83d",
                            "product:53c8dz218n0u1k22a70y",
        …

What this is saying is, based on all the products this person ordered, which persons also ordered those products and what did they order?

Breaking it down:

-- Focusing on the person
person:00e1nc508h9f7v63x72o

-- Which products did they order?
->order->product

-- Which persons also ordered those products?
<-order<-person

-- And what did they order?
->order->product

If we were to express this logic using SQL joins we would need up to 6 of them, as we are joining 3 tables at different levels of granularity, and then a where clause to filter on the specific person.

When doing multiple joins like this, you really start to see the advantages of the graph model.

There are always tradeoffs to each approach, and the tradeoff we are making here is establishing the relationships at write time to make it simpler and easier for everyone at query time.

You can see more discussions around graph relations in our SurrealDB Stream #17: How To Improve Your (Database) Relationships

More to come

However, what if we want to make arbitrary relationships on the fly similar to relational style joins?

We have ways of doing that, but that is a topic for another blog post , so make sure you subscribe to get notified when that comes out.

Also you can run the above queries and more using our demo data, which you can find in our docs.

     
VART: A Persistent Data Structure For Snapshot Isolation

January 23, 2024

VART: A Persistent Data Structure For Snapshot Isolation

The blog introduces VART, an Immutable Versioned Adaptive Radix Trie, designed for snapshot isolation in databases (surrealKV), exploring isolation levels, concurrency control.

We're hiring

February 1, 2024

We're hiring

We're hiring! Join SurrealDB as we continue to shape the future of database technology! We're on the lookout for exceptional individuals - those who are passionate about their craft and equally passionate about the team they work with to develop and promote groundbreaking technology.

To stay up-to-date with new blog articles, future product releases, and documentation updates, subscribe to our email newsletter below, follow us on Twitter, or follow us on Dev.