SurrealDB University is LIVE! Master the future of data

Gradient
SurrealDB University

Authentication Required

This course requires authentication.
Please sign in to continue

Background Gradient
Next
Back to Courses

Inserting data

While exploring the magic of record IDs, we used the CREATE statement to create tables and insert the various IDs.

However, we didn’t go into what the CREATE statement really is or how to use it properly, so I’m sure you have questions about that.

You’ll get the answers to all your questions in this lesson, but if not, let me know! We’ll also cover the INSERT and UPSERT statements. We’ll explore how they’re different and when it makes sense to use them. That should cover everything you need to know about inserting data in a schemaless way, with the exception of relationships, which we’ll cover in part 2.

Since we’re covering everything you need to know, it’s going to be a lot. If you want the short version (TLDR), feel free to skip the summary section.

CREATE

As we saw in the previous lesson, CREATE table does indeed create a table. However, I wanted to start out with the surprise that our CREATE statement is for inserting data and not defining a schema, like in most SQL dialects. That is what DEFINE TABLE is for, as we’ll see in part 3.

Now that we’ve cleared that up let’s start by creating the person and product tables.

Person and Product
CREATE person; CREATE product; CREATE person, product PARALLEL;

You can either create them individually, or you can create them in parallel. The choice is yours depending on what you need.

The important thing to know, to help you decide between these, is that each statement is executed in its own transaction.

Therefore CREATE person; then CREATE product;, would be 2 transactions, whereas CREATE person, product PARALLEL; would be one transaction.

What is a TransactionA transaction in a database is a sequence of operations performed as a single unit of work to ensure data integrity and consistency.

It adheres to the ACID properties:

  • Atomicity (all operations are completed or none are)
  • Consistency (transforms the database from one valid state to another)
  • Isolation (concurrent transactions do not interfere with each other)
  • Durability (committed changes are permanent even after a system failure)

Transactions ensure reliable and consistent data management in multi-user and distributed environments.


This is meant to give you more direct control over how your queries should behave and reduce the need for you do this, somewhat complicated logic, in your application code.

CREATE person TIMEOUT 2ms; CREATE product TIMEOUT 2ms; CREATE person, product TIMEOUT 3ms PARALLEL;

The same goes for the TIMEOUT clause, which can be used to specify the maximum time the statement should take to execute. Such as when you want to tightly control compute costs or make sure queries succeed or fail within tight latency boundaries. With this we avoid a big query queue forming.

Most of the time you probably don’t need this though. However, it’s important to know you can do this when you need to.

Now, let’s finally insert more than Record IDs! The CREATE statement has two ways of doing this.

You can use the SET clause like we saw earlier, which is more SQL-like. But you can also use the CONTENT clause which is more JSON-like.

Let’s do both so we can compare like for like, starting with the CONTENT clause.

CREATE product CONTENT { name: 'Cruiser Hoodie', details: [ "Medium fit", "Set-in sleeve"], colours: ['Dark Heather Grey','Bubble Gum Pink'], sizes: [ "xs", "s", "m", "l", "xl"], price: 59, images: [ { url: "https://surrealdb.store/cdn/shop/files/cruiser-hoodie-bubble-gum-pink-white-m-1_65f13f38-058c-4e29-b2fa-aad068da75e0.jpg", position: 1 }, { url: "https://surrealdb.store/cdn/shop/files/cruiser-hoodie-dark-heather-grey-pink-icon-m-1.jpg?v=1690812164", position: 2 } ], time: { created_at: time::now(), updated_at: time::now() } };

In case you’re curious, the product we are creating here is the actual hoodie I’m wearing now, which you can order at our real store: SurrealDB.store.

But back to the CONTENT, you’ll notice that this is a denormalised schema we are creating here, which means we are including all the relevant information about the product in one table.

We could normalise it by putting details , colours , sizes and images in their own tables. That way, we could have less duplication of data as many products share the same colours and sizes for example. We would however then need to join together 5 tables if we’d want the entire product information, which is massively more compute-intensive than just reading the data from one table.

Now if you’re thinking, when would I ever do that? You might be surprised to hear that it’s more often than you think, as you’d need all that data for every product order page.

We have however not completely denormalised it as you’ll see in part 2 when we bring in the product_sku table, which shows how many items you have left in stock for each product sku.

A product sku being the combination of the specific colour and size.

More on that later though, let’s now show how this CONTENT example would look using the SET clause.

CREATE product SET name = 'Cruiser Hoodie', colours = ['Dark Heather Grey','Bubble Gum Pink'], price = 59, time = { created_at: time::now(), updated_at: time::now() };

You may have noticed it’s not that different, mostly just replacing the semi-colon : with the equals sign =

You can however break it down further by specifying the object contents separately with the dot notation.

CREATE product SET name = 'Cruiser Hoodie', time.created_at = time::now(), time.updated_at = time::now();

If you’re wondering which method is better, CONTENT or SET , I would say that neither is better, it’s more a matter of preference.

My general preference would be using SET if I’m just inserting one or two fields, but using CONTENT if I’m inserting a lot of fields, or inserting JSON data straight from the web, or some other system where I receive the data in a JSON format.

The most surreal thing about the CREATE statement however is that it’s great for data generation using the pipe syntax. Where we put the pipe character on either side of the table name and specify the number of records we want to generate, as if we were creating a record ID with that number. Since we specified 100 here, we are therefore creating a 100 product records.

CREATE |product:100| CONTENT { name: rand::enum('Cruiser Hoodie', 'Surreal T-Shirt'), colours: [rand::string(10), rand::string(10),], price: rand::int(20, 60), time: { created_at: rand::time(1611847404, 1706455404), updated_at: rand::time(1651155804, 1716906204) } };

Then instead of real data, we can use the random functions to just generate some dummy data for us. We can use rand::string for strings, rand::int for our price, rand::time for time and rand::enum for randomly selecting between the options we give it. You can find more random functions in the documentation.

This is super useful for rapid prototyping of different table structures, to see what works best.

That’s all we need to know for now about the CREATE statement, let’s therefore move over to the INSERT statement.

INSERT

The INSERT statement is more similar to what you’d expect from most SQL dialects.

You can insert a single record by using the typical tuple syntax, but with native support for arrays and objects, which I really like.

INSERT INTO product (name, colours, price, time) VALUES ( 'Cruiser Hoodie', ['Dark Heather Grey','Bubble Gum Pink'], 59, {created_at: time::now(), updated_at: time::now()} );

The same goes for inserting multiple records.

INSERT INTO product (name, colours, price, time) VALUES ('Cruiser Hoodie', ['Dark Heather Grey','Bubble Gum Pink'], 59, {created_at: time::now(), updated_at: time::now()}), ('Surreal T-Shirt', ['White Pink','Purple White'], 25, {created_at: time::now(), updated_at: time::now()});

Now when it comes to copying data from one table to the another table, it looks the same as in most SQL dialects.

INSERT INTO product_copy (SELECT * FROM product);

We do this by INSERT INTO the product_copy table by selecting everything from the product table.

There are two important things to point out here though

  • First, is that our INSERT statement doesn’t just INSERT data into tables you already created. It also creates a table if it doesn’t already exist, exactly like we just did with the product_copy table (but not when running in strict mode).
  • Second, this is currently the only way to change just the table part of your Record IDs. For example, if you accidentally used an emoji as your table name and then realised that Record IDs are immutable.

Depending on the size of your table, this could be prohibitively compute intensive, therefore it’s very important to think carefully about your Record IDs.

Now we get to the part where the INSERT statement becomes more Surreal, as it also accepts JSON-like objects.

INSERT INTO product { name: 'Cruiser Hoodie', colours: ['Dark Heather Grey','Bubble Gum Pink'], price: 59, time: { created_at: time::now(), updated_at: time::now() } };

Let’s insert a single record into our table.

Unlike the CREATE statement, INSERT does not use the CONTENT keyword after the table name. Adding it here would cause an error, just like forgetting CONTENT in CREATE would.

The reason for this is because INSERT doesn’t have both SET and CONTENT , It doesn’t need to make this distinction. You can change this query and try it out if you want to see the error.

For inserting multiple records, the only difference is that instead of inserting an object, we insert a list of objects.

INSERT INTO product [ { name: 'Cruiser Hoodie', colours: ['Dark Heather Grey','Bubble Gum Pink'] }, { name: 'Surreal T-Shirt', colours: ['White Pink','Purple White'] } ];

This is a very convenient way to bulk insert data, especially if you’re fetching the data from an API.

INSERT INTO pokemon http::get('https://pokeapi.co/api/v2/pokemon/pikachu/');

I personally use a language like Python to fetch data from APIs and insert it into SurrealDB. However, as we can see from this Pokemon API example, SurrealQL has http functions that allow you to directly get and insert JSON data from an API, all in one query.

UPSERT

The UPSERT statement is the final way you can insert data.

Like in other SQL dialects, the name UPSERT comes from being a combination of an UPDATE and an INSERT statement.

Now, we just mentioned that the INSERT can also UPDATE data, so you’re probably wondering what the difference is between them.

INSERT INTO product (id, name, colours) VALUES (1, 'Cruiser Hoodie', ['Dark Heather Grey','Bubble Gum Pink']); INSERT INTO product (id) VALUES (1) ON DUPLICATE KEY UPDATE colours += ['Purple'];

The way the INSERT statement does upserts would seem familiar to most SQL users, using ON DUPLICATE KEY UPDATE which tries to INSERT the data and if it sees there is already a record with that id, it does an update instead.

UPSERT product:ulid() SET colours += ['Purple']; UPSERT product:ulid() CONTENT {colours: ['Purple']};

The UPSERT statement, in general, has more advanced UPDATE functionality, which we’ll cover later in this part when we talk about updating data. But it is the opposite of the INSERT behaviour, where it tries to UPDATE the data and if a record with that id doesn’t exist, it does an insert instead.

Apart from that the UPSERT follows the same syntax as the CREATE statement, with both a SET and CONTENT option, as well as the data generation functionality.

Summary

We’ve covered a lot in this lesson! Let’s summarise the main points we learned, and if you skipped right to this part, welcome!

We learned that:

  • The CREATE statement

    • Creates tables and inserts data
    • Is the only statement that can create and insert into multiple tables at the same time
    • Can only insert one record at a time to the same table (except for data generation)
    • Uses both SET and CONTENT for SQL-like and JSON-like experience
  • The INSERT statement

    • Creates a table, inserts data and updates data
    • Is the only statement that can insert multiple records into the same table
    • Can only update one record at a time, tries to insert and if it fails, it does update.
    • Doesn’t use SET and CONTENT, but rather tuples like SQL and direct objects or a list of objects.
  • The UPSERT statement

    • Creates a table, inserts data and updates data
    • Is the only statement that can update an entire table
    • Tries update and if fails, does insert.
    • Can only insert a specific record (except for data generation)
    • Uses both SET and CONTENT for SQL-like and JSON-like experience

If you’d like to experiment with what you’ve learned, feel free to play around with the Surrealist query interface, you can also go back and edit or change any query you want.

When you’re ready, I’ll see you in the next one!