This course requires authentication.
Please sign in to continue
While exploring the magic of record IDs, we used the CREATE
statement to create tables and insert the various IDs.
However, we didn’t go into what the CREATE
statement really is or how to use it properly, so I’m sure you have questions about that.
You’ll get the answers to all your questions in this lesson, but if not, let me know! We’ll also cover the INSERT
and UPSERT
statements. We’ll explore how they’re different and when it makes sense to use them. That should cover everything you need to know about inserting data in a schemaless way, with the exception of relationships, which we’ll cover in part 2.
Since we’re covering everything you need to know, it’s going to be a lot. If you want the short version (TLDR), feel free to skip the summary section.
As we saw in the previous lesson, CREATE table
does indeed create a table. However, I wanted to start out with the surprise that our CREATE
statement is for inserting data and not defining a schema, like in most SQL dialects. That is what DEFINE TABLE
is for, as we’ll see in part 3.
Now that we’ve cleared that up let’s start by creating the person
and product
tables.
CREATE person; CREATE product; CREATE person, product PARALLEL;
You can either create them individually, or you can create them in parallel. The choice is yours depending on what you need.
The important thing to know, to help you decide between these, is that each statement is executed in its own transaction.
Therefore CREATE person;
then CREATE product;
, would be 2 transactions, whereas CREATE person, product PARALLEL;
would be one transaction.
What is a Transaction
A transaction in a database is a sequence of operations performed as a single unit of work to ensure data integrity and consistency.
It adheres to the ACID properties:
Transactions ensure reliable and consistent data management in multi-user and distributed environments.
This is meant to give you more direct control over how your queries should behave and reduce the need for you do this, somewhat complicated logic, in your application code.
CREATE person TIMEOUT 2ms; CREATE product TIMEOUT 2ms; CREATE person, product TIMEOUT 3ms PARALLEL;
The same goes for the TIMEOUT
clause, which can be used to specify the maximum time the statement should take to execute. Such as when you want to tightly control compute costs or make sure queries succeed or fail within tight latency boundaries. With this we avoid a big query queue forming.
Most of the time you probably don’t need this though. However, it’s important to know you can do this when you need to.
Now, let’s finally insert more than Record IDs! The CREATE
statement has two ways of doing this.
You can use the SET
clause like we saw earlier, which is more SQL-like. But you can also use the CONTENT
clause which is more JSON-like.
Let’s do both so we can compare like for like, starting with the CONTENT
clause.
CREATE product CONTENT { name: 'Cruiser Hoodie', details: [ "Medium fit", "Set-in sleeve"], colours: ['Dark Heather Grey','Bubble Gum Pink'], sizes: [ "xs", "s", "m", "l", "xl"], price: 59, images: [ { url: "https://surrealdb.store/cdn/shop/files/cruiser-hoodie-bubble-gum-pink-white-m-1_65f13f38-058c-4e29-b2fa-aad068da75e0.jpg", position: 1 }, { url: "https://surrealdb.store/cdn/shop/files/cruiser-hoodie-dark-heather-grey-pink-icon-m-1.jpg?v=1690812164", position: 2 } ], time: { created_at: time::now(), updated_at: time::now() } };
In case you’re curious, the product we are creating here is the actual hoodie I’m wearing now, which you can order at our real store: SurrealDB.store.
But back to the CONTENT
, you’ll notice that this is a denormalised schema we are creating here, which means we are including all the relevant information about the product in one table.
We could normalise it by putting details
, colours
, sizes
and images
in their own tables. That way, we could have less duplication of data as many products share the same colours
and sizes
for example. We would however then need to join together 5 tables if we’d want the entire product information, which is massively more compute-intensive than just reading the data from one table.
Now if you’re thinking, when would I ever do that? You might be surprised to hear that it’s more often than you think, as you’d need all that data for every product order page.
We have however not completely denormalised it as you’ll see in part 2 when we bring in the product_sku
table, which shows how many items you have left in stock for each product sku.
A product sku being the combination of the specific colour and size.
More on that later though, let’s now show how this CONTENT
example would look using the SET
clause.
CREATE product SET name = 'Cruiser Hoodie', colours = ['Dark Heather Grey','Bubble Gum Pink'], price = 59, time = { created_at: time::now(), updated_at: time::now() };
You may have noticed it’s not that different, mostly just replacing the semi-colon :
with the equals sign =
You can however break it down further by specifying the object contents separately with the dot notation.
CREATE product SET name = 'Cruiser Hoodie', time.created_at = time::now(), time.updated_at = time::now();
If you’re wondering which method is better, CONTENT
or SET
, I would say that neither is better, it’s more a matter of preference.
My general preference would be using SET
if I’m just inserting one or two fields, but using CONTENT
if I’m inserting a lot of fields, or inserting JSON data straight from the web, or some other system where I receive the data in a JSON format.
The most surreal thing about the CREATE
statement however is that it’s great for data generation using the pipe syntax. Where we put the pipe character on either side of the table name and specify the number of records we want to generate, as if we were creating a record ID with that number. Since we specified 100 here, we are therefore creating a 100 product records.
CREATE |product:100| CONTENT { name: rand::enum('Cruiser Hoodie', 'Surreal T-Shirt'), colours: [rand::string(10), rand::string(10),], price: rand::int(20, 60), time: { created_at: rand::time(1611847404, 1706455404), updated_at: rand::time(1651155804, 1716906204) } };
Then instead of real data, we can use the random functions to just generate some dummy data for us. We can use rand::string
for strings, rand::int
for our price, rand::time
for time and rand::enum
for randomly selecting between the options we give it. You can find more random functions in the documentation.
This is super useful for rapid prototyping of different table structures, to see what works best.
That’s all we need to know for now about the CREATE
statement, let’s therefore move over to the INSERT
statement.
The INSERT
statement is more similar to what you’d expect from most SQL dialects.
You can insert a single record by using the typical tuple syntax, but with native support for arrays and objects, which I really like.
INSERT INTO product (name, colours, price, time) VALUES ( 'Cruiser Hoodie', ['Dark Heather Grey','Bubble Gum Pink'], 59, {created_at: time::now(), updated_at: time::now()} );
The same goes for inserting multiple records.
INSERT INTO product (name, colours, price, time) VALUES ('Cruiser Hoodie', ['Dark Heather Grey','Bubble Gum Pink'], 59, {created_at: time::now(), updated_at: time::now()}), ('Surreal T-Shirt', ['White Pink','Purple White'], 25, {created_at: time::now(), updated_at: time::now()});
Now when it comes to copying data from one table to the another table, it looks the same as in most SQL dialects.
INSERT INTO product_copy (SELECT * FROM product);
We do this by INSERT INTO
the product_copy
table by selecting everything from the product table.
There are two important things to point out here though
INSERT
statement doesn’t just INSERT
data into tables you already created. It also creates a table if it doesn’t already exist, exactly like we just did with the product_copy
table (but not when running in strict mode).Depending on the size of your table, this could be prohibitively compute intensive, therefore it’s very important to think carefully about your Record IDs.
Now we get to the part where the INSERT
statement becomes more Surreal, as it also accepts JSON-like objects.
INSERT INTO product { name: 'Cruiser Hoodie', colours: ['Dark Heather Grey','Bubble Gum Pink'], price: 59, time: { created_at: time::now(), updated_at: time::now() } };
Let’s insert a single record into our table.
Unlike the CREATE
statement, INSERT
does not use the CONTENT
keyword after the table name. Adding it here would cause an error, just like forgetting CONTENT
in CREATE
would.
The reason for this is because INSERT
doesn’t have both SET
and CONTENT
, It doesn’t need to make this distinction. You can change this query and try it out if you want to see the error.
For inserting multiple records, the only difference is that instead of inserting an object, we insert a list of objects.
INSERT INTO product [ { name: 'Cruiser Hoodie', colours: ['Dark Heather Grey','Bubble Gum Pink'] }, { name: 'Surreal T-Shirt', colours: ['White Pink','Purple White'] } ];
This is a very convenient way to bulk insert data, especially if you’re fetching the data from an API.
INSERT INTO pokemon http::get('https://pokeapi.co/api/v2/pokemon/pikachu/');
I personally use a language like Python to fetch data from APIs and insert it into SurrealDB. However, as we can see from this Pokemon API example, SurrealQL has http
functions that allow you to directly get and insert JSON data from an API, all in one query.
The UPSERT
statement is the final way you can insert data.
Like in other SQL dialects, the name UPSERT
comes from being a combination of an UPDATE
and an INSERT
statement.
Now, we just mentioned that the INSERT
can also UPDATE
data, so you’re probably wondering what the difference is between them.
INSERT INTO product (id, name, colours) VALUES (1, 'Cruiser Hoodie', ['Dark Heather Grey','Bubble Gum Pink']); INSERT INTO product (id) VALUES (1) ON DUPLICATE KEY UPDATE colours += ['Purple'];
The way the INSERT
statement does upserts would seem familiar to most SQL users, using ON DUPLICATE KEY UPDATE
which tries to INSERT
the data and if it sees there is already a record with that id, it does an update instead.
UPSERT product:ulid() SET colours += ['Purple']; UPSERT product:ulid() CONTENT {colours: ['Purple']};
The UPSERT
statement, in general, has more advanced UPDATE
functionality, which we’ll cover later in this part when we talk about updating data. But it is the opposite of the INSERT
behaviour, where it tries to UPDATE
the data and if a record with that id doesn’t exist, it does an insert instead.
Apart from that the UPSERT
follows the same syntax as the CREATE
statement, with both a SET
and CONTENT
option, as well as the data generation functionality.
We’ve covered a lot in this lesson! Let’s summarise the main points we learned, and if you skipped right to this part, welcome!
We learned that:
The CREATE
statement
SET
and CONTENT
for SQL-like and JSON-like experienceThe INSERT
statement
SET
and CONTENT
, but rather tuples like SQL and direct objects or a list of objects.The UPSERT
statement
SET
and CONTENT
for SQL-like and JSON-like experienceIf you’d like to experiment with what you’ve learned, feel free to play around with the Surrealist query interface, you can also go back and edit or change any query you want.
When you’re ready, I’ll see you in the next one!