Multi-model RAG with SurrealDB & LangChain
Watch on-demand
1: Getting started1: Getting started
Surrealist Placeholder
Surrealist Placeholder
Back to Courses

1: Getting started

Working with JSON

SurrealDB is a multi-model database, meaning that it has a lot of models built in that you would otherwise need extra databases to use. One of these models is that of a document database: a database that works with documents in formats like JSON.

There are a lot of datasets available on the internet for movies, and they usually come in JSON format. Here is an example of one of the highest rated movies in history in JSON format:

{ "Title": "Terminator 2: Judgment Day", "Rated": "R", "Released": "03 Jul 1991", "Runtime": "137 min", "Genre": "Action, Sci-Fi", "Director": "James Cameron", "Writer": "James Cameron, William Wisher", "Actors": "Arnold Schwarzenegger, Linda Hamilton, Edward Furlong", "Plot": "A cyborg, identical to the one who failed to kill Sarah Connor, must now protect her ten-year-old son John from a more advanced and powerful cyborg.", "Language": "English, Spanish", "Country": "United States", "Awards": "Won 4 Oscars. 36 wins & 33 nominations total", "Poster": "https://m.media-amazon.com/images/M/MV5BMGU2NzRmZjUtOGUxYS00ZjdjLWEwZWItY2NlM2JhNjkxNTFmXkEyXkFqcGdeQXVyNjU0OTQ0OTY@._V1_SX300.jpg", "Ratings": [ { "Source": "Internet Movie Database", "Score": "8.6/10" }, { "Source": "Rotten Tomatoes", "Score": "93%" }, { "Source": "Metacritic", "Score": "75/100" } ], "DVD": "13 Feb 2007", "BoxOffice": "$205,881,154", }

If you paste this into Surrealist and hit enter, you’ll see the following output. This is SurrealQL, which looks pretty similar to JSON. You can also choose to format output in Surrealist in classic JSON, in which case it will look exactly like the input above.

{ Actors: 'Arnold Schwarzenegger, Linda Hamilton, Edward Furlong', Awards: 'Won 4 Oscars. 36 wins & 33 nominations total', BoxOffice: '$205,881,154', Country: 'United States', DVD: '13 Feb 2007', Director: 'James Cameron', Genre: 'Action, Sci-Fi', Language: 'English, Spanish', Plot: 'A cyborg, identical to the one who failed to kill Sarah Connor, must now protect her ten-year-old son John from a more advanced and powerful cyborg.', Poster: 'https://m.media-amazon.com/images/M/MV5BMGU2NzRmZjUtOGUxYS00ZjdjLWEwZWItY2NlM2JhNjkxNTFmXkEyXkFqcGdeQXVyNjU0OTQ0OTY@._V1_SX300.jpg', Rated: 'R', Ratings: [ { Score: '8.6/10', Source: 'Internet Movie Database' }, { Score: '93%', Source: 'Rotten Tomatoes' }, { Score: '75/100', Source: 'Metacritic' } ], Released: '03 Jul 1991', Runtime: '137 min', Title: 'Terminator 2: Judgment Day', Writer: 'James Cameron, William Wisher' }

To insert this into the database, add CREATE movie CONTENT before the object. You’ll see the same output, except for two differences. See if you can spot them.

[ { Actors: 'Arnold Schwarzenegger, Linda Hamilton, Edward Furlong', Awards: 'Won 4 Oscars. 36 wins & 33 nominations total', BoxOffice: '$205,881,154', Country: 'United States', DVD: '13 Feb 2007', Director: 'James Cameron', Genre: 'Action, Sci-Fi', Language: 'English, Spanish', Plot: 'A cyborg, identical to the one who failed to kill Sarah Connor, must now protect her ten-year-old son John from a more advanced and powerful cyborg.', Poster: 'https://m.media-amazon.com/images/M/MV5BMGU2NzRmZjUtOGUxYS00ZjdjLWEwZWItY2NlM2JhNjkxNTFmXkEyXkFqcGdeQXVyNjU0OTQ0OTY@._V1_SX300.jpg', Rated: 'R', Ratings: [ { Score: '8.6/10', Source: 'Internet Movie Database' }, { Score: '93%', Source: 'Rotten Tomatoes' }, { Score: '75/100', Source: 'Metacritic' } ], Released: '03 Jul 1991', Runtime: '137 min', Title: 'Terminator 2: Judgment Day', Writer: 'James Cameron, William Wisher', id: movie:6emff7wffymreak1tjev } ]

The first difference is that the output is now an array of objects, which is the default for statements like INSERT . The second difference is that it has a new field called id .

{ // other fields... id: movie:6emff7wffymreak1tjev }

Every record stored in SurrealDB has an id field. You can set the value of an id yourself, such as movie:one or movie:1 , a complex ID like movie:["sci-fi", "Terminator 2"] , or even other languages and emojis like 𓀵 :👋 if enclosed by backticks! Anything that can be encoded as UTF-8 can be used as a record ID.

If you don’t specify the id field, it will be composed of the table name and a 20-digit random string as shown above.

Now let’s type REMOVE TABLE movie to remove the entire movie table along with the movie record that we just created. We’ll recreate it later after we decide what to do with the original data.

The original movie data

There is a .surql file here that inserts all of these movies into the database under the table name naive_movie . To create a lot of records at the same time, we can use INSERT INTO . An INSERT statement is similar to CREATE but it is able to take an array of objects.

You’ll also notice that the end of the statement has RETURN NONE . This is just a convenience to show no output after the statement. Otherwise we would be the 100+ records inserted, which is a lot of output to scroll through.

INSERT INTO naive_movie [ { Actors: 'Tim Robbins, Morgan Freeman, Bob Gunton', Awards: 'Nominated for 7 Oscars. 21 wins & 43 nominations total', BoxOffice: '$28,767,189', Country: 'United States', DVD: '21 Dec 1999', }, // and so on for 100+ more ] RETURN NONE;

We are giving it the table name naive_movie because, though this data is readable, it is entirely composed of string data which is not always all that useful.

Let’s use a SELECT statement to try to order the movies by runtime length. The VALUE clause here tells the database to only return the Runtime values on their own, instead of as a field inside a record.

SELECT VALUE Runtime FROM naive_movie ORDER BY Runtime;

Look at this horrible ordering! Because strings are ordered per character by their order on the Unicode list of characters, the “shortest” movie comes in at 102 minutes - because 1 is the lowest first character, 0 is the lowest second character, and 2 is the lowest third character.

Meanwhile, the “longest” movie comes in at 99 minutes long because 9 is the longest first character and 9 is also the longest second character.

Output
['102 min', '102 min', '102 min', '103 min', '103 min', '103 min', /* skip about 100 movies... */ '201 min', '202 min', '207 min', '218 min', '229 min', '68 min', '81 min', '87 min', '87 min', '88 min', '88 min', '89 min', '89 min', '91 min', '95 min', '96 min', '96 min', '97 min', '98 min', '99 min', '99 min']

But if these fields were durations, the output would be properly ordered in this way instead.

[103m, 93m, 105m].sort();
Output
[ 1h33m, 1h43m, 1h45m ]

SurrealQL has a lot of data types that we can convert this naive data to in order to benefit from the advantages of strong typing. We’ll take a look at them in the next page.

Trusted & Certified for Enterprise Security Learn more
SOC 2 Type 2
GDPR
Cyber Essentials Plus
ISO 27001