We now have a large number of movie records that are in a format that lets us use SurrealDB's strengths when querying them, along with a strict schema for the single movie table.
We haven't made the best use of the actors, writers and directors fields yet because they are still just arrays of strings.
It's now time to change these into their own person records. Now, a person can be a director, writer, or actor, or some combination of the three. When this is the case, we can add the string "actor", "director", or "writer" to a field called roles. Here we will add a clause VALUE $value.distinct() clause. This means that the value for this field whenever it is created or updated will be calculated by calling array::distinct() on itself, which will remove any duplicate values.
So if we were to update a persons roles with a duplicate value, it would be like making this query:
-- Add an extra 'director' to the array... ['actor', 'director', 'director'].distinct();
-- Output: no extra 'director'! ['actor', 'director']
At the same time, for every movie we will want to create a graph edge between the person and the movie in question. Graph edges have a name too, and usually it is a verb. Let's define these ahead of time too. By using the TYPE RELATION and FROM and TO fields we can ensure that these tables can only be used as graph edges that link one record type (a person) to another one (a movie).
By the way, all the schema definitions have now made it possible for Surrealist to represent the schema in a visual form. If click on the Designer View button a few steps down from the current Query View, you should see this! The person and movie tables are connected via three graph tables, while naive_movie without any definitions is all on its own.
Here's how we can insert this data:
Inside a SELECT statement for each movie, use a for loop for each of the movie's actors, writers, and directors.
There might already be a person record that matches the name. If it doesn't exist, we can just create it and give it the role.
If the person record already exists, we will use an UPDATE statement and SET its roles field with a += "actor", or += "writer", or += "director". Thanks to the .distinct() part of the field definition we don't need to worry about whether the roles field already contains this value.
Here is part of the query with comments on each line to explain how the logic works.
// Go through each movie FOR$movieINSELECT * FROMmovie{ // Look for the actor names FOR$actor_nameIN$movie.actors { // Try to get an existing record LET$actor = (SELECT * FROMONLYpersonWHEREname=$actor_nameLIMIT1); // If it doesn't exist, create it and return that LET$actor = IF$actorISNONE{ CREATEpersonCONTENT{name: $actor_name, roles: ["actor"]} }ELSE{ // If it does exist, update the role UPDATE$actorSETroles+="actor"; $actor }; // And relate it to the movie RELATE$actor->starred_in->$movie;
The final RELATE statement is how you create a graph edge to join one record to another. Inside our query we are using parameters, but since the parameters represent a record ID the actual query is this sort of form.
RELATEactor:one->starred_in->movie:one;
Here is the whole query to paste in to create all the people involved in each movie and their relation to it.
With these relations set up, we can have some fun with graph queries! For more information on how to use graph queries, see this page.
However, graph queries use the same arrow syntax that is used in the RELATE statements above, so they are easy enough to follow along when explained well enough.
To start, let's see if we can get the name and movies starred in for two people. This query technically works except that the movies only show up as random IDs which is not readable.
For our final graph query, let's see the names and roles (all the roles in which they work for any movie) for all the people in two movies. Here we can change the first part of the graph query to ? which means to match on any graph table. Or you could write (starred_in, wrote, directed) to ensure that only three types of graph tables will be queried.
SELECTtitle, // All graph tables <-?<-person.{name, roles}ASparticipants FROMmovieLIMIT2;
SELECTtitle, // Three types of graph tables <-(starred_in, wrote, directed)<-person.{name, roles}ASparticipants FROMmovieLIMIT2;