
Our new demo dataset has a lot in store for you!

Jun 13th, 2024


by Alexander Fridriksson

With our 2.0-alpha release, you’ll find a lot of exciting updates, one of which is a new and improved demo dataset!

If you’ve been using our Surreal Deal dataset, you’ll see some familiar tables and fields as this dataset is also based around an e-commerce store. However, unlike the previous Surreal Deal, this dataset is based on an actual e-commerce store!

That store is none other than our very own swag store - SurrealDB.store!


The real and the surreal

Before we dive into some exciting examples, it’s important to clarify what it means to be based on a real e-commerce store.

It means that

  • The products are real, and you can buy them at SurrealDB.store!
  • Everything else is dummy data generated from scratch, with no connection to the real store.
  • There is nothing in this demo data that was not publicly available on SurrealDB.store at the time of this dataset’s creation.

Now that we have that out of the way, let’s explore the new dataset.

What’s in store

[Image: Surreal Deal Store schema]

Define all the things!

There have been massive improvements and a lot of new features added since the dataset was first generated for 1.0-beta.9 last year.

The dataset has been completely overhauled to showcase many new features and more examples of data modelling patterns.

You’ll now find the schema definitions for all the tables and fields, including more advanced definitions like assertions and default values.

-- Define a field with the object data type
DEFINE FIELD images ON TABLE product TYPE array<object>;

-- Define the subfields with the string data type
-- Assert that the URL is a valid URL
DEFINE FIELD images.*.url ON TABLE product TYPE string
    ASSERT string::is::url($value);

-- Define the subfields of type number
DEFINE FIELD images.*.position ON TABLE product TYPE number;

-- Define a time field on the product table of type object,
-- with created_at and updated_at subfields of the datetime data type
DEFINE FIELD time ON TABLE product TYPE object;
DEFINE FIELD time.created_at ON TABLE product TYPE datetime
    VALUE $before OR time::now() DEFAULT time::now();
DEFINE FIELD time.updated_at ON TABLE product TYPE datetime
    VALUE time::now() DEFAULT time::now();
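As a quick illustration of how those assertions behave, here is a hypothetical write (not part of the dataset, and simplified to the images field only) that should be rejected because the URL is not valid:

-- Hypothetical example, not from the dataset: this should fail the
-- ASSERT on images.*.url because 'not-a-url' is not a valid URL
-- (other product fields are omitted for brevity)
CREATE product SET images = [{ url: 'not-a-url', position: 1 }];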

You’ll find tables with record links in both directions.

-- from person to address_history
DEFINE FIELD person ON TABLE address_history TYPE record<person>;

-- from address_history to person
DEFINE FIELD address_history ON TABLE person TYPE record<address_history>;
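One way to follow such a link at query time is with FETCH, which swaps the record id for the full linked record in the result. A minimal sketch against the fields defined above:

-- Return address_history rows with the linked person record embedded
SELECT * FROM address_history FETCH person;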


As well as a graph relation example with just 2 tables instead of 3.

DEFINE TABLE product_sku TYPE RELATION FROM product TO product;


You might notice it's using the new TYPE RELATION syntax instead of defining the in and out fields manually.
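For comparison, here is a rough sketch (not taken from the dataset) of how the same relation table might have been pinned down before TYPE RELATION, by defining the in and out fields by hand:

-- Older style: constrain the relation table's in and out fields manually
DEFINE TABLE product_sku SCHEMAFULL;
DEFINE FIELD in ON TABLE product_sku TYPE record<product>;
DEFINE FIELD out ON TABLE product_sku TYPE record<product>;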

There are also various indexes defined.

-- Unique indexes
DEFINE INDEX unique_wishlist_relationships ON TABLE wishlist COLUMNS in, out UNIQUE;

-- Index on nested fields or record links
DEFINE INDEX person_country ON TABLE person COLUMNS address.country;

-- Analyzer and index for full-text search
DEFINE ANALYZER blank_snowball TOKENIZERS blank FILTERS lowercase, snowball(english);
DEFINE INDEX review_content ON TABLE review COLUMNS review_text SEARCH ANALYZER blank_snowball BM25 HIGHLIGHTS;
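To give an idea of what these are for: the unique index stops the same wishlist edge from being created twice, while the person_country index can serve lookups on the nested field, along the lines of this sketch (the country value is just an assumed example):

-- A lookup that the person_country index can serve
SELECT * FROM person WHERE address.country = 'England';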

As well as functions.

-- A function can encapsulate any valid SurrealQL logic
DEFINE FUNCTION fn::pound_to_usd($price: number) {
    RETURN $price * 1.26f;
};

-- Which means it can also be used like a stored procedure
DEFINE FUNCTION fn::number_of_unfulfilled_orders() {
    RETURN (
        SELECT count() FROM order
        WHERE order_status NOTINSIDE ['processed', 'shipped']
        GROUP ALL
    );
};
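Once defined, these functions can be called anywhere a SurrealQL expression is allowed. A small sketch, using the order table's product_name and price fields:

-- Call the conversion function directly
RETURN fn::pound_to_usd(12.5);

-- Or use it inside a query to add a converted price column
SELECT product_name, fn::pound_to_usd(price) AS price_usd FROM order;

-- Run the stored-procedure-style function
RETURN fn::number_of_unfulfilled_orders();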


Time-sortable random ids, using ULIDs

While our random ids are a great default option, they aren’t the only option.

For an e-commerce application, where things are often based on time, it can make sense to have a time-sortable id, as was the case when Shopify switched from UUID v4 to ULID.

SurrealDB also supports UUID v7, which does pretty much the same thing. The honest reason ULID was chosen here is just because it's shorter and looks better.
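If you want to try this yourself, a ULID can be generated directly in the record id when creating data. A minimal sketch (not from the dataset, and ignoring the other fields a real order record would need):

-- Create a record whose id is a freshly generated, time-sortable ULID
CREATE order:ulid() SET order_status = 'processed';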

Anyway, these identifiers naturally sort your data by creation time, which lets you make better use of our range query pattern for highly efficient operations regardless of table size.

-- Select the sum of sales for each product in the order table
SELECT product_name, math::sum(price * quantity) AS sum_sales
FROM order:01FS426M489J4RVK2AGSN9HVP0..01FTQB6ZMG9RMB9300Z6GBDXNR
GROUP BY product_name;

Realistic product reviews generated with llama3

It has been a fun experiment seeing what it takes to use a local LLM for fake data generation.

Now our fake reviews are no longer just a collection of words chosen at random.

review_text: "agriculture artwork feed name along xhtml putting photos much meal costs spring"

They are a collection of semi-random words chosen with statistics!

review_text: "The Voyager Wool-Like Jacket has become my new go-to for cold winter days. It's that good!",

As expected, it was relatively quick and easy to write prompts and get some outputs; you've likely been doing this with ChatGPT and friends for a while now.

The hard part was actually integrating it into a system and building the guardrails around it that make it safe, useful, predictable and repeatable.

That is why we’re building out various AI/ML workflows here at SurrealDB which make it easier for you to build intelligent applications.

What this means for this dataset, however, is that we can have better examples, such as this full-text search query for the products which people mention they are wearing nonstop.

-- Select the product name and review text
-- where the review text contains the phrase "wearing nonstop"
SELECT ->product.name, review_text
FROM review
WHERE review_text @@ 'wearing nonstop';
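Because the review_content index was defined with BM25 and HIGHLIGHTS, the same search can also return a relevance score and highlighted snippets. A sketch of what that might look like, numbering the full-text operator so the search functions can reference it:

-- Score and highlight the matches from the @0@ full-text operator
SELECT
    ->product.name,
    search::highlight('<b>', '</b>', 0) AS highlighted_text,
    search::score(0) AS relevance
FROM review
WHERE review_text @0@ 'wearing nonstop'
ORDER BY relevance DESC;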

What’s next?

A lot of thought has been put into making the dataset more realistic and useful for learning the various features and patterns everyone should be aware of. This is, however, just the beginning.

As we get closer to the finalised 2.0 release, the dataset will be updated with more features and improvements, as well as more examples of how to use it.

What you can do now is:

If you think something is missing or could be done better, let us know! We would love to hear from you.