With our 2.0-alpha release, you'll find a lot of exciting updates, one of which is a new and improved demo dataset!
If you've been using our Surreal Deal dataset, you'll see some familiar tables and fields as this dataset is also based around an e-commerce store. However, unlike the previous Surreal Deal, this dataset is based on an actual e-commerce store!
That store is none other than our very own swag store - SurrealDB.store!
The real and the surreal
Before we dive into some exciting examples, it's important to clarify what it means to be based on a real e-commerce store.
It means that
The products are real, and you can buy them on the SurrealDB.store!
Everything else is dummy data generated from scratch, with no connection to the real store.
There is nothing in this demo data that was not publicly available on SurrealDB.store at the time of this dataset's creation.
Now that we have that out of the way, let's explore the new dataset.
What's in store

Define all the things!
There have been massive improvements and a lot of new features added since the dataset was first generated for 1.0-beta.9 last year.
The dataset has been completely overhauled to showcase many new features and more examples of data modelling patterns.
You'll now find the schema definitions for all the tables and fields, including more advanced definitions like assertions and default values.
You'll find tables with record links in both directions.

There is also a graph relation example with just two tables instead of three.

You might notice it's using the new TYPE RELATION instead of defining the in and out fields.
There are also various indexes defined.
There are functions defined as well.

Time sortable random ids, using ULIDs
While our random ids are a great default option, they aren't the only option.
For an e-commerce application, where things are often based on time, it can make sense to have a time-sortable id, as was the case when Shopify switched from UUID v4 to ULID.
SurrealDB also supports UUID v7, which does pretty much the same thing. Honestly, ULID was chosen here simply because it's shorter and looks better.
Anyway, these identifiers naturally sort your data based on creation time, which enables you to better use our range query pattern for highly efficient operations regardless of table size.
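To see why a ULID sorts by creation time, here is a minimal Python sketch. It is not SurrealDB's implementation, just an illustration of the ULID layout (a 48-bit millisecond timestamp followed by 80 random bits, encoded in Crockford base32). The helper names `encode`, `make_ulid` and `ids_between` are hypothetical, and the sorted list stands in for a table whose records are ordered by id. This simple version does not guarantee ordering between ids created within the same millisecond.

```python
import os
import time
from bisect import bisect_left
from typing import List, Optional

# Crockford base32 alphabet used by ULIDs (no I, L, O, U).
# The characters are in ascending ASCII order, so string order == numeric order.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode(value: int, length: int) -> str:
    """Encode a non-negative integer as fixed-width Crockford base32."""
    chars = []
    for _ in range(length):
        value, rem = divmod(value, 32)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def make_ulid(timestamp_ms: Optional[int] = None) -> str:
    """Build a 26-character ULID: 10 timestamp characters + 16 random characters."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    return encode(timestamp_ms, 10) + encode(randomness, 16)


def ids_between(sorted_ids: List[str], start_ms: int, end_ms: int) -> List[str]:
    """Return ids created in [start_ms, end_ms) using two binary searches.

    Only the timestamp prefix is needed as a bound, so the cost depends on
    the size of the result, not on the size of the whole list.
    """
    lo = bisect_left(sorted_ids, encode(start_ms, 10))
    hi = bisect_left(sorted_ids, encode(end_ms, 10))
    return sorted_ids[lo:hi]


if __name__ == "__main__":
    # Three hypothetical orders created one second apart, inserted out of order.
    base = 1_700_000_000_000
    ids = [make_ulid(base + 2000), make_ulid(base), make_ulid(base + 1000)]
    ids.sort()  # a plain string sort puts them in creation-time order
    print(ids_between(ids, base + 1000, base + 3000))  # the two most recent ids
```

Because the timestamp occupies the leading characters, a range over ids is also a range over time, which is the property the range query pattern relies on.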
Realistic product reviews generated with llama3
It has been a fun experiment seeing what it takes to use a local LLM for fake data generation.
Now our fake reviews are no longer just a collection of words chosen at random.
They are a collection of semi-random words chosen with statistics!
As expected, it was relatively quick and easy to write prompts and get some outputs. After all, you've been doing this with ChatGPT and friends for a while now.
The hard part was actually integrating it into a system and building the guardrails around it that make it safe, useful, predictable and repeatable.
That is why we're building out various AI/ML workflows here at SurrealDB which make it easier for you to build intelligent applications.
For this dataset, however, it means we can have better examples, such as a full-text search for products that people mention they are wearing nonstop.
What's next?
A lot of thought has been put into making the dataset more realistic and useful for learning the various features and patterns everyone should be aware of. This is, however, just the beginning.
As we get closer to the finalised 2.0 release, the dataset will be updated with more features and improvements, as well as more examples of how to use it.
What you can do now is:
If you think something is missing or could be done better, let us know! We would love to hear from you.
