What is time?
The philosopher Heidegger described Hegel's "vulgar understanding" of time as simply a sequence of "nows".
This "vulgar understanding" also happens to be a great way to understand time series data. We are simply capturing this sequence of "nows" such that we can analyse it later.
How now is now?
Dealing with time series data forces you to think of these rather philosophical questions because, for most things in life, now usually means right this second.
However, for high-frequency time series data, it's all about milliseconds, microseconds and nanoseconds.
In SurrealDB, when we ask for the current time using the `time::now` function, we get back an ISO 8601 timestamp with nanosecond precision in the UTC timezone.
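You can see this for yourself with a minimal query, using nothing beyond the built-in `time::now` function:

```surql
-- Returns the current datetime: UTC, ISO 8601, nanosecond precision.
RETURN time::now();
```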
Fun fact: UTC does not stand for Universal Time Coordinated but for Coordinated Universal Time. Since the English abbreviation would be CUT and the French one TUC (Temps Universel Coordonné), a compromise was made to use UTC as the international abbreviation.
The two types of time series data
As we've seen, a single now is called a timestamp and a sequence of "nows" is called a time series. However, not every time series is the same.
There are fundamentally two types of time series data: events (discrete data) and metrics (continuous data).
Events
Events are timestamped records collected whenever they happen. An example of this is when starting a new SurrealDB server using surreal start, after which you’ll be greeted with log messages like these.
Events are both:
- Discrete: meaning each of the log messages above is a separate part of the whole `surreal start` event.
- Irregular: meaning that when we trigger the `surreal start` event, we get the series of log messages above. If we do nothing after that, we won't get any more logs until we send a stop event using the keyboard shortcut Ctrl-C.
However, if we have our logging set to debug (`surreal start --log debug`) and then run a query, that will trigger an event and we'll get more timestamped records.
Since there can be millions of queries per second and each query can generate multiple log messages, you quickly start to see how large the scale of time series data can be.
Metrics
Metrics are timestamped records collected over regular intervals of time. If you use any kind of fitness tracker, such as the Apple Watch, you’ll have seen metrics in action.
Metrics are both:
- Continuous: meaning each heart rate measurement is part of tracking the same heart.
- Regular: meaning you collect and summarise measurements at regular intervals instead of all the time, so that your fitness tracker's battery and storage last longer. Our example summarises heart rate to beats per minute (bpm).
To keep the series continuous without tracking every heartbeat, the fitness tracker uses fancy math to infer your past heart rate to fill in the gaps. If your current heart rate is 71 bpm and your heart rate was 72 bpm 2m ago then your heart rate 1m ago would be between 71 and 72 bpm. We can use a bit of fancy math like linear interpolation to find out the answer is most likely 71.5 bpm.
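The maths in this example is simple enough to sketch directly in SurrealQL, using the numbers from the paragraph above:

```surql
-- Linear interpolation halfway between the reading two minutes ago (72 bpm)
-- and the current reading (71 bpm): old + (new - old) * fraction_elapsed
RETURN 72 + (71 - 72) * 0.5;
-- 71.5
```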
It's also possible to turn discrete events into continuous metrics, usually by counting the events over a regular interval. For example, we can turn discrete query logs into a useful metric such as queries per second (qps) to keep track of database performance.
Three key reasons for turning events into metrics are:
- Turning data into insight: a series of query logs is data; queries per second offers insight.
- Downsampling to save on storage space and computation cost, only keeping the summarised metrics (beats per minute or queries per second) if the lowest granularity of the data is not needed.
- Enabling forecasting, since most forecasting models only work with continuous and regular data.
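As a sketch of the counting approach, assuming a hypothetical `query_log` table with a `time` field, a qps metric could look like this:

```surql
-- Count all log events from the last second to get queries per second.
SELECT count() AS qps FROM query_log
WHERE time > time::now() - 1s
GROUP ALL;
```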
Modelling time series data in SurrealDB
Now that we’ve established a common understanding of time series data, let’s explore a practical example using IoT sensor data.
Modelling events
The normal way of modelling data in SurrealDB would be as fields in a record.
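For example, a single sensor reading stored as an ordinary record might look like this sketch (the field names and values are illustrative assumptions):

```surql
-- One reading per record; metadata and measurement as plain fields.
CREATE sensor_readings CONTENT {
    sensor: sensor:ARF8394AAA,
    location: location:Longleat_House,
    temperature: 23.5,
    time: time::now()
};
```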
Complex record IDs
A more optimised way of working with it in a time series context would however be using array-based record IDs, otherwise known as complex record IDs.
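A sketch of the same reading using an array-based record ID, with everything moved into the ID itself (values again illustrative):

```surql
-- The location, sensor, timestamp and reading all live in the record ID.
CREATE sensor_readings:[location:Longleat_House, sensor:ARF8394AAA, time::now(), 23.5];
```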
Selecting from array-based record IDs
Storing the data as record fields or as part of the ID might look similar at first glance, but under the hood, the array-based ID is optimised for efficient range selection through the magic of record IDs.
This effectively means fewer worries about selecting the right indexes or partitions: the ID field already partitions your data naturally as you scale, with the performance of a key-value lookup regardless of size!
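A sketch of such a range selection (the `d'...'` datetime literals and the date range are illustrative):

```surql
-- Select one sensor's January readings as a single key range scan,
-- with no index or partition configuration required.
SELECT * FROM sensor_readings:[location:Longleat_House, sensor:ARF8394AAA, d'2024-01-01T00:00:00Z']
    ..[location:Longleat_House, sensor:ARF8394AAA, d'2024-02-01T00:00:00Z'];
```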
This is however not the only way of doing it, you can have the metadata in the ID and sensor data in the record itself, like in the example below.
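Such a split might look like this sketch, with the measurement kept in the record body:

```surql
-- Metadata and timestamp in the ID; the sensor data itself in the record.
CREATE sensor_readings:[location:Longleat_House, sensor:ARF8394AAA, time::now()] CONTENT {
    temperature: 23.5
};
```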
IDs inside IDs
The last thing to note here is that we’ve actually been using record IDs inside our complex record IDs! This is to reduce the fields in our main time series table to only the necessary ones by offloading most of the metadata to connected tables.
In our case we have:

- `location:Longleat_House`, which refers to the `Longleat_House` ID in the `location` table. There we put all the metadata about the location itself, such as geo coordinates.
- `sensor:ARF8394AAA`, which refers to the `ARF8394AAA` ID on the `sensor` table. There we could put all the metadata about the sensor, such as location, firmware, when it was bought and when it needs maintenance.
It's very easy and performant to get the connected data: you don't have to do any table scans, since each record ID links directly to a specific record in a specific table!
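For instance, assuming `sensor` and `location` are stored as record-link fields, a `FETCH` clause swaps the links for the full connected records in one go:

```surql
-- Replace the record links with the records they point to.
SELECT * FROM sensor_readings FETCH sensor, location;
```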
Now that we’ve explored a bit how to store and query event data, let’s turn our events into metrics.
Modelling metrics
For doing metrics in SurrealDB you can choose one of, or combine:

- Pre-computed table views and Live Queries
- Drop tables
- Custom events
Pre-computed table views
Our pre-computed table views are most similar to event-based, incrementally updating, materialised views. Practically, this means our downsampled metrics will always be up to date as it incrementally updates in near real-time when we add more records to the sensor_readings table.
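A minimal sketch of such a view, assuming readings carry `time` and `temperature` fields and downsampling to hourly averages (the table and field names are illustrative):

```surql
-- Incrementally maintained hourly average temperature.
DEFINE TABLE temperatures_by_hour AS
    SELECT
        math::mean(temperature) AS avg_temp,
        time::group(time, 'hour') AS hour
    FROM sensor_readings
    GROUP BY hour;
```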
For real-time visualisation of our metrics we can then use Live Queries to stream real-time updates to our client, such as a BI dashboard or embedded analytics code.
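Subscribing to a hypothetical downsampled metrics table is then a one-liner:

```surql
-- Stream every change to the metrics table to the connected client.
LIVE SELECT * FROM temperatures_by_hour;
```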
Drop tables
Drop tables are pretty unique tables that drop all writes once they have been written.
These tables can be very useful in a time series context if you want to capture very high-frequency data but only care about storing the aggregated downsampled metrics. They are typically used in combination with either the table views or custom events, such that the metrics are calculated then the underlying data is automatically dropped.
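Defining one is a single statement; this sketch assumes the raw readings land in a `sensor_readings` table:

```surql
-- Writes still trigger any views and events, but the records are not stored.
DEFINE TABLE sensor_readings DROP;
```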
When combining drop tables, table views and live queries, you have an easy-to-set-up, event-based, real-time solution: capturing events, creating metrics, dropping stale data and live selects for visualisation.
Custom events
If you have something even more bespoke in mind, you can even create your own event triggers based on when a record is created, updated or deleted. You can include any valid SurrealQL inside the event.
For example, we can create a simple real-time anomaly detection and notification solution using just SurrealQL events and functions in 5 steps.
1. Define an event to trigger when a record is added to the `sensor_readings` table.
2. Get the desired time range you want to track.
3. Calculate both the upper and lower threshold for an outlier, using the standard `Q1 - 1.5 * IQR` formula for the low outliers and the `Q3 + 1.5 * IQR` formula for the high outliers.
4. Check if the current temperature is a low or high outlier.
5. Send a `http::post` request with the outlier details.
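The five steps above could be sketched as a single event definition; the table and field names, the 24-hour window and the webhook URL are all assumptions for illustration:

```surql
DEFINE EVENT anomaly_detection ON TABLE sensor_readings WHEN $event = "CREATE" THEN {
    -- 2. Desired time range: readings from the last 24 hours.
    LET $temps = (SELECT VALUE temperature FROM sensor_readings WHERE time > time::now() - 24h);
    -- 3. IQR outlier thresholds.
    LET $q1 = math::percentile($temps, 25);
    LET $q3 = math::percentile($temps, 75);
    LET $iqr = $q3 - $q1;
    -- 4. Check whether the new reading is a low or high outlier.
    IF $after.temperature < ($q1 - 1.5 * $iqr) OR $after.temperature > ($q3 + 1.5 * $iqr) {
        -- 5. Notify with the outlier details.
        http::post('https://example.com/alerts', { reading: $after });
    };
};
```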
SurrealDB vs specialised time series databases
There are many specialised time series databases out there, so where does SurrealDB fit in?
The advantages SurrealDB has over specialised time series databases are:
- You can combine our time series functionality with the rest of our multi-model database features, for example doing full-text search and vector search on your log data.
- There is no need to learn another query language just for time series: SurrealDB has a unified query language for all its features.
- You can connect and enrich your metrics easily, instead of having them siloed in a separate system. You can have all your data in one place with zero ETL for your various use cases. Whether you're doing transactional, analytical, or ML and AI applications, SurrealDB covers a lot of what a modern application needs.
SurrealDB will soon also become a native bi-temporal database, with the introduction of a new data structure in our new SurrealKV storage engine.
The advantages specialised time series databases have over SurrealDB currently are:
More advanced time series features such as custom data retention policies and better data compression.
Whether you pick SurrealDB for your time series use cases depends mostly on whether you are looking to lower your total system complexity or whether you need another specialised solution.
If you want to reduce your total system complexity, check out Surreal Cloud for the power and flexibility of SurrealDB without the pain of managing infrastructure.
