A time series database is optimized for storing, querying, and managing time-stamped data. Rather than treating the timestamp as just another attribute, TSDBs place time at the forefront of data modeling, indexing, retention, and query optimization. The queries in time series workloads often include:
- Finding or aggregating values over time windows (e.g., last 5 minutes, last 30 days).
- Detecting patterns or anomalies (peaks, troughs, rolling averages).
- Combining real-time and historical data for a holistic view of a system's behavior.

Although several specialized TSDB solutions exist (e.g., InfluxDB, TimescaleDB), many multi-model databases, SurrealDB included, are introducing features to handle time series workloads efficiently. This guide will help you understand how to think like a time series database user, and how SurrealDB can accommodate these needs.
- **Timestamp**: Every record includes at least one timestamp field, typically indicating when the data was created, measured, or observed.
- **Measurement / Metrics**: Time series data often revolves around measurements, like temperature, CPU usage, or stock prices. Each measurement can have multiple fields or tags.
- **Tags (or Labels)**: Extra dimensions that categorize data. For instance, a sensor reading might have a location tag or a device_id. These tags help you filter or group data.
- **Aggregations**: Because time series data can be massive (think thousands of data points per second), it's common to store or generate aggregated values (e.g., hourly averages or daily sums) for faster analytics.
- **Retention Policies**: Time series datasets can grow exponentially. Many TSDBs support retention policies that drop or downsample older data while keeping recent data at high granularity.
- **Time Buckets**: It's typical to group data into buckets (e.g., 1-minute, 5-minute, or 1-day intervals) for analytics.
- **Efficient Ingestion**: TSDBs are optimized for high write throughput, as time series data often arrives in large bursts or continuous streams.
- **Fast Time-Based Queries**: Data structures are specialized for retrieving data over specific time windows or intervals.
- **Built-In Aggregation and Downsampling**: Many TSDBs offer native functions to summarize data by minute, hour, or other intervals.
- **Scalability**: Time partitioning or sharding strategies make it easier to handle large volumes of time series data.
- **Retention Control**: You can automatically expire older data or keep only aggregated versions to save storage.
Now that we’ve established a common understanding of time series data, let’s explore a practical example using IoT sensor data.
The normal way of modelling data in SurrealDB is as fields in a record.
```surql
CREATE sensor_readings CONTENT {
    timestamp: time::now(),
    location: location:Longleat_House,
    sensor: sensor:ARF8394AAA,
    temperature_celsius: 28.4,
    humidity_percent: 55
};
```
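With this field-based model, selecting a time window means filtering on the timestamp field, which in turn means a table scan or a secondary index. A minimal sketch of what that could look like (the index name is illustrative):

```surql
-- Query against the field-based model: the last hour of readings
-- for one sensor. Without an index on timestamp, this is a table scan.
SELECT temperature_celsius, humidity_percent
FROM sensor_readings
WHERE timestamp > time::now() - 1h
  AND sensor = sensor:ARF8394AAA;

-- An index on the timestamp field could speed this up (illustrative name)
DEFINE INDEX sensor_readings_timestamp ON TABLE sensor_readings FIELDS timestamp;
```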
A more optimised way of working with this data in a time series context, however, is to use either object-based or array-based record IDs, otherwise known as complex record IDs.
```surql
-- Object-based record IDs
CREATE sensor_readings:{
    timestamp: time::now(),
    location: location:Longleat_House,
    sensor: sensor:ARF8394AAA,
    temperature_celsius: 28.4,
    humidity_percent: 55
};
```
```surql
-- Array-based record IDs
CREATE sensor_readings:[
    time::now(),
    location:Longleat_House,
    sensor:ARF8394AAA,
    28.4,
    55
];
```
Storing the values as record fields or as part of the ID might look similar at first glance, but under the hood, the ID-based approach is optimised for efficient range selection, through the magic of record IDs.
This effectively means fewer worries about selecting the right indexes or partitions: the ID already orders your data naturally, and range selections keep the performance of a key-value lookup regardless of the size of the table.
```surql
-- Select all the temperature readings from the start until now
-- from a specific sensor
SELECT id[3] AS temperature_celsius
FROM sensor_readings:[
    NONE,
    sensor:ARF8394AAA
]..=[
    time::now(),
    sensor:ARF8394AAA
];
```
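The same range syntax works for any time window: simply move the bounds. For example, to look at only the last 24 hours (the window length here is arbitrary):

```surql
-- Select the readings from the last 24 hours only,
-- by moving the lower bound of the range forward in time
SELECT id[3] AS temperature_celsius, id[4] AS humidity_percent
FROM sensor_readings:[
    time::now() - 24h
]..=[
    time::now()
];
```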
This is, however, not the only way of doing it: you can also keep the metadata in the ID and the sensor data in the record itself, as in the example below.
```surql
CREATE sensor_readings:[
    time::now(),
    location:Longleat_House,
    sensor:ARF8394AAA
] CONTENT {
    temperature_celsius: 28.4,
    humidity_percent: 55
};

-- Select all the temperature readings from the start until now
-- from a specific sensor
SELECT temperature_celsius
FROM sensor_readings:[
    NONE,
    sensor:ARF8394AAA,
]..=[
    time::now(),
    sensor:ARF8394AAA,
];
```
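Because the timestamp is the first element of the ID, ranges can also be used to implement a simple retention policy by hand: periodically delete everything older than a cutoff. A minimal sketch, assuming a 30-day retention window (note that nothing schedules this for you; it would need to be run periodically by your application or a job):

```surql
-- Delete every reading older than 30 days by ranging over the IDs
-- from the beginning of time up to the retention cutoff
DELETE sensor_readings:[
    NONE
]..[
    time::now() - 30d
];
```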
The last thing to note here is that we’ve actually been using record IDs inside our complex record IDs! This is to reduce the fields in our main time series table to only the necessary ones by offloading most of the metadata to connected tables.
In our case we have:

- `location:Longleat_House`, which refers to the `Longleat_House` ID in the `location` table. There we put all the metadata about the location itself, such as geo coordinates.
- `sensor:ARF8394AAA`, which refers to the `ARF8394AAA` ID in the `sensor` table. There we could put all the metadata about the sensor, such as location, firmware, when it was bought and when it needs maintenance.

It's very easy and performant to get the connected data, since you don't have to do any table scans for that either: it links directly to a specific record on a specific table!
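For completeness, the connected records are just ordinary records in their own tables. A sketch of what they could contain; the field names and values here are illustrative rather than prescribed:

```surql
-- Illustrative metadata for the location record
CREATE location:Longleat_House CONTENT {
    name: 'Longleat House',
    coordinates: (-2.2770, 51.3206)
};

-- Illustrative metadata for the sensor record,
-- including where this particular sensor is installed
CREATE sensor:ARF8394AAA CONTENT {
    model: 'ARF8394',
    firmware: '1.4.2',
    coordinates: (-2.2770, 51.3206),
    next_maintenance: time::now() + 52w
};
```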
```surql
-- Select all fields in the ID and the coordinates field from the sensor table
SELECT id, id[2].coordinates AS sensor_coordinates
FROM sensor_readings:[
    '2024-08-13T03:32:19.109Z',
    location:Longleat_House,
    sensor:ARF8394AAA,
    28.4,
    55
];
```
Now that we’ve explored a bit how to store and query event data, let’s turn our events into metrics.
### Modelling metrics

For doing metrics in SurrealDB you can choose one of, or combine, the following features: pre-computed table views, live queries, drop tables, and custom events.
Our pre-computed table views are most similar to event-based, incrementally updating, materialised views. Practically, this means our downsampled metrics will always be up to date, as the view incrementally updates in near real-time when we add more records to the sensor_readings table.
```surql
-- Define a table view which aggregates data from the sensor_readings table
DEFINE TABLE daily_measurements_by_location AS
SELECT
    id[1] AS location,
    time::day(id[0]) AS day,
    math::mean(id[3]) AS avg_temperature_celsius,
    math::mean(id[4]) AS avg_humidity_percent
FROM sensor_readings
GROUP BY location, day;
```
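The view can then be queried like any ordinary table, for example to inspect the downsampled metrics:

```surql
-- Read the downsampled metrics back out of the view, most recent day first
SELECT location, day, avg_temperature_celsius, avg_humidity_percent
FROM daily_measurements_by_location
ORDER BY day DESC;
```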
For real-time visualisation of our metrics we can then use Live Queries to stream updates to our client, such as a BI dashboard or embedded analytics code.
```surql
LIVE SELECT * FROM daily_measurements_by_location;
```
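Starting a live query returns a UUID that identifies the subscription; when the dashboard no longer needs the stream, that UUID can be passed to a KILL statement. A minimal sketch (the UUID below is a placeholder for the one returned to your client):

```surql
-- Stop streaming updates for a previously started live query.
-- Replace the placeholder with the UUID returned by LIVE SELECT.
KILL "0189d6e3-8eac-703a-9a48-d9faa78b44b9";
```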
Drop tables are a rather unique type of table: all writes to them are dropped as soon as they have been written.
```surql
-- Drop all writes to the sensor_readings table. We don't need every reading.
DEFINE TABLE sensor_readings DROP;
```
These tables can be very useful in a time series context if you want to capture very high-frequency data but only care about storing the aggregated, downsampled metrics. They are typically used in combination with either table views or custom events, such that the metrics are calculated and then the underlying data is automatically dropped.
By combining drop tables, table views and live queries, you get a very easy-to-set-up, event-based, real-time solution: events are captured, metrics are computed, stale data is dropped, and live selects stream the results for visualisation.
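Putting those pieces together, a sketch of the whole pipeline could look like the following, reusing the view definition from above and relying on the behaviour described here (the view updates before the raw write is dropped):

```surql
-- 1. Capture high-frequency readings, but don't keep the raw rows
DEFINE TABLE sensor_readings DROP;

-- 2. Downsample them into a pre-computed table view as they arrive
DEFINE TABLE daily_measurements_by_location AS
SELECT
    id[1] AS location,
    time::day(id[0]) AS day,
    math::mean(id[3]) AS avg_temperature_celsius,
    math::mean(id[4]) AS avg_humidity_percent
FROM sensor_readings
GROUP BY location, day;

-- 3. Stream the aggregated metrics to a dashboard in real time
LIVE SELECT * FROM daily_measurements_by_location;

-- 4. Ingest as usual: the view updates, then the raw write is dropped
CREATE sensor_readings:[
    time::now(),
    location:Longleat_House,
    sensor:ARF8394AAA,
    28.4,
    55
];
```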
If you have something even more bespoke in mind, you can even create your own event triggers based on when a record is created, updated or deleted. You can include any valid SurrealQL inside the event.
For example, we can create a simple real-time anomaly detection and notification solution using just SurrealQL events and functions in 5 steps.
```surql
-- Trigger the event when a record is created
DEFINE EVENT sensor_anomaly_notification ON sensor_readings
WHEN $event = 'CREATE'
THEN {
    -- Get the desired time range you want to track
    -- here we're grabbing just the past hour
    LET $temp_past_hour = (
        SELECT VALUE id[3]
        FROM sensor_readings:[
            time::now() - 1h,
        ]..=[
            time::now()
        ]
    );

    -- Calculate both the upper and lower threshold for an outlier
    -- using the standard Q1 - 1.5 * IQR formula for the low outliers
    LET $low_outliers = (
        RETURN math::percentile($temp_past_hour, 25) - 1.5 * math::interquartile($temp_past_hour)
    );

    -- Q3 + 1.5 * IQR formula for the high outliers
    LET $high_outliers = (
        RETURN math::percentile($temp_past_hour, 75) + 1.5 * math::interquartile($temp_past_hour)
    );

    -- If a low outlier is found send a http post request
    -- with the outlier details
    IF $after.id[3] < $low_outliers {
        http::post('https://dummyjson.com/comments/1', {
            id: rand::ulid(),
            outlier: $after.id,
            message: 'Outlier Detected: low temperature'
        });
    };

    -- If a high outlier is found send a http post request
    -- with the outlier details
    IF $after.id[3] > $high_outliers {
        http::post('https://dummyjson.com/comments/1', {
            id: rand::ulid(),
            outlier: $after.id,
            message: 'Outlier Detected: high temperature'
        });
    };
};
```
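To try the event out, you could insert a reading that is clearly outside the recent range. Assuming normal readings around 28°C, a value like the one below should exceed the high threshold and fire the notification (the numbers are made up):

```surql
-- An illustrative reading far above the recent temperatures,
-- which should exceed the high outlier threshold and trigger the event
CREATE sensor_readings:[
    time::now(),
    location:Longleat_House,
    sensor:ARF8394AAA,
    80.0,
    55
];
```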
There are many specialised time series databases out there, so where does SurrealDB fit in?
The advantages SurrealDB has over specialised time series databases are:
- You can combine our time series functionality with the rest of our multi-model database features. For example, doing full-text search and vector search on your log data.
- No need to learn another query language just for time series. SurrealDB has a unified query language for all its features.
- Connect and enrich your metrics easily, instead of having them siloed in a separate system. You can have all your data in one place with zero ETL for your various use cases. Whether you're building transactional, analytical, ML or AI applications, SurrealDB covers a lot of the use cases a modern application needs.
SurrealDB will soon also become a native bi-temporal database, with the introduction of a new data structure for our new SurrealKV storage engine.
The advantages specialised time series databases have over SurrealDB currently are:
Whether you pick SurrealDB for your time series use cases depends mostly on whether you are looking to lower your total system complexity or whether you need another specialised solution.