
ENGINEERING DEEP DIVE
What is SurrealDB?
A deep dive into the architecture, storage engine, query language, and key use cases that make SurrealDB a multi-model database built for modern applications and AI workloads.
THE PROBLEM
Building an enterprise AI agent today means stitching together five or six independent databases: a document store, a graph database, a vector index, a relational engine, a memory layer, and a message broker.
Each has its own consistency model, its own query language, and its own failure modes. When agents fail, it is rarely because the model is weak. It is because the data layer underneath cannot deliver consistent, complete context in a single operation.
THE SOLUTION
SurrealDB is the only data layer an enterprise agent needs. Your model, your data, one database.
It provides documents, graphs, vectors, time-series, geospatial, and relational structures as native primitives within a single engine, coordinated by a single query language (SurrealQL), and governed by a single ACID transaction boundary. Combined with Spectron for persistent agent memory and SurrealDS for distributed storage backed by object storage, it forms one vertical stack from object storage to agent memory.
THE VERTICAL STACK
From object storage to agent memory
Spectron gives agents persistent memory. SurrealDB unifies every data model in one ACID transaction. SurrealDS separates compute from storage on commodity object storage. No glue code. No middleware.
SURREALQL
SurrealQL: one query, every model
The best way to understand SurrealQL is to see the payoff first. Consider a retrieval query for an AI agent that needs to find relevant knowledge base articles for a customer:
SELECT id, title,
vector::distance::knn() AS vec_dist,
search::score(1) AS ft_score,
(1 - vec_dist) * 0.6 + ft_score * 0.4
AS blend_score
FROM knowledge_base
WHERE tenant = $tenant
AND updated_at > time::now() - 30d
AND id IN $customer->owns->product
->has_issue->knowledge_base
AND content_embedding <|50,20|>
$query_embedding
AND content @1@ $query_text
ORDER BY blend_score DESC
LIMIT 10;
That single statement applies tenant isolation, temporal filtering, graph traversal through the customer's product relationships, and hybrid vector + full-text ranking. In a multi-system architecture, this would require four or five round-trips across independent databases with no transactional consistency between them. In SurrealQL, it is one query, one transaction, one consistent snapshot.
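The blend_score arithmetic in the retrieval query can be traced outside the database. The following is a minimal Python sketch, not SurrealDB code: the Candidate type and the sample ids are hypothetical, and it assumes ft_score is already on a scale comparable to 0..1 (raw BM25 scores are unbounded, so a real pipeline would likely normalise them first).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    id: str
    vec_dist: float  # distance reported by the vector index, lower is closer
    ft_score: float  # full-text relevance score, higher is better


def blend_score(c: Candidate, w_vec: float = 0.6, w_ft: float = 0.4) -> float:
    # Same arithmetic as the query: the distance is flipped into a
    # similarity so that both terms reward better matches.
    return (1 - c.vec_dist) * w_vec + c.ft_score * w_ft


def top_k(candidates: list[Candidate], k: int = 10) -> list[Candidate]:
    # Equivalent of ORDER BY blend_score DESC LIMIT k.
    return sorted(candidates, key=blend_score, reverse=True)[:k]
```

A record with a slightly worse vector distance can still rank first when its full-text score is strong, which is the point of blending the two signals.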
SCOPE-FIRST RETRIEVAL
One query, one transaction
This works because SurrealQL treats every data model - documents, graphs, vectors, full-text, time-series, geospatial, relational - as composable operators within the same syntax. Here is how each one works individually.
DATA MODELS
Composable by design
Graph relationships
Graph edges are native to the data model. Relationships are created with RELATE and traversed with arrow syntax. What makes SurrealDB's graph model distinct is that every edge is a full document - it can carry its own fields, metadata, embeddings, timestamps, and permissions.
RELATE customer:alice->purchased->product:widget_pro
SET quantity = 2,
date = time::now(),
source = 'web',
sentiment_embedding = $embedding;
SELECT ->purchased[WHERE date > time::now() - 30d]->product.name
FROM customer:alice;
Vector search
Vector similarity search is built into the query engine. You define an index on a field and query it with distance functions. The structured filter (category = 'tools') narrows the candidate set before the vector search runs - scope first, rank second.
DEFINE INDEX product_embedding ON product
FIELDS embedding HNSW DIMENSION 1536 DIST COSINE;
SELECT id, name,
vector::distance::knn() AS distance
FROM product
WHERE category = 'tools'
AND embedding <|10,40|> $query_embedding
ORDER BY distance;
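The "scope first, rank second" pattern in the vector search query can be illustrated with a small Python sketch. It swaps the approximate HNSW index for an exact brute-force scan, so it shows only the order of operations, not how SurrealDB evaluates the query; the record layout and function names are assumptions made for the example.

```python
import math


def cosine_distance(a, b):
    # 1 - cosine similarity; 0.0 means the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def scoped_knn(products, category, query, k):
    # Step 1 (scope): the structured filter shrinks the candidate set.
    scoped = [p for p in products if p["category"] == category]
    # Step 2 (rank): distances are computed only for the survivors.
    ranked = sorted(scoped, key=lambda p: cosine_distance(p["embedding"], query))
    return ranked[:k]
```

Records outside the category never reach the distance computation, so a close vector in the wrong scope cannot displace a valid result.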
Documents and record links
SurrealDB stores data as schemaless or schemafull documents. Record links replace foreign key joins with direct references that the engine resolves at query time, eliminating N+1 query patterns.
CREATE order SET
customer = customer:alice,
items = [product:widget_pro, product:gadget_x],
total = 79.98;
-- Traverse record links inline, no JOIN needed
SELECT customer.name, items.name, total
FROM order;
TEMPORAL AND SEARCH
Time-series, full-text, and real-time
Time-series and temporal queries
SurrealQL provides native duration arithmetic, temporal aggregation functions, and point-in-time VERSION reads for time-travel queries without blocking writers.
-- Aggregate sensor readings by hourly windows
SELECT
time::floor(recorded_at, 1h) AS hour,
math::mean(value) AS avg_temp,
math::max(value) AS peak
FROM reading
WHERE sensor = sensor:temp_01
AND recorded_at > time::now() - 7d
GROUP BY hour
ORDER BY hour DESC;
-- Time-travel: query the exact state as of a fixed timestamp
SELECT * FROM reading
VERSION d'2026-03-20T00:00:00Z';
Full-text search
Full-text indexes are BM25-scored, use configurable analysers, and are queried with the @@ match operator (written @1@ below so that search::score(1) can refer to the match). Full-text search composes with every other model in the same query.
DEFINE ANALYZER english TOKENIZERS blank, class
FILTERS lowercase, snowball(english);
DEFINE INDEX ft_content ON article
FIELDS content
FULLTEXT ANALYZER english BM25;
SELECT id, title,
search::score(1) AS relevance
FROM article
WHERE content @1@ 'distributed consensus'
ORDER BY relevance DESC;
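The hourly aggregation in the time-series query above comes down to flooring each timestamp to its window and grouping on the result. A minimal Python sketch of that logic follows; it is an illustration rather than SurrealDB's implementation, and the input shape (pairs of timestamp and value) is an assumption.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def floor_to_hour(ts: datetime) -> datetime:
    # Counterpart of time::floor(recorded_at, 1h) for this sketch.
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_stats(readings):
    """readings: iterable of (recorded_at, value) pairs."""
    buckets = defaultdict(list)
    for recorded_at, value in readings:
        buckets[floor_to_hour(recorded_at)].append(value)
    # GROUP BY hour, then ORDER BY hour DESC.
    return [
        {"hour": hour, "avg_temp": mean(vals), "peak": max(vals)}
        for hour, vals in sorted(buckets.items(), reverse=True)
    ]
```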
Live queries, events, and more
Geospatial queries work on native GeoJSON types with built-in distance, bearing, and containment functions. LIVE SELECT provides real-time subscriptions over WebSockets. DEFINE EVENT triggers server-side logic on data changes, and changefeeds provide ordered, durable mutation streams per table.
Every one of these models composes with every other. A single SurrealQL statement can combine a graph traversal with a vector search, scope it by a full-text match, filter by a temporal range, restrict by geospatial proximity, aggregate relationally, and stream the results to a live subscriber.
ACID TRANSACTIONS
ACID transactions across every model
Every SurrealQL statement executes within an ACID transaction, regardless of which data models are involved. If a single query updates a document, creates a graph edge, writes a vector embedding, and persists a Spectron memory fact, either all of those operations commit or none do.
Every agent follows the same cycle: read context, reason over it, write the result. In multi-system architectures, reads and writes span databases with independent consistency models, so by the time the agent writes back, the data it read may have already changed. SurrealDB executes the entire loop within a single ACID transaction. The context the agent reads is the same consistent snapshot it writes against.
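To make the read, reason, write cycle concrete, here is a toy Python sketch of a transaction that reads from one snapshot, stages its writes, and applies them all at once or not at all. It uses a simple optimistic conflict check on the written keys; this is an assumption chosen for illustration and not a description of SurrealDB's transaction coordinator. The class and key names are hypothetical.

```python
class Transaction:
    """Stages writes against a snapshot and applies them atomically."""

    def __init__(self, store: dict):
        self._store = store
        self._snapshot = dict(store)  # one consistent view for every read
        self._writes = {}

    def get(self, key):
        # Reads see the transaction's own writes first, then the snapshot.
        if key in self._writes:
            return self._writes[key]
        return self._snapshot.get(key)

    def set(self, key, value):
        self._writes[key] = value  # staged, not yet visible to others

    def commit(self):
        # Abort if any key we wrote has changed since our snapshot was taken.
        for key in self._writes:
            if self._store.get(key) != self._snapshot.get(key):
                raise RuntimeError(f"conflict on {key}")
        # All checks passed: every staged write becomes visible together.
        self._store.update(self._writes)
```

Because validation happens before any write is applied, a conflicting transaction leaves the store untouched: the document update, the graph edge, and the memory fact either all land or none do.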
ARCHITECTURE
One engine, one key-value substrate
At its lowest layer, SurrealDB stores all data - records, graph edges, index entries, metadata - as binary key-value pairs in a transactional KV store. The "models" are not separate engines. They are different query patterns and data structures layered on top of a single KV substrate.
A document is a KV entry with a * path separator. A graph edge pointer uses a ~ tag with an empty value - the key itself encodes the relationship. Index data entries use a + prefix.
Because every model shares the same sorted byte stream, there is no serialisation boundary between subsystems. When a query combines a graph traversal with a vector search, the query planner sees the entire operation and executes it against a single consistent snapshot. The KV store keeps keys in sorted binary order, so every "query by scope" becomes a tight prefix range scan. Graph traversals are not joins - they are prefix scans on contiguous slices of the sorted key space. When you write user:tobie->contributes->repo in SurrealQL, the engine jumps directly to the right byte range. A document lookup, a graph traversal, and an index scan all resolve to the same fundamental operation: scan a contiguous range of bytes.
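The claim that a graph traversal is a prefix scan over sorted keys can be shown with a toy ordered key-value store in Python. The key layout used here (source~edge~target, with an empty value) is a simplified assumption inspired by the description above, not SurrealDB's actual binary encoding, and SortedKV, edge_key, and out_edges are hypothetical names.

```python
from bisect import bisect_left


class SortedKV:
    """Toy ordered key-value store: keys are bytes kept in sorted order."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key: bytes, value: bytes = b"") -> None:
        if key not in self._values:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._values[key] = value

    def scan_prefix(self, prefix: bytes):
        # Jump to the first key >= prefix, then read until the prefix stops matching.
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._values[self._keys[i]]
            i += 1


def edge_key(src: str, edge: str, dst: str) -> bytes:
    # Hypothetical layout: the relationship lives entirely in the key.
    return f"{src}~{edge}~{dst}".encode()


def out_edges(kv: SortedKV, src: str, edge: str):
    prefix = f"{src}~{edge}~".encode()
    return [key[len(prefix):].decode() for key, _ in kv.scan_prefix(prefix)]
```

All outgoing edges of one record sit next to each other in key order, so the traversal touches only that contiguous slice and never inspects unrelated records.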
QUERY ENGINE
Specialised indexes, streaming execution
While the storage is unified KV, the indexing layer is purpose-built for each model: HNSW graphs for vector similarity, BM25-scored inverted indexes for full-text, B-tree derivatives for structured lookups, and the directional key structure itself for graph traversals.
SurrealDB 3.0 rearchitected the query engine around streaming execution, processing results without materialising full intermediate result sets - critical for graph traversals where intermediate sets can explode in size.
-- HNSW graph for vector similarity
DEFINE INDEX product_vec ON product
FIELDS embedding
HNSW DIMENSION 1536 DIST COSINE;
-- BM25 inverted index for full-text
DEFINE INDEX article_ft ON article
FIELDS content
FULLTEXT ANALYZER english BM25;
-- B-tree for structured lookups
DEFINE INDEX user_email ON user
FIELDS email UNIQUE;
-- Streaming execution: graph traversal
-- feeds into vector ranking without
-- materialising the intermediate set
SELECT id, name,
vector::distance::knn() AS dist
FROM customer:acme->owns->product
WHERE embedding <|10|> $query_vec
ORDER BY dist
LIMIT 5;
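The idea behind streaming execution, feeding one stage into the next without building the full intermediate set, maps naturally onto Python generators. The sketch below is an analogy only: it uses exact squared Euclidean distance instead of an HNSW index, and the function names and data shapes are assumptions.

```python
import heapq
from typing import Iterable, Iterator


def traverse(edges: dict, start: str) -> Iterator[str]:
    # Yields neighbours one at a time instead of building a list.
    for target in edges.get(start, ()):
        yield target


def with_distance(ids: Iterable[str], embeddings: dict, query: list) -> Iterator[tuple]:
    for rid in ids:
        vec = embeddings[rid]
        # Squared Euclidean distance keeps the example dependency-free.
        yield sum((a - b) ** 2 for a, b in zip(vec, query)), rid


def top_k(edges: dict, embeddings: dict, start: str, query: list, k: int) -> list:
    # heapq.nsmallest consumes the stream and keeps at most k rows in memory.
    stream = with_distance(traverse(edges, start), embeddings, query)
    return [rid for _, rid in heapq.nsmallest(k, stream)]
```

However many products the traversal yields, memory use stays bounded by k, which is why this style matters when intermediate graph results grow large.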
STORAGE BACKENDS
Pluggable storage, unified interface
The KV substrate is pluggable. SurrealKV is a custom-built embedded engine using a Versioned Adaptive Radix Trie (VART) over an LSM-tree architecture - it provides O(m) lookup, where m is the key length, matched to SurrealDB's hierarchical key layout, with built-in MVCC for time-travel VERSION queries.
SurrealMX is in-memory with optional persistence via append-only logs and snapshots. SurrealDS is the distributed storage layer, described below. Every backend exposes the same transactional interface - switch engines without changing a single query.
# SurrealKV — embedded, VART + LSM-tree, MVCC
surreal start file:production.db
# SurrealMX — in-memory, optional persistence
surreal start memory
# SurrealDS — distributed, S3-backed
surreal start surrealds
# Same queries. Same transactions.
# Every backend.
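The link between MVCC and time-travel VERSION reads can be sketched in a few lines of Python. In this toy model every write appends a new version and a read picks the latest version at or before the requested timestamp, which is why readers need not block writers. It is a conceptual illustration with hypothetical names and integer timestamps, not the VART structure SurrealKV uses.

```python
from bisect import bisect_right


class VersionedStore:
    """Toy MVCC store: writes append versions, nothing is overwritten."""

    def __init__(self):
        self._history = {}  # key -> (commit timestamps, values), both ascending

    def put(self, key, value, ts: int) -> None:
        stamps, values = self._history.setdefault(key, ([], []))
        if stamps and ts <= stamps[-1]:
            raise ValueError("timestamps must increase per key")
        stamps.append(ts)
        values.append(value)

    def get(self, key, as_of: int):
        # Latest version committed at or before `as_of`,
        # or None if the key did not exist yet.
        stamps, values = self._history.get(key, ([], []))
        i = bisect_right(stamps, as_of)
        return values[i - 1] if i else None
```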
SURREALDS
Compute-storage separation
For production-scale distributed deployments, SurrealDS separates compute from storage entirely. Transactional data is durably persisted in commodity object storage - Amazon S3, Google Cloud Storage, or Azure Blob Storage. Compute nodes are stateless and elastic.
INDEPENDENT SCALING
Compute and storage scale separately. Add read replicas without adding storage, or grow datasets without adding compute.
SCALE TO ZERO
Compute nodes shut down when idle. Data remains safe in object storage. Recovery time is proportional to the log delta, not the dataset size.
BUILT-IN DURABILITY
S3-class storage offers 99.999999999% durability. No separate backup infrastructure or snapshot management needed.
INSTANT BRANCHING
Create petabyte-scale database branches in seconds via logical metadata references - Git-like workflows for data.
SPECTRON
Persistent agent memory
Most memory solutions for AI agents are middleware layers that sit above a fragmented data stack. They abstract over the seams between your vector database, your document store, and your graph engine - but the seams are still there. Memory writes go to one system, application data to another, and there is no transactional guarantee that the two are consistent. When an agent retrieves a memory that references data which has since changed, it reasons over a stale view of the world.
Spectron eliminates this. It is a persistent, structured memory engine built on SurrealDB. When a conversation is ingested, Spectron autonomously extracts entities, builds knowledge graph connections, tracks temporal facts with bi-temporal validity, and indexes everything for hybrid retrieval. All of this commits atomically in the same ACID transaction as the application data it relates to.
Working memory
Active context window for the current conversation and in-flight reasoning.
Semantic memory
Entities, properties, and relationships stored as a knowledge graph with vector embeddings.
Episodic memory
Structured records of past interactions, graph-linked and searchable by vector similarity.
Procedural memory
Learned patterns and decision heuristics informed by traces of past actions and outcomes.
Preference memory
Accumulated user preferences, interaction feedback, and behavioural patterns.
Because Spectron runs on SurrealDB, it composes naturally with every other data model. A single SurrealQL statement can traverse a user's purchase history through graph edges, filter reviewed products by semantic similarity, and retrieve only currently valid preferences via temporal constraints - all in one query, one transaction.
Multiple agents can read and write to the same memory surface with full ACID guarantees - coordination happens through shared context rather than message passing. Spectron inherits the full security model (row-level permissions, namespace isolation) and the full storage stack (SurrealDS, scale-to-zero, branching). Between conversations, it continues working in the background: discovering connections, consolidating knowledge, resolving ambiguities, and inferring implicit relationships.
LET $user = user:jaime;
LET $query_vec = fn::embed(
"What products does this user like?"
);
SELECT
->purchased->product
AS purchase_history,
->reviewed->product[
WHERE vector::similarity::cosine(
embedding, $query_vec
) > 0.8
] AS relevant_products,
->preferences[
WHERE valid_at <= time::now()
] AS current_preferences
FROM ONLY $user;
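Bi-temporal validity, mentioned above for Spectron's temporal facts, means tracking two timelines: when a fact was true in the world and when the system learned it. The Python sketch below shows one common way to model that; the field names (valid_from, valid_to, recorded_at) and integer timestamps are assumptions for illustration and are not taken from Spectron's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    value: str
    valid_from: int          # when the fact became true in the world
    valid_to: Optional[int]  # None means it is still true
    recorded_at: int         # when the memory engine learned it


def facts_as_of(facts, valid_time: int, known_at: int):
    """Facts true at `valid_time`, using only what was recorded by `known_at`."""
    return [
        f for f in facts
        if f.recorded_at <= known_at
        and f.valid_from <= valid_time
        and (f.valid_to is None or valid_time < f.valid_to)
    ]
```

Keeping both timelines lets a query ask what the agent believed at an earlier moment, separately from what was actually true at that moment.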
MORE CAPABILITIES
Beyond the core models
Geospatial queries
Native GeoJSON support with built-in distance, bearing, area, and containment functions. No PostGIS extension. Geospatial composes with everything else - find stores within 5km and traverse their inventory graph in one statement.
SELECT name,
geo::distance(location, $user_location)
AS dist,
->stocks->product[
WHERE category = 'electronics'
] AS inventory
FROM store
WHERE geo::distance(
location, $user_location
) < 5000
ORDER BY dist;
DEFINE API: custom endpoints in the database
DEFINE API creates custom HTTP endpoints directly inside SurrealDB - no external framework, no routing layer. The endpoint inherits ACID transactions, row-level permissions, and multi-model query capabilities. For agent-facing APIs and internal tools, this eliminates the entire API routing layer.
DEFINE API "/agent/context"
FOR post
PERMISSIONS
WHERE $auth.role = "agent"
THEN {
LET $results = SELECT *
FROM knowledge
WHERE vector::similarity::cosine(
embedding,
$request.body.embedding
) > 0.8;
RETURN {
status: 200,
body: $results,
}
};
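The 5 km radius filter in the geospatial query above depends on a great-circle distance between two points. The Python sketch below uses the haversine formula with a spherical Earth, which is an assumption about the distance model rather than a statement of how geo::distance is implemented. Points follow the GeoJSON convention of (longitude, latitude), and the store records are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two (longitude, latitude) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def stores_within(stores, origin, radius_m=5000):
    # Compute each distance once, sort by it, and keep the stores inside the radius.
    nearby = [(haversine_m(s["location"], origin), s["name"]) for s in stores]
    return [name for dist, name in sorted(nearby) if dist < radius_m]
```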
Surrealism: the extension system
A WebAssembly-based extension system. Write an extension in Rust, compile it to a .surli module, load it into a running database. Your functions become callable from SurrealQL, sandboxed in WASM, participating in ACID transactions. See surrealdb.com/surrealism.
DEFINE MODULE mod::sentiment
AS f"plugins:/sentiment.surli";
UPDATE article SET
sentiment = mod::sentiment::analyze(
content
),
keywords = mod::sentiment::extract(
content
)
WHERE created_at > time::now() - 1h;
Single binary, runs everywhere
SurrealDB compiles to a single binary. It runs in the browser via WebAssembly, embedded in edge devices, as a serverless function, as a single-node server, or as a distributed SurrealDS cluster. The query engine, data model, and application code are identical across all environments - a prototype built embedded in a browser can move to a distributed cluster in production without rewriting a single query.
How SurrealDB compares
| Feature | Postgres | Neo4j | Pinecone / Weaviate | SurrealDB |
| --- | --- | --- | --- | --- |
| Data models | Relational + bolted-on | Graph only | Vectors only | Docs, graphs, vectors, time-series, geospatial, relational |
| Graph support | Recursive CTEs | Native (Cypher) | None | Native (arrow syntax, edges as documents) |
| Vector search | pgvector extension | Separate system | Native ANN | Native, composable with filters + graphs |
| ACID scope | Single model | Graph only | Eventually consistent | Cross-model (docs + graphs + vectors + memory) |
| Agent memory | External middleware | External middleware | External middleware | Spectron (built on SurrealDB, ACID-consistent) |
| Storage | Coupled compute-storage | Coupled, cache-dependent | Managed cloud | Object storage-backed, compute-storage separation |
| Extensibility | C extensions, PL/pgSQL | Java plugins | None | Surrealism (sandboxed WASM) |
It is a fair instinct to expect a multi-model database to pay an overhead at every seam between its models, but that is architecturally wrong here. SurrealDB is not a wrapper over separate engines - it is one engine where every model shares a single KV substrate, a single query planner, and a single transaction coordinator. There are no seams between subsystems, no serialisation boundaries, no index coordination overhead. The cost of adding a graph traversal to a vector query is additive, not multiplicative.
A purpose-built, single-model database will beat SurrealDB at its own specialty in isolation. But in production, your agent doesn't need "just vector search." It needs vector search scoped by a graph traversal, filtered by tenant and time, ranked by a hybrid score, and committed atomically alongside a memory update. The specialised system is fast at step one - the other four steps require separate systems, network round-trips, and glue code that add more latency and failure modes than the per-model difference saves.
If your workload is purely relational, Postgres is excellent. The question gets interesting when you start adding pgvector for embeddings, recursive CTEs for graph traversal, Elasticsearch for full-text, Redis for real-time subscriptions, and a separate auth service - each with its own consistency model and failure modes. SurrealDB collapses that into one engine. The same query that would require four round-trips across independent systems is a single SurrealQL statement in a single ACID transaction.
There are structural differences too: record links replace foreign key joins, graph edges are full documents, row-level permissions and live queries are built in, and SurrealDS scales compute and storage independently on object storage - a cost model coupled architectures cannot match.
SurrealDB is not a layer over another database. It is built from scratch in Rust - the query engine, storage engines, transaction coordinator, permission system, and real-time subscription layer were all designed and implemented from the ground up. When you run a query combining graph traversal with vector similarity and structured filters, it executes natively inside SurrealDB's own query planner. There is no Postgres underneath, no Neo4j, no Pinecone.
A database engine needs deterministic memory management, predictable latency, and safe concurrency. Rust's ownership model delivers all three without a garbage collector - no GC pauses during query execution, no unpredictable latency spikes under load, no memory overhead from a managed runtime.
SurrealDB 3.0 is generally available and used in production by organisations including Nvidia, Samsung, Tencent, Verizon, Walmart, and ING across finance, healthcare, gaming, and defence. Surreal Cloud provides fully managed deployment with enterprise support, SLAs, and SOC 2 / ISO 27001 compliance.
SUMMARY
The full picture
SurrealDS provides distributed storage backed by S3-class object storage with quorum consensus, compute-storage separation, and scale-to-zero. SurrealDB provides the context layer: documents, graphs, vectors, time-series, geospatial, and relational structures as native primitives in one ACID transaction. Spectron provides persistent, structured agent memory that commits atomically alongside application data.
Your model.
Your data.
One database.
No middleware.
No bolt-ons.
No five-database stack.
