Vector indexes

Choose brute force, HNSW, or DISKANN vector indexes, tune DIMENSION and distance metrics, and use the query cheat sheet with vector::distance::knn().

Brute force search is always available in SurrealDB: it compares a query vector against every vector in the dataset to find the closest matches.

Because every stored vector is scanned at query time, you do not create an index for this approach.
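To make the scan concrete, here is a minimal sketch in plain Python of what "compare the query against every vector" means. It is an illustration only, not SurrealDB's implementation; the `brute_force_knn` helper and the sample record IDs are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_knn(query, vectors, k, distance=euclidean):
    """Score every stored vector against the query and keep the k closest.

    `vectors` maps a record ID to its embedding (hypothetical layout).
    The work grows linearly with the number of stored vectors.
    """
    scored = [(distance(query, vec), key) for key, vec in vectors.items()]
    scored.sort()  # smallest distance first
    return [(key, dist) for dist, key in scored[:k]]

# Hypothetical 4-dimensional sample data
pts = {
    "pts:1": [1.0, 2.0, 3.0, 4.0],
    "pts:2": [2.0, 3.0, 4.0, 5.0],
    "pts:3": [9.0, 9.0, 9.0, 9.0],
    "pts:4": [2.0, 3.0, 4.0, 6.0],
}

nearest = brute_force_knn([2.0, 3.0, 4.0, 5.0], pts, k=2)
print(nearest)
```

Every call touches all stored vectors, which is why the result is exact and also why the cost rises with dataset size.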

The brute force approach for finding the nearest neighbour is generally preferred in the following use cases:

  • Small datasets / limited query vectors: For applications with small datasets, the overhead of building and maintaining an index might outweigh its benefits. In such cases, brute force is usually the simpler and cheaper choice.

  • Guaranteed accuracy: Since the brute force method compares the query vector against every vector in the dataset, it guarantees finding the exact nearest vectors based on the chosen distance metric (like Euclidean, Manhattan, etc.).

  • Benchmarking models: The brute force approach can be used as a reference to help benchmark the performance of other approximate alternatives like HNSW or DISKANN.
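When brute force serves as the reference, a common way to score an approximate index is recall: the fraction of the exact nearest neighbours that the approximate result also returned. The sketch below is plain Python rather than a SurrealDB feature; `recall_at_k` and the record IDs are hypothetical.

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k IDs that also appear in the approximate top-k."""
    exact = set(exact_ids)
    if not exact:
        return 1.0  # nothing to find, so nothing was missed
    return len(exact & set(approx_ids)) / len(exact)

# Hypothetical results for the same query with k = 3
exact_result = ["pts:2", "pts:4", "pts:1"]   # from a brute force query
approx_result = ["pts:2", "pts:1", "pts:7"]  # from an indexed query

score = recall_at_k(exact_result, approx_result)
print(f"recall@3 = {score:.2f}")
```

Averaging this score over a representative set of queries gives a single number to compare index settings against.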

While brute force can give you exact results, it's computationally expensive for large datasets.

In most cases, you do not need the exact nearest neighbours, and you can trade a little accuracy for much faster search over large, high-dimensional datasets by finding approximate nearest neighbours instead.

This is where vector indexes come in.

SurrealDB offers two approximate graph indexes for k-nearest-neighbour search:

| Index | Best for | Storage |
| --- | --- | --- |
| HNSW (DEFINE INDEX … HNSW) | Low-latency ANN when the graph fits comfortably in memory with headroom for the bounded vector cache | In-memory hot graph + persistence |
| DISKANN (DEFINE INDEX … DISKANN) (SurrealDB 3.1+) | Very large corpora where RAM cannot hold the full graph; optimises for disk-resident graphs and quantisation-friendly types | On-disk graph with caching (not on WASM targets) |

Both are proximity graph-style indexes. Queries use the same <|K, …|> KNN operator shapes; the optimiser picks the index when distances and types line up.

  • HNSW — efficient in-memory approximation for high dimensions or large in-RAM datasets.

  • DISKANN — disk-oriented approximation for embeddings that exceed practical memory for a pure HNSW graph.

  • Brute force — when you do not define an index, when you want exact nearest neighbours, or when you pass an explicit distance function to the query that does not route to your index.

HNSW parameters:

| Parameter | Default | Options | Description |
| --- | --- | --- | --- |
| DIMENSION | | | Size of the vector |
| DIST | EUCLIDEAN | EUCLIDEAN, COSINE, MANHATTAN | Distance function |
| TYPE | F64 | F64, F32, I64, I32, I16 | Vector type |
| EFC | 150 | | EF construction |
| M | 12 | | Max connections per element |
| M0 | 24 | | Max connections in the lowest layer |
| LM | 0.40242960438184466f | | Multiplier for level generation. This value is calculated automatically and is considered optimal. |

Examples:

-- User statement:
DEFINE INDEX hnsw_idx ON pts FIELDS point HNSW DIMENSION 4;
-- Defaults to:
DEFINE INDEX hnsw_idx ON pts FIELDS point HNSW DIMENSION 4 DIST EUCLIDEAN TYPE F64 EFC 150 M 12 M0 24 LM 0.40242960438184466f;
-- Setting LM manually is strongly discouraged, because it is
-- computed from other parameters. Only set it yourself if you
-- are thoroughly versed in the field.

For more details, see the DEFINE INDEX statement documentation.
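The default LM shown above coincides with 1/ln(M) for M = 12, the level multiplier commonly used for HNSW, and the default M0 is twice M. The Python sketch below reproduces those numbers. It assumes SurrealDB derives its defaults this way, which this page does not state; the function names are hypothetical.

```python
import math

def level_multiplier(m):
    """Common HNSW level multiplier: 1 / ln(M). Assumed, not confirmed, for SurrealDB."""
    return 1.0 / math.log(m)

def default_m0(m):
    """Common HNSW choice for the lowest layer: twice M. Assumed, not confirmed."""
    return 2 * m

lm = level_multiplier(12)
m0 = default_m0(12)
print(lm, m0)  # lm is approximately 0.4024, m0 is 24
```

If the assumption holds, changing M also changes the appropriate LM, which is one reason to leave LM unset.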

Available since: v3.1.0
DISKANN parameters:

| Parameter | Default | Options | Description |
| --- | --- | --- | --- |
| DIMENSION | | | Vector dimension |
| DIST | EUCLIDEAN | EUCLIDEAN, COSINE, INNER_PRODUCT, COSINE_NORMALIZED | Distance (narrower set than HNSW) |
| TYPE | F32 | F32, F16, I8, U8 | Element encoding (COSINE_NORMALIZED requires F32 or F16) |
| DEGREE | 64 | > 0 | Target maximum graph degree |
| L_BUILD | 100 | > 0 | Construction search-list size |
| ALPHA | 1.2 | | DiskANN pruning parameter |
| HASHED_VECTOR | off | | Optional hash-stabilised vector keys |

Example:

DEFINE INDEX diskann_idx ON pts FIELDS point DISKANN DIMENSION 4 DIST COSINE TYPE F32;

See DEFINE INDEX → DISKANN for platform notes (including no WASM support).

DEFINE INDEX hnsw_idx ON pts FIELDS point HNSW DIMENSION 4;

-- The query vector must have the same number of elements as the index DIMENSION (4)
LET $vector = [2,3,4,5];
SELECT
    id,
    -- knn() reuses the distance already computed during the query,
    -- in this case the Euclidean distance from $vector
    vector::distance::knn() AS dist
FROM pts
-- <|2|> returns the 2 nearest neighbours, in this case using the
-- distance function defined in the index: Euclidean
WHERE point <|2|> $vector;

With a DISKANN index defined on point, the same <|K, EF|> approximate form applies; the second number bounds the dynamic candidate list for search (see the KNN operator).

| Function | Notes |
| --- | --- |
| vector::distance::knn() | reuses the value computed during the query |
| vector::distance::chebyshev(point, $vec) | |
| vector::distance::euclidean(point, $vec) | |
| vector::distance::hamming(point, $vec) | |
| vector::distance::manhattan(point, $vec) | |
| vector::distance::minkowski(point, $vec, 3) | third param is 𝑝 |
| vector::similarity::cosine(point, $vec) | |
| vector::similarity::jaccard(point, $vec) | |
| vector::similarity::pearson(point, $vec) | |

WHERE statement

| Query | HNSW index | DISKANN index |
| --- | --- | --- |
| <\|2\|> | uses distance function defined in index | same when the index distance matches |
| <\|2, EUCLIDEAN\|> | brute force method | brute force method |
| <\|2, COSINE\|> | brute force method | brute force method |
| <\|2, MANHATTAN\|> | brute force method | brute force method |
| <\|2, MINKOWSKI, 3\|> | brute force method (third param is 𝑝) | brute force method |
| <\|2, CHEBYSHEV\|> | brute force method | brute force method |
| <\|2, HAMMING\|> | brute force method | brute force method |
| <\|2, 10\|> | second param is effort* | same approximate form; second value bounds the candidate list |

\* effort — for HNSW and DISKANN, the second number in <|K, N|> tells the engine how far to search along the graph. Both algorithms are approximate and may miss some vectors.

  • Verify index utilisation in queries using the EXPLAIN FULL clause. For example: SELECT id FROM pts WHERE point <|10|> [2,3,4,5] EXPLAIN FULL;

  • 𝑝 values: (more about 𝑝 in Minkowski distance)

    • 2^0 = 1 → manhattan/diamond ◇

    • 2^1 = 2 → euclidean/circle ○

    • 2^2 = 4 → squircle ▢

    • 2^∞ = ∞ → square □
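The effect of 𝑝 can be checked numerically. The following plain-Python sketch (an illustration, not SurrealDB code; the `minkowski` helper is hypothetical) evaluates the Minkowski distance between two fixed points for the 𝑝 values listed above.

```python
import math

def minkowski(a, b, p):
    """Minkowski distance of order p; p = math.inf gives the Chebyshev distance."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

a, b = [0.0, 0.0], [3.0, 4.0]

d1 = minkowski(a, b, 1)           # same as Manhattan: 3 + 4
d2 = minkowski(a, b, 2)           # same as Euclidean: sqrt(9 + 16)
d4 = minkowski(a, b, 4)           # between Euclidean and Chebyshev
dinf = minkowski(a, b, math.inf)  # same as Chebyshev: max(3, 4)

print(d1, d2, d4, dinf)
```

For a fixed pair of points the distance never increases as 𝑝 grows: it starts at the Manhattan value and approaches the largest single-coordinate difference.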
