When it comes to search, you can always use brute force.
In SurrealDB, you can use the brute force approach to search through your vector embeddings and data.
Brute force search compares a query vector against all vectors in the dataset to find the closest match. As this is a brute-force approach, you do not create an index for this approach.
The brute force approach for finding the nearest neighbour is generally preferred in the following use cases:
Small datasets / limited query vectors: For applications with small datasets, the overhead of building and maintaining an index might outweigh its benefits. In such cases, the brute force approach is optimal.
Guaranteed accuracy: Since the brute force method compares the query vector against every vector in the dataset, it guarantees finding the exact nearest vectors based on the chosen distance metric (like Euclidean, Manhattan, etc.).
Benchmarking models: The brute force approach can be used as a reference to help benchmark the performance of other approximate alternatives like HNSW.
While brute force can give you exact results, it's computationally expensive for large datasets.
In most cases, you do not need a 100% exact match, and you can give it up for faster, high-dimensional searches to find the approximate nearest neighbour to a query vector.
This is where Vector indexes come in.
In SurrealDB, you can perform a vector search using a Hierarchical Navigable Small World (HNSW) index, a state-of-the-art algorithm for approximate nearest neighbour search in high-dimensional spaces. It is a proximity graph-based index that offers a balance between search efficiency and accuracy.
Vector search cheat sheet
HNSW: efficient approximation for high dimensions or large datasets
Brute force: when you donβt define an index, or you want exact nearest neighbours, or when you provide a distance function to the query
HNSW index
| Parameter | Default | Options | Description |
|---|---|---|---|
| DIMENSION | Size of the vector | ||
| DIST | EUCLIDEAN | EUCLIDEAN, COSINE, MANHATTAN | Distance function |
| TYPE | F64 | F64, F32, I64, I32, I16 | Vector type |
| EFC | 150 | EF construction | |
| M | 12 | Max connections per element | |
| M0 | 24 | Max connections in the lowest layer | |
| LM | 0.40242960438184466f | Multiplier for level generation. This value is automatically calculated with a value considered as optimal. |
Examples:
For more details, see the DEFINE INDEX statement documentation.
Querying
| Functions | |
|---|---|
vector::distance::knn() | reuses the value computed during the query |
vector::distance::chebyshev(point, $vec) | |
vector::distance::euclidean(point, $vec) | |
vector::distance::hamming(point, $vec) | |
vector::distance::manhattan(point, $vec) | |
vector::distance::minkowski(point, $vec, 3) | third param is π |
vector::similarity::cosine(point, $vec) | |
vector::similarity::jaccard(point, $vec) | |
vector::similarity::pearson(point, $vec) |
WHERE statement
| Query | HNSW index |
|---|---|
<\|2\|> | uses distance function defined in index |
<\|2, EUCLIDEAN\|> | brute force method |
<\|2, COSINE\|> | brute force method |
<\|2, MANHATTAN\|> | brute force method |
<\|2, MINKOWSKI, 3\|> | brute force method (third param is π) |
<\|2, CHEBYSHEV\|> | brute force method |
<\|2, HAMMING\|> | brute force method |
<\|2, 10\|> | second param is effort* |
*effort: Tells HNSW how far to go in trying to find the exact response. HNSW is approximate, and may miss some vectors.
Notes
Verify index utilization in queries using the
EXPLAIN FULLclause. E.g:SELECT id FROM pts WHERE point <|10|> [2,3,4,5] EXPLAIN FULL;π values: (more about π in Minkowski distance)
20 = 1 β manhattan/diamond β
21 = 2 β euclidean/circle β
22 = 4 β squircle β’
2β = β β square β‘