Ollama serves embedding models locally, which suits niche applications, and SurrealDB has first-class vector-search support (k-nearest-neighbour search via brute force, HNSW, or M-Tree). Together they make it easy to build retrieval-augmented generation (RAG) pipelines entirely in Python.

    pip install ollama surrealdb  # SurrealDB Python SDK ≥ 1.0.0

The SDK talks to a running SurrealDB server (e.g. `surreal start --log trace --user root --pass root`). ([PyPI][2])
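The snippet below assumes:

- SurrealDB is reachable at `ws://localhost:8000/rpc`
- Ollama is listening on its default port, `11434`
- embeddings are stored in a table named `NicheApplications` and indexed with HNSW for fast similarity search.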
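
Before running the snippet, it can help to confirm that both services are actually listening. The following is a minimal sketch using only the standard library; `port_open` is a hypothetical helper name, and the ports are the ones assumed above, so adjust them if your setup differs.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Ports taken from the assumptions above (SurrealDB 8000, Ollama 11434).
    for name, port in [("SurrealDB", 8000), ("Ollama", 11434)]:
        status = "reachable" if port_open("localhost", port) else "NOT reachable"
        print(f"{name} on localhost:{port}: {status}")
```

A successful TCP connection only shows that something is listening on the port; it does not verify credentials or that the expected model has been pulled.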

    import asyncio

    import ollama
    from surrealdb import AsyncSurreal  # async client in SDK >= 1.0; plain Surreal is blocking

    TABLE = "NicheApplications"


    async def main():
        # ----- connect to SurrealDB -----
        async with AsyncSurreal("ws://localhost:8000/rpc") as db:
            await db.signin({"username": "root", "password": "root"})
            await db.use("test", "test")  # <namespace>, <database>

            # ----- generate an embedding with Ollama -----
            oclient = ollama.Client(host="http://localhost:11434")
            text = "Ollama excels in niche applications with specific embeddings"
            emb = oclient.embeddings(model="llama3.2", prompt=text)["embedding"]

            # ----- (idempotent) schema & index setup -----
            await db.query(f"""
                DEFINE TABLE IF NOT EXISTS {TABLE};
                DEFINE FIELD IF NOT EXISTS embedding ON {TABLE} TYPE array<float>;
                DEFINE FIELD IF NOT EXISTS text ON {TABLE} TYPE string;
                -- HNSW index; DIMENSION must equal the vector length
                DEFINE INDEX IF NOT EXISTS hnsw_embedding ON {TABLE}
                    FIELDS embedding HNSW DIMENSION {len(emb)} DIST COSINE;
            """)

            # ----- store the record -----
            await db.create(TABLE, {"text": text, "embedding": emb})

            # ----- similarity search (top-3 neighbours) -----
            results = await db.query(f"""
                SELECT *, vector::distance::cosine(embedding, $q) AS score
                FROM {TABLE}
                WHERE embedding <|3,40|> $q   -- KNN operator: k = 3, HNSW ef = 40
                ORDER BY score;               -- lower = more similar
            """, {"q": emb})
            print(results)  # matching rows with their cosine distance


    asyncio.run(main())

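
To make the query's semantics concrete, here is a brute-force reference in plain Python for what a k-nearest-neighbour search ordered by cosine distance computes. It is only an illustration, not how SurrealDB implements the search (an HNSW index is approximate and avoids scanning every row). It assumes cosine distance is defined as 1 minus cosine similarity; the sample rows and the names `cosine_distance` and `knn` are made up for the example.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn(rows, query, k=3):
    """Exact top-k search: score every row, sort ascending, keep the first k."""
    scored = [
        {**row, "score": cosine_distance(row["embedding"], query)}
        for row in rows
    ]
    return sorted(scored, key=lambda r: r["score"])[:k]


# Hypothetical 2-dimensional "embeddings" so the result is easy to check by hand.
rows = [
    {"text": "a", "embedding": [1.0, 0.0]},
    {"text": "b", "embedding": [0.0, 1.0]},
    {"text": "c", "embedding": [1.0, 1.0]},
    {"text": "d", "embedding": [-1.0, 0.0]},
]
nearest = knn(rows, [1.0, 0.1], k=3)
print([r["text"] for r in nearest])  # ['a', 'c', 'b']
```

Row `d` points in the opposite direction from the query, so it has the largest distance and is the one dropped from the top 3.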
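
- The `<|…|>` operator in the `WHERE` clause is SurrealQL's KNN operator: `<|3|>` asks for the 3 vectors nearest to `$q`. An optional second argument selects how the search runs: a distance metric (`<|3,COSINE|>`) requests a brute-force scan, while a number (`<|3,40|>`) sets the search effort for an HNSW index. When an index serves the query, the metric comes from the index definition, so the operator does not need to repeat it. ([SurrealDB][3])
- `vector::distance::cosine(embedding, $q)` adds an explicit cosine distance as `score` (lower means more similar), so you can `ORDER BY` it or filter further.

Topic | How-to |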
---|---|
Batch inserts | Wrap multiple `CREATE` statements in a single `db.query("""…""")` block for better throughput. |
Filtering | Combine the KNN operator with ordinary `WHERE` clauses (`flag = true`, ranges, etc.). |
Index rebuilds | If you bulk-import data, run `REBUILD INDEX hnsw_embedding ON NicheApplications` once at the end. |
Other metrics | Use `vector::distance::euclidean`, `vector::distance::manhattan`, etc., or specify the metric directly in the KNN operator: `<\|k,METRIC\|>`. |
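
The batch-insert tip can be sketched as a small helper that builds one multi-statement query with a bound parameter per row. This is an illustrative sketch, not part of either SDK: `build_batch_insert` and the `row0`, `row1`, … parameter names are hypothetical, and the table name is interpolated directly into the query, so it should come from trusted code rather than user input.

```python
def build_batch_insert(table, rows):
    """Return (sql, params) with one parameterised CREATE statement per row."""
    statements = []
    params = {}
    for i, row in enumerate(rows):
        key = f"row{i}"
        # Each row is bound as a query parameter instead of being inlined.
        statements.append(f"CREATE {table} CONTENT ${key};")
        params[key] = row
    return "\n".join(statements), params


sql, params = build_batch_insert(
    "NicheApplications",
    [
        {"text": "first", "embedding": [0.1, 0.2]},
        {"text": "second", "embedding": [0.3, 0.4]},
    ],
)
print(sql)
# With a connected client, the batch would be sent in one round trip:
#     await db.query(sql, params)
```

Binding rows as parameters keeps large embedding arrays out of the query text and avoids quoting problems in the `text` field.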
SurrealDB’s multi-model nature means you can keep metadata, graphs and time-series data right alongside your vectors, simplifying your stack even further.