SurrealDB is a multi-model database that ships two built-in vector-search algorithms:
| Algorithm | Kind | When to use |
|---|---|---|
| HNSW | Approximate ANN, in-memory or on-disk | Low-latency semantic search / RAG |
| M-Tree | Exact metric tree, on-disk | Smaller datasets where 100 % recall matters |
You expose either one with a single `DEFINE INDEX … HNSW|MTREE` statement in SurrealQL.
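For example, an HNSW index on the `texts` table might look like this (a sketch following SurrealDB's `DEFINE INDEX` syntax; the index name, `EFC`/`M` tuning values, and the 1536-dimension figure — OpenAI's default embedding size — are assumptions you should adapt to your own embeddings):

```sql
DEFINE INDEX hnsw_texts ON texts FIELDS embedding
    HNSW DIMENSION 1536 DIST COSINE EFC 150 M 12;
```

Swap `HNSW … EFC 150 M 12` for `MTREE` to get the exact variant instead.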
```shell
pip install surrealdb langchain-community langchain-openai  # swap in any embedding provider you like
```
- `surrealdb` → Python SDK
- `langchain-community` → houses `SurrealDBStore`
- `langchain-openai` (or HF, Cohere, etc.) → embeddings
```python
from langchain_community.vectorstores.surrealdb import SurrealDBStore
from langchain_openai import OpenAIEmbeddings  # use any Embeddings class

emb = OpenAIEmbeddings()  # or HF, Cohere, etc.

store = SurrealDBStore.from_texts(
    texts,                            # list[str]
    embedding=emb,
    dburl="ws://localhost:8000/rpc",  # SurrealDB RPC endpoint
    ns="langchain",
    db="docstore",
    collection="texts",
    db_user="root",
    db_pass="root",
)
```
Under the hood the helper will:

- Create the `texts` table (if it doesn't exist).
- Embed each string and store it with its `text` & `embedding` fields.

If you've already ingested vectors (e.g. via another app), just instantiate the store directly:
```python
from langchain_community.vectorstores.surrealdb import SurrealDBStore

store = SurrealDBStore(
    embedding_function=emb,
    dburl="ws://localhost:8000/rpc",
    ns="langchain",
    db="docstore",
    collection="texts",
    db_user="root",
    db_pass="root",
)
```
```shell
surreal start --mem --user root --pass root
```
Connect with `dburl="ws://localhost:8000/rpc"` and everything lives in RAM only – perfect for unit tests.
```shell
surreal start file://./surreal.db --user root --pass root
```
Vectors (including the HNSW graph) are persisted between runs.
Spin up SurrealDB in Docker, K8s, Nomad, Fly.io, or Railway – the connection string stays the same (`ws://host:8000/rpc` or `http://…/rpc`).
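For the Docker route, a minimal invocation might look like this (a sketch: the `surrealdb/surrealdb` image name is the official one, but verify the tag and flags against the current docs for your version):

```shell
# Run SurrealDB in a container, exposing the default port 8000
docker run --rm -p 8000:8000 surrealdb/surrealdb:latest \
  start --user root --pass root
```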
```python
query = "How do I enable vector search in SurrealDB?"
docs = store.similarity_search(query, k=3)  # cosine by default
```
If your table has an HNSW index, LangChain will issue a query that looks like:
```sql
SELECT *, vector::distance::knn() AS score
FROM texts
WHERE embedding <|3,64|> $q_vec
ORDER BY score;
```
- `<|K,EF|>` → SurrealDB's built-in K-NN operator (here K=3, efSearch=64).
- `vector::distance::knn()` → pulls the pre-computed distance for free. ([SurrealDB][2])
- Omit the `DEFINE INDEX` and `SurrealDBStore` will fall back to `vector::distance::cosine()` for full-accuracy ranking.
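That fallback is easy to picture: cosine distance is just 1 minus cosine similarity. A minimal pure-Python sketch of the computation (illustrative only – not SurrealDB's implementation):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0.0 for identical directions, 1.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # → 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # → 1.0
```

Ranking every row by this value gives exact (but O(n)) results, which is why the index-backed `<|K,EF|>` path is preferable at scale.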
SurrealDB doesn't yet bundle a sparse index, but you can:

- Full-text index the `content` field with `SEARCH` in SurrealQL.
- Pass a `vector_filter` to LangChain's retriever to run a second vector pass for re-ranking.

If you prefer M-Tree for exact search at the index level:
```sql
DEFINE INDEX mt_texts ON texts FIELDS embedding MTREE DIMENSION 768 DIST COSINE;
```
LangChain code stays unchanged – only your `DEFINE INDEX` differs.
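To see why the exact index can promise 100 % recall, note that an exact search is conceptually a brute-force scan: score every row, keep the k closest. A toy sketch (not SurrealDB's M-Tree code; the row layout and 2-D vectors are illustrative assumptions):

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def exact_knn(query, rows, k=3):
    """Score every row, then keep the k closest — exact by construction."""
    return sorted(rows, key=lambda r: cosine_distance(query, r["embedding"]))[:k]

rows = [
    {"id": "a", "embedding": [1.0, 0.0]},
    {"id": "b", "embedding": [0.0, 1.0]},
    {"id": "c", "embedding": [0.7, 0.7]},
]
print([r["id"] for r in exact_knn([1.0, 0.0], rows, k=2)])  # → ['a', 'c']
```

M-Tree gets the same guarantee while pruning the scan via the triangle inequality; HNSW trades that guarantee for speed.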
That’s it – you now have a fully-featured LangChain vector store powered by SurrealDB’s built-in HNSW / M-Tree indexes, no boilerplate required. 🚀