CocoIndex is an incremental indexing framework for AI agents and LLM applications. You declare what should exist in a target store — documents, embeddings, knowledge graphs — and CocoIndex keeps it in sync, reprocessing only the delta on each run.
The SurrealDB connector writes to normal tables, relation (graph edge) tables, and vector indexes. CocoIndex tracks declared records across runs: it upserts changes, skips unchanged records, and removes records that are no longer declared. Related tables in the same database reconcile inside a single atomic transaction.
How it works
Declare sources: Walk local files, S3, Google Drive, and other connectors; transform with chunking, embeddings, or LLM extraction.
Declare targets: Mount SurrealDB TableTarget and RelationTarget states with optional TableSchema (SCHEMAFULL) or schemaless tables.
Reconcile: On each run, CocoIndex compares the declared state with the previous run and applies upserts and deletions. Schema and vector indexes can be managed automatically when managed_by is "system".
Query: Use SurrealQL for full-text search, graph traversals, and vector similarity on the resulting data. All data can be queried manually in its namespace and database, just as with any other SurrealDB instance.
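The reconcile step can be pictured as a diff between two snapshots of declared records. The following is a minimal sketch of that idea in plain Python, not CocoIndex's implementation: the reconcile function, the Plan class, and the record ids are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Actions needed to move a target from the previous state to the declared one."""
    upserts: dict = field(default_factory=dict)
    deletes: list = field(default_factory=list)
    unchanged: list = field(default_factory=list)


def reconcile(previous: dict, declared: dict) -> Plan:
    """Diff two {record_id: content} snapshots.

    Hypothetical helper: it only illustrates target-state reconciliation
    and is not part of the CocoIndex API.
    """
    plan = Plan()
    for record_id, content in declared.items():
        if record_id in previous and previous[record_id] == content:
            plan.unchanged.append(record_id)    # same content: skip the write
        else:
            plan.upserts[record_id] = content   # new or changed: write it
    # Anything tracked last run but no longer declared gets removed.
    plan.deletes = sorted(set(previous) - set(declared))
    return plan


previous_run = {"person:a": {"name": "A"}, "person:b": {"name": "B"}}
declared_now = {"person:a": {"name": "A"}, "person:c": {"name": "C"}}
plan = reconcile(previous_run, declared_now)
print(plan.upserts)    # records to write
print(plan.deletes)    # records to remove
print(plan.unchanged)  # records skipped
```

In this toy run, person:c is upserted, person:b is deleted, and person:a is left alone, which mirrors the upsert, skip, and remove behaviour described above.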
Key capabilities
Incremental sync: Memoised pipeline steps and target-state reconciliation avoid reprocessing unchanged inputs.
Graph-native writes: Relation tables with polymorphic from/to endpoints map cleanly to SurrealDB RELATE edges.
Schema lifecycle: Optional TableSchema with ColumnDef; CocoIndex can define fields and drop undeclared columns on re-run.
Vector indexes: Declare HNSW indexes on embedding fields; metric and dimension changes trigger index recreation. Pipelines can embed locally with SentenceTransformerEmbedder (Rust uses FastEmbed under the hood) so you can exercise vector targets without requiring an API key.
Python and Rust: Pipelines are typically authored in Python; the Rust SDK exposes the same target-state model for native binaries and examples.
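To make the memoisation idea concrete, here is a small sketch that caches a transform by a fingerprint of its input, so an unchanged input never re-runs the transform. It is an illustration under stated assumptions: MemoisedStep and chunk are hypothetical names, the cache lives in memory, and CocoIndex's own memoisation may work differently.

```python
import hashlib
import json


class MemoisedStep:
    """Cache a pure function's results, keyed by a fingerprint of its input."""

    def __init__(self, fn):
        self.fn = fn
        self.cache = {}   # fingerprint -> result; a real system would persist this
        self.calls = 0    # how many times fn actually ran

    def __call__(self, value):
        fingerprint = hashlib.sha256(
            json.dumps(value, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if fingerprint not in self.cache:
            self.calls += 1
            self.cache[fingerprint] = self.fn(value)
        return self.cache[fingerprint]


def chunk(text):
    # Toy transform standing in for chunking, embedding, or LLM extraction.
    return text.split(". ")


step = MemoisedStep(chunk)
first = step("One sentence. Another sentence")
second = step("One sentence. Another sentence")    # cache hit, chunk() is not re-run
third = step("Edited sentence. Another sentence")  # new fingerprint, chunk() runs
print(step.calls)
```

Three calls to the step trigger only two executions of chunk, because the second input is identical to the first.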
Local embeddings
CocoIndex pipelines that write vectors to SurrealDB often use SentenceTransformerEmbedder. Models are downloaded once and run on your machine, so, like the zero-key graph demos below, they need no API key. The Rust SDK loads them via FastEmbed; Python uses the sentence-transformers library with the same Hugging Face model names. The full conversation_to_knowledge example uses this for entity resolution.
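One way embeddings can support entity resolution is to treat two mentions as the same entity when their vectors are close. The sketch below shows that idea with a greedy cosine-similarity match. It is not the algorithm used by conversation_to_knowledge: the resolve function, the 0.9 threshold, and the three-dimensional vectors are made up for illustration, and a real model would produce vectors with hundreds of dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def resolve(mentions, threshold=0.9):
    """Map each mention to the first canonical name whose vector is similar enough."""
    canonical = []  # list of (name, vector) pairs seen so far
    mapping = {}
    for name, vector in mentions:
        for canon_name, canon_vec in canonical:
            if cosine(vector, canon_vec) >= threshold:
                mapping[name] = canon_name
                break
        else:
            # No close match: this mention becomes a new canonical entity.
            canonical.append((name, vector))
            mapping[name] = name
    return mapping


# Made-up toy vectors, not output from any real embedding model.
mentions = [
    ("Jerry Cantrell", [0.9, 0.1, 0.0]),
    ("Cantrell", [0.88, 0.12, 0.01]),
    ("Rick Beato", [0.1, 0.9, 0.2]),
]
resolved = resolve(mentions)
print(resolved)
```

With these toy vectors, "Cantrell" folds into "Jerry Cantrell" while "Rick Beato" stays a separate entity.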
To sanity-check local embeddings with SurrealDB directly — or to pick a model before you wire up a CocoIndex pipeline — see the FastEmbed integration guide. It covers ONNX models, vector dimensions, and worked examples in Python and Rust without an API key.
Podcast → knowledge graph
CocoIndex's flagship SurrealDB example is conversation_to_knowledge, in which podcast episodes become a graph of sessions, statements, people, technologies, organisations, and mention edges.
| Input | What happens |
|---|---|
| input/*.txt (YouTube URLs) | yt-dlp downloads audio → AssemblyAI transcribes with speaker labels → LLM extracts claims and entities → entity resolution → SurrealDB graph |
| input/*.json (pre-transcribed) | Skips download/transcription; still uses LLM extraction in the full example |
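The routing in the table above can be sketched as a dispatch on file extension. This is only an illustration of the branching, assuming the stages run in the order the table lists them; plan_steps and the stage names are hypothetical and do not come from the example's source.

```python
from pathlib import Path

# Hypothetical stage names, in the order given by the table.
FULL_STEPS = ["download_audio", "transcribe", "extract", "resolve_entities", "write_graph"]


def plan_steps(path):
    """Return the stages an input file goes through, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":    # file of YouTube URLs: run every stage
        return list(FULL_STEPS)
    if suffix == ".json":   # pre-transcribed: skip download and transcription
        return FULL_STEPS[2:]
    raise ValueError(f"unsupported input type: {path}")


print(plan_steps("input/episodes.txt"))
print(plan_steps("input/sample.json"))
```

Pre-transcribed JSON joins the pipeline at the extraction stage, which is why the zero-key demos can skip audio download and transcription entirely.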
The zero-key demos below both use the same interview between musician and YouTuber Rick Beato and Alice in Chains guitarist Jerry Cantrell (YouTube link), supplied as a pre-transcribed input/sample.json. Curated statements stand in for LLM extraction so you can see reconciliation without API keys. CocoIndex does not fetch transcripts in these demos.
When connecting with Surrealist or surreal sql, use the same namespace and database that your program configures. The demos use cocoindex / beato_cantrell unless you explicitly set different values.
Getting started
Start SurrealDB:
Install CocoIndex with the SurrealDB extra:
Copy input/sample.json into an input directory next to main.py. The script mounts session, statement, person, tech, and org tables plus session_statement, person_session, person_statement, and polymorphic statement_mentions relations — the same shape as the full podcast example.
COCOINDEX_DB stores CocoIndex's local change-tracking state between runs. INCLUDE_SABBATH=1 declares the Tony Iommi / Black Sabbath influence branch; setting it to 0 on the second run removes those nodes and edges.
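The effect of the INCLUDE_SABBATH switch can be sketched as a flag that changes the set of declared records between runs. The snippet below is an assumption-laden illustration rather than the demo's code: declared_statements, the statement ids, and the default of "1" when the variable is unset are all hypothetical.

```python
import os


def declared_statements(environ=os.environ):
    """Build the set of statement ids to declare for this run (ids are hypothetical)."""
    statements = {"statement:core_1", "statement:core_2"}
    # Assumption: the branch is included unless INCLUDE_SABBATH is set to something other than "1".
    if environ.get("INCLUDE_SABBATH", "1") == "1":
        statements.add("statement:sabbath_influence")
    return statements


first_run = declared_statements({"INCLUDE_SABBATH": "1"})
second_run = declared_statements({"INCLUDE_SABBATH": "0"})
removed = first_run - second_run  # declared before, not declared now
print(sorted(removed))
```

Records that fall out of the declared set on the second run are exactly the ones reconciliation would delete from the target.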
For LLM extraction, entity resolution, and live YouTube ingestion, follow CocoIndex's podcast-to-knowledge-graph tutorial and the conversation_to_knowledge source. The SurrealDB connector reference covers connection setup, TableSchema.from_class, vector indexes, and relation tables in full. For local embedding models, see FastEmbed.
Inspect the graph
After either demo completes, open Surrealist with namespace cocoindex and database beato_cantrell. The schema designer shows the session-centric graph CocoIndex declared:

Example queries:
After the second run (INCLUDE_SABBATH=0 in Python, or the built-in second pass in Rust), the Tony Iommi / Black Sabbath statement and its mention edges are gone — reconciliation removed anything no longer declared.
Appendix: sample.json
Create an input directory next to your demo program and save the following as input/sample.json. Utterance excerpts were polished from YouTube auto-captions for the Rick Beato × Jerry Cantrell interview.