The knowledge layer holds authoritative material – manuals, policies, product data, and files your agents should treat as curated sources. Documents enter through an asynchronous upload pipeline: bytes land in object storage, then Spectron extracts, chunks, embeds, and indexes structured state in SurrealDB.
Supported formats
Text-first (default ingestion profile)
These MIME types are accepted on POST /api/v1/{context_id}/documents and processed under the default TextOnly or StandardMultimodal profiles:
| Format | MIME type |
|---|---|
| Plain text | text/plain |
| Markdown | text/markdown |
| JSON | application/json |
| HTML | text/html |
| PDF | application/pdf |
Multimodal (requires a richer ingestion profile)
The upload endpoint also accepts image, audio, and video MIME types when the Context uses StandardMultimodal or MultimodalFull. OCR, transcription, captioning, and modality-native embeddings run only when the selected profile enables them:
| Format | MIME types (examples) |
|---|---|
| Images | image/png, image/jpeg, image/webp, image/gif |
| Audio | audio/wav, audio/mpeg, audio/ogg, audio/flac, audio/aac |
| Video | video/mp4, video/webm, video/quicktime |
Under TextOnly, multimodal uploads may be accepted but image/audio/video processing stages are skipped. See Multimodal content for profile details.
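The two tables and the TextOnly note can be condensed into a small client-side pre-check. This is only a sketch: the function name `expected_handling` and its return labels are hypothetical, media types are matched by prefix because the multimodal table lists examples rather than a complete set, and the server's own validation remains authoritative.

```python
# Illustrative pre-check only; names and return labels are assumptions,
# not part of the Spectron API.
TEXT_MIME_TYPES = {
    "text/plain",
    "text/markdown",
    "application/json",
    "text/html",
    "application/pdf",
}
MEDIA_PREFIXES = ("image/", "audio/", "video/")
MULTIMODAL_PROFILES = {"StandardMultimodal", "MultimodalFull"}


def expected_handling(mime_type: str, profile: str) -> str:
    """Guess how an upload of `mime_type` would be treated under `profile`."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = mime_type.split(";", 1)[0].strip().lower()
    if mime in TEXT_MIME_TYPES:
        return "text"
    if mime.startswith(MEDIA_PREFIXES):
        # Under a multimodal profile, individual stages (OCR, transcription,
        # captioning) still run only if that profile enables them.
        if profile in MULTIMODAL_PROFILES:
            return "multimodal"
        return "media-skipped"
    return "unsupported"
```

A caller might use this to warn before uploading a video into a TextOnly Context, where the file may be accepted but its media stages are skipped.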
Uploading a document
The upload endpoint is asynchronous. It returns 202 Accepted with a document id and initial status; processing continues on the worker tier.
REST
Response:
CLI
Use the generated OpenAPI clients (Python, TypeScript) for application code – method names follow the spec.
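As a rough illustration of the upload call described in this section, the sketch below builds (but does not send) a POST request with Python's standard library. The base URL, the choice to send raw bytes with a Content-Type header, and the helper name `build_upload_request` are assumptions; the actual request shape, including authentication and any multipart encoding, is defined by the REST API reference.

```python
import urllib.request


def build_upload_request(
    base_url: str, context_id: str, data: bytes, mime_type: str
) -> urllib.request.Request:
    """Build a POST to the documents endpoint (assumed raw-body upload)."""
    url = f"{base_url.rstrip('/')}/api/v1/{context_id}/documents"
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": mime_type},
    )


# Hypothetical host and context id, for illustration only.
req = build_upload_request(
    "https://spectron.example", "ctx-demo", b"# Handbook\n", "text/markdown"
)
```

Sending the request with `urllib.request.urlopen(req)` would be expected to yield 202 Accepted; the response field names are not assumed here.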
Polling for status
Poll GET /api/v1/{context_id}/documents/{id} until status is ready or failed.
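A polling loop along these lines could look like the following sketch. The `fetch_status` callable is a placeholder for whatever performs the GET and extracts the status string; the interval and timeout defaults are arbitrary choices, not documented values.

```python
import time
from typing import Callable

# The two terminal states named in the polling instructions.
TERMINAL_STATUSES = {"ready", "failed"}


def wait_for_document(
    fetch_status: Callable[[], str],
    interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status until it returns 'ready' or 'failed', or time out."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"document still {status!r} after {timeout}s")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays. A production client would likely add backoff between polls.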
Pipeline stages
| Status | Description |
|---|---|
| queued | Waiting to enter the pipeline |
| extracting | Reading content from the uploaded bytes |
| chunking | Splitting content into overlapping segments |
| embedding | Generating dense vectors for chunks |
| rendering | Building document summaries and section metadata |
| transcribing | Transcribing audio or video (multimodal profiles) |
| captioning | Generating captions for images (multimodal profiles) |
| keywording | RAKE keyword extraction |
| ready | Fully indexed and available for retrieval |
| failed | Pipeline error; inspect error on the document record |
Content addressing and deduplication
Every document is identified by a BLAKE3 hash of its raw bytes. Re-uploading identical content returns the existing document with deduplicated: true and skips reprocessing.
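The idea of content addressing can be sketched with an in-memory index keyed by digest. This is a toy model, not the service's implementation: `DedupIndex` and `content_id` are hypothetical names, the `deduplicated: False` value on a first upload is an assumption, and because BLAKE3 is not in the Python standard library the code falls back to BLAKE2b (a different hash, used purely as a stand-in) when the third-party `blake3` package is not installed.

```python
import hashlib

try:
    from blake3 import blake3  # third-party package, optional here

    def content_id(data: bytes) -> str:
        return blake3(data).hexdigest()

except ImportError:

    def content_id(data: bytes) -> str:
        # Stand-in only: BLAKE2b digests will NOT match real BLAKE3 ids.
        return hashlib.blake2b(data, digest_size=32).hexdigest()


class DedupIndex:
    """Toy content-addressed store: identical bytes map to one record."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def upload(self, data: bytes) -> dict:
        doc_id = content_id(data)
        existing = self._docs.get(doc_id)
        if existing is not None:
            # Same bytes seen before: return the existing record, no reprocessing.
            return {**existing, "deduplicated": True}
        record = {"id": doc_id, "status": "queued"}
        self._docs[doc_id] = record
        return {**record, "deduplicated": False}
```

Because the id is derived from the raw bytes, even a one-byte change produces a different document, while a renamed copy of the same file does not.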
Scope assignment
Set scope on upload so only principals with matching grants can retrieve the document’s chunks during recall.
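Conceptually, scope acts as a filter applied at recall time. The sketch below assumes the simplest possible rule, that a chunk is visible when its scope string exactly equals one of the principal's grants; the real matching rules (hierarchies, wildcards, and so on) are not specified here, and the `Chunk` type and scope names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Chunk:
    document_id: str
    scope: str
    text: str


def visible_chunks(chunks: Iterable[Chunk], grants: Iterable[str]) -> list[Chunk]:
    """Keep only chunks whose scope is among the principal's grants.

    Assumes exact string equality between scope and grant.
    """
    granted = set(grants)
    return [chunk for chunk in chunks if chunk.scope in granted]


# Hypothetical scopes for illustration.
corpus = [
    Chunk("doc-1", "hr", "Leave policy"),
    Chunk("doc-2", "public", "Product overview"),
]
```

With this rule, a principal granted only `public` would see the product overview but not the leave policy.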
Document management
- GET /documents/{id} – status and metadata
- GET /documents – list with filters
- GET /documents/{id}/chunks – parsed segments
- GET /documents/{id}/raw – original bytes
- DELETE /documents/{id} – remove document, chunks, graph edges, and object-store bytes
See REST API for request shapes.