The knowledge layer holds authoritative material – manuals, policies, product data, and files your agents should treat as curated sources. Documents enter through an asynchronous upload pipeline: bytes land in object storage, then Spectron extracts, chunks, embeds, and indexes structured state in SurrealDB.
Supported formats
Text-first (default ingestion profile)
These MIME types are accepted on POST /api/v1/{context_id}/documents and processed under the default TextOnly or StandardMultimodal profiles:
| Format | MIME type |
|---|---|
| Plain text | text/plain |
| Markdown | text/markdown |
| JSON | application/json |
| HTML | text/html |
| PDF | application/pdf |
Multimodal (requires a richer ingestion profile)
The upload endpoint also accepts image, audio, and video MIME types when the Context uses StandardMultimodal or MultimodalFull. OCR, transcription, captioning, and modality-native embeddings run only when the selected profile enables them:
| Format | MIME types (examples) |
|---|---|
| Images | image/png, image/jpeg, image/webp, image/gif |
| Audio | audio/wav, audio/mpeg, audio/ogg, audio/flac, audio/aac |
| Video | video/mp4, video/webm, video/quicktime |
Under TextOnly, multimodal uploads may be accepted but image/audio/video processing stages are skipped. See Multimodal content for profile details.
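A client can pre-classify a file before uploading it. The sketch below groups MIME types by the two tables above and flags the TextOnly case where image/audio/video stages are skipped. The helper names are hypothetical and not part of Spectron. Because the multimodal table lists examples only, the sketch matches on the top-level type (image/, audio/, video/), which is an assumption.

```python
# Text-first MIME types taken from the first table above.
TEXT_FIRST = {
    "text/plain",
    "text/markdown",
    "application/json",
    "text/html",
    "application/pdf",
}
# Assumption: any subtype under these top-level types counts as multimodal.
MULTIMODAL_PREFIXES = ("image/", "audio/", "video/")


def classify_mime(mime: str) -> str:
    """Return 'text', 'image', 'audio', 'video', or 'unknown' for a MIME type."""
    base = mime.split(";", 1)[0].strip().lower()  # drop parameters such as charset
    if base in TEXT_FIRST:
        return "text"
    for prefix in MULTIMODAL_PREFIXES:
        if base.startswith(prefix):
            return prefix.rstrip("/")
    return "unknown"


def multimodal_stages_skipped(profile: str, mime: str) -> bool:
    """True when the upload is image/audio/video but the profile is TextOnly."""
    return profile == "TextOnly" and classify_mime(mime) in {"image", "audio", "video"}
```

A check like this only avoids surprises on the client side; the server remains the authority on which types it accepts and which stages a profile runs.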
Uploading a document
The upload endpoint is asynchronous. It returns 202 Accepted with a document id and initial status; processing continues on the worker tier.
REST
The metadata part is JSON. scopes is a selector in disjunctive normal form (DNF): an OR of clauses, each a conjunction of slash paths. labels are descriptive key=value tags with the same validation as fact ingest: keys must not start with _, and exceeding the count caps returns 409. The optional observedAt field (RFC 3339) sets the known time of facts derived from this document. This is essential for page-by-page or episode-by-episode ingest, where later plot points must stay hidden until the reader reaches them (spoiler-safe narrative memory). Omit scopes to tag the document with the caller's full memory:write region.
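As a rough illustration of the request shape, the sketch below assembles a multipart/form-data body with a JSON metadata part and a file part, using only the standard library. Several details are assumptions rather than documented behaviour: the file part name ("file"), scopes encoded as a list of strings, and labels encoded as a list of key=value strings. The function name is hypothetical. Check the OpenAPI spec for the authoritative shapes.

```python
import json
import uuid


def build_upload_body(filename, data, mime, scopes=None, labels=None, observed_at=None):
    """Return (content_type_header, body_bytes) for a multipart/form-data upload."""
    # Client-side check mirroring the documented rule: keys must not start with "_".
    for label in labels or []:
        key = label.split("=", 1)[0]
        if "=" not in label or key.startswith("_"):
            raise ValueError(f"invalid label: {label!r}")

    metadata = {}
    if scopes:  # omitted scopes -> server tags with the caller's full write region
        metadata["scopes"] = scopes
    if labels:
        metadata["labels"] = labels
    if observed_at:  # RFC 3339 timestamp string
        metadata["observedAt"] = observed_at

    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="metadata"\r\n'
        "Content-Type: application/json\r\n\r\n"
        f"{json.dumps(metadata)}\r\n"
        f"--{boundary}\r\n"
        # Assumption: the file part is named "file".
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", head + data + tail
```

The returned header value goes in Content-Type and the bytes go in the body of POST /api/v1/{context_id}/documents. In application code, an HTTP library or the generated client would normally build this for you.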
Response:
CLI
Use the generated OpenAPI clients (Python, TypeScript) for application code – method names follow the spec.
Polling for status
Poll GET /api/v1/{context_id}/documents/{id} until status is ready or failed.
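A polling loop for this endpoint might look like the sketch below. It takes a caller-supplied fetch_status callable (hypothetical, for example a thin wrapper around the GET request that returns the document record as a dict) so the loop stays independent of any HTTP library. The interval and timeout defaults are illustrative, not recommended values.

```python
import time


def wait_for_document(fetch_status, interval=1.0, timeout=300.0, sleep=time.sleep):
    """Poll until the document reaches a terminal status, then return the record.

    fetch_status: callable returning the document record as a dict with a
    "status" key, e.g. the parsed body of GET /api/v1/{context_id}/documents/{id}.
    """
    deadline = time.monotonic() + timeout
    while True:
        doc = fetch_status()
        if doc["status"] in ("ready", "failed"):  # the two terminal statuses
            return doc
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document still {doc['status']!r} after {timeout}s")
        sleep(interval)
```

On a failed result, the caller would inspect the error field on the returned record. Large audio or video uploads may need a longer timeout than text documents.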
Pipeline stages
| Status | Description |
|---|---|
| queued | Waiting to enter the pipeline |
| extracting | Reading content from the uploaded bytes |
| chunking | Splitting content into overlapping segments |
| embedding | Generating dense vectors for chunks |
| rendering | Building document summaries and section metadata |
| transcribing | Transcribing audio or video (multimodal profiles) |
| captioning | Generating captions for images (multimodal profiles) |
| keywording | RAKE keyword extraction |
| ready | Fully indexed and available for retrieval |
| failed | Pipeline error; inspect error on the document record |
Content addressing and deduplication
Every document is identified by a BLAKE3 hash of its raw bytes. Re-uploading identical content returns the existing document with deduplicated: true and skips reprocessing.
When a second uploader in a different scope hits the same hash, Spectron unions their scope clause onto the existing document (and related index records) instead of trapping them with a deduplicated id they cannot read. Each union emits a document.scope_widen audit event; operators can watch the documents.scope_clause_count histogram (values above ~32 clauses on one document warrant investigation).
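The deduplication and scope-widening behaviour can be modelled with a toy in-memory store. This is a sketch of the idea, not Spectron's implementation. It uses BLAKE2b from the Python standard library as a stand-in, because BLAKE3 is not in the standard library (the third-party blake3 package would be the faithful choice). The class and method names are hypothetical, and each scope clause is treated as an opaque string.

```python
import hashlib


def content_id(data: bytes) -> str:
    # Stand-in hash: BLAKE2b from the standard library, NOT BLAKE3.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


class DocumentStore:
    """Toy in-memory model of content-addressed upload with scope widening."""

    def __init__(self):
        self.docs = {}   # content id -> set of scope clauses on that document
        self.audit = []  # (event name, content id, added clause)

    def upload(self, data: bytes, scope_clause: str) -> dict:
        doc_id = content_id(data)
        if doc_id not in self.docs:
            # First sighting of these bytes: create the document.
            self.docs[doc_id] = {scope_clause}
            return {"id": doc_id, "deduplicated": False}
        if scope_clause not in self.docs[doc_id]:
            # Same bytes from a different scope: union the clause and audit it.
            self.docs[doc_id].add(scope_clause)
            self.audit.append(("document.scope_widen", doc_id, scope_clause))
        return {"id": doc_id, "deduplicated": True}
```

In this model, len(store.docs[doc_id]) plays the role of the per-document clause count that the documents.scope_clause_count histogram tracks.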
Outbound links
During ingest, Spectron extracts outbound hyperlinks from the raw bytes and stores them as typed knowledge_links_to edges:
| Source | Link kind |
|---|---|
| Markdown []() syntax | markdown_link |
| HTML <a href="…"> attributes | html_link |
| PDF page link annotations | pdf_annotation |
These edges feed hybrid_graph reranking (document-link density) and citation-style navigation between corpus documents.
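To make the link kinds concrete, here is a minimal sketch of extracting Markdown and HTML links with the Python standard library. It is not the extractor Spectron uses. The regular expression handles only simple inline links (it also matches image syntax and does not handle nested brackets), and PDF annotations are left out because they need a PDF library. All names in the block are hypothetical.

```python
import re
from html.parser import HTMLParser

# Simple inline form: [text](target) or [text](target "title").
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)")


class _AnchorCollector(HTMLParser):
    """Collect href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(text: str, mime: str):
    """Return (target, link_kind) pairs for Markdown or HTML input."""
    if mime == "text/markdown":
        return [(m.group(1), "markdown_link") for m in MARKDOWN_LINK.finditer(text)]
    if mime == "text/html":
        parser = _AnchorCollector()
        parser.feed(text)
        parser.close()
        return [(href, "html_link") for href in parser.hrefs]
    return []  # other formats (including PDF) are out of scope for this sketch
```

Each returned pair corresponds to one candidate knowledge_links_to edge from the source document to the link target.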
Scope and labels
When you omit an explicit scope on upload, documents and their chunks inherit the caller's resolved write scope from the API key. Scoped keys rely on this inheritance to recall their own uploads.
You can narrow tagging with scopes on POST /documents, spectron documents upload --scope …, or MCP upload. The path must lie within the caller's memory:write region (an out-of-region scope returns 403). A document's scope is fixed at upload: reprocess rejects a non-empty scopes field with 400.
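The region check can be pictured as a path-containment test. The sketch below assumes that "within the region" means the requested slash path equals, or is nested under, one of the paths in the caller's memory:write region, compared segment by segment. That reading is an assumption, the function names are hypothetical, and the server's actual matching rules may differ.

```python
def _segments(path: str):
    """Split a slash path into non-empty segments."""
    return [s for s in path.strip("/").split("/") if s]


def within_region(requested: str, region_paths) -> bool:
    """True if the requested path equals or sits under any region path."""
    req = _segments(requested)
    for region in region_paths:
        reg = _segments(region)
        # Segment-wise prefix match, so "acme/supportx" is not under "acme/support".
        if req[: len(reg)] == reg:
            return True
    return False


def upload_status_for(requested: str, region_paths) -> int:
    # 202 is the documented success code for upload; 403 for out-of-region scope.
    return 202 if within_region(requested, region_paths) else 403
```

Comparing whole segments rather than raw string prefixes avoids treating sibling paths with a shared prefix as nested.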
Optional labels (key=value strings) are stamped on the document, chunks, and sections. They follow the same validation rules as fact ingest and are not copied onto reconciled graph rows.
Document management
- GET /documents/{id} – status and metadata
- GET /documents – list with filters
- GET /documents/{id}/chunks – parsed segments
- GET /documents/{id}/raw – original bytes
- DELETE /documents/{id} – remove document, chunks, graph edges, and object-store bytes
See REST API for request shapes.