Uploading documents

The knowledge layer holds authoritative material, such as manuals, policies, product data, and other files your agents should treat as curated sources. Documents enter through an asynchronous upload pipeline: the raw bytes land in object storage, then Spectron extracts, chunks, and embeds the content and indexes the resulting structured state in SurrealDB.

Supported formats

Text-first (default ingestion profile)

These MIME types are accepted on POST /api/v1/{context_id}/documents and processed under the default TextOnly or StandardMultimodal profiles:

Format	MIME type
Plain text	`text/plain`
Markdown	`text/markdown`
JSON	`application/json`
HTML	`text/html`
PDF	`application/pdf`
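The server decides what to do with a file by its MIME type, so a client has to label each file part correctly. The following is a minimal sketch that picks the type from the file extension. The `text_first_mime_type` helper and its extension list are hypothetical and not part of Spectron. Only the MIME types come from the table above.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension -> MIME mapping for the text-first formats.
# The MIME values come from the table; the extensions are an assumption.
TEXT_FIRST_MIME_TYPES = {
    ".txt": "text/plain",
    ".md": "text/markdown",
    ".markdown": "text/markdown",
    ".json": "application/json",
    ".html": "text/html",
    ".htm": "text/html",
    ".pdf": "application/pdf",
}


def text_first_mime_type(path: str) -> Optional[str]:
    """Return the MIME type to send for a text-first file, or None when the
    extension does not map to one of the default formats."""
    return TEXT_FIRST_MIME_TYPES.get(Path(path).suffix.lower())
```

For example, `text_first_mime_type("returns-policy.pdf")` gives `application/pdf`. A `None` result means the file is not one of the text-first formats.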

Multimodal (requires a richer ingestion profile)

The upload endpoint also accepts image, audio, and video MIME types when the Context uses StandardMultimodal or MultimodalFull. OCR, transcription, captioning, and modality-native embeddings run only when the selected profile enables them:

Format	MIME types (examples)
Images	`image/png`, `image/jpeg`, `image/webp`, `image/gif`
Audio	`audio/wav`, `audio/mpeg`, `audio/ogg`, `audio/flac`, `audio/aac`
Video	`video/mp4`, `video/webm`, `video/quicktime`

Under TextOnly, the endpoint may still accept multimodal uploads, but the image, audio, and video processing stages are skipped. See Multimodal content for profile details.
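A client can warn users before uploading media into a TextOnly Context. The following is a minimal sketch under these assumptions:

- The profile names come from this page.
- The helper names are hypothetical.
- Classifying files by MIME prefix (`image/`, `audio/`, `video/`) is an assumption.
- Only the TextOnly case is certain. Under the multimodal profiles, which stages run still depends on what the profile enables.

```python
from typing import Optional

# Assumption: modality is inferred from the MIME type prefix.
MULTIMODAL_PREFIXES = ("image/", "audio/", "video/")


def modality(mime_type: str) -> Optional[str]:
    """Return 'image', 'audio', or 'video' for multimodal MIME types, else None."""
    mime_type = mime_type.lower()
    for prefix in MULTIMODAL_PREFIXES:
        if mime_type.startswith(prefix):
            return prefix.rstrip("/")
    return None


def multimodal_stages_skipped(profile: str, mime_type: str) -> bool:
    """True when an image/audio/video upload would skip its media stages.

    False does not guarantee that OCR, transcription, captioning, or native
    embeddings run; under StandardMultimodal or MultimodalFull that depends
    on what the profile enables.
    """
    return profile == "TextOnly" and modality(mime_type) is not None
```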

Uploading a document

The upload endpoint is asynchronous. It returns 202 Accepted with the document id, its initial status, its content hash, and a deduplication flag; processing continues on the worker tier.

REST

POST /api/v1/{context_id}/documents
Content-Type: multipart/form-data

file=<binary>
title=Returns Policy
scope[org]=acme

Response:

{
  "id": "doc:01hx9…",
  "status": "queued",
  "content_hash": "blake3:4f3c…",
  "deduplicated": false
}
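To make the multipart shape concrete without a generated client, here is a minimal standard-library sketch that builds, but does not send, the same request. The helper name `build_upload_request` is hypothetical. The `Authorization: Bearer` header is an assumption about how the API key is passed, so confirm it against the REST API reference.

```python
import urllib.request
import uuid
from typing import Iterable, Tuple


def build_upload_request(
    base_url: str,
    context_id: str,
    api_key: str,
    filename: str,
    content: bytes,
    content_type: str,
    fields: Iterable[Tuple[str, str]] = (),
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data upload request."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields such as ("title", "Returns Policy") or ("scope[org]", "acme").
    for name, value in fields:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The file part carries its own Content-Type so the server can pick a format.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode()
        + content
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/{context_id}/documents",
        data=b"".join(parts),
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            # Assumption: bearer-token auth; check the REST API reference.
            "Authorization": f"Bearer {api_key}",
        },
    )
```

To send it, call `urllib.request.urlopen(req)` and decode the JSON body. A successful upload responds with 202 and a body carrying the `id`, `status`, `content_hash`, and `deduplicated` fields shown above.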

CLI

spectron documents upload ./returns-policy.pdf \
  --url "$SPECTRON_URL" \
  --api-key "$SPECTRON_API_KEY" \
  --context-id "$SPECTRON_CONTEXT_ID"

For application code, use the generated OpenAPI clients (Python, TypeScript); their method names follow the OpenAPI spec.

Polling for status

Poll GET /api/v1/{context_id}/documents/{id} until `status` is `ready` or `failed`; every other status is an intermediate pipeline stage.
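A polling loop only needs to stop on the two terminal statuses. The following is a minimal sketch that keeps the transport out of the picture: `fetch` is whatever performs the GET and returns the decoded JSON. The function name, the exponential backoff, and the default intervals are assumptions, not documented Spectron behavior.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"ready", "failed"}


def wait_for_document(
    fetch: Callable[[], Dict[str, Any]],
    interval: float = 1.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the document reaches `ready` or `failed`.

    `fetch` should perform GET /api/v1/{context_id}/documents/{id} and
    return the decoded JSON body. The timeout counts accumulated sleep
    time, not wall-clock time spent in `fetch`.
    """
    waited = 0.0
    while True:
        doc = fetch()
        if doc.get("status") in TERMINAL_STATUSES:
            return doc
        if waited >= timeout:
            raise TimeoutError(f"document still {doc.get('status')!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
        # Back off exponentially so long-running ingestion does not hammer the API.
        interval = min(interval * 2, max_interval)
```

Injecting `sleep` keeps the loop testable. A caller still has to check whether the returned document is `ready` or `failed`.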

Pipeline stages

Status	Description
`queued`	Waiting to enter the pipeline
`extracting`	Reading content from the uploaded bytes
`chunking`	Splitting content into overlapping segments
`embedding`	Generating dense vectors for chunks
`rendering`	Building document summaries and section metadata
`transcribing`	Transcribing audio or video (multimodal profiles)
`captioning`	Generating captions for images (multimodal profiles)
`keywording`	Extracting keywords with RAKE (Rapid Automatic Keyword Extraction)
`ready`	Fully indexed and available for retrieval
`failed`	Pipeline error; inspect `error` on the document record
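Typed client code can mirror the status table directly. The following is a minimal sketch; `DocumentStatus` and `check_document` are hypothetical names, not part of the generated clients. It treats `ready` and `failed` as terminal and surfaces the record's `error` field on failure.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    """Pipeline statuses from the table above (hypothetical client-side type)."""

    QUEUED = "queued"
    EXTRACTING = "extracting"
    CHUNKING = "chunking"
    EMBEDDING = "embedding"
    RENDERING = "rendering"
    TRANSCRIBING = "transcribing"  # multimodal profiles only
    CAPTIONING = "captioning"  # multimodal profiles only
    KEYWORDING = "keywording"
    READY = "ready"
    FAILED = "failed"

    @property
    def is_terminal(self) -> bool:
        return self in (DocumentStatus.READY, DocumentStatus.FAILED)


def check_document(doc: dict) -> bool:
    """Return True when ready, False while in progress; raise on failure."""
    status = DocumentStatus(doc["status"])
    if status is DocumentStatus.FAILED:
        # On failure, the details live in `error` on the document record.
        raise RuntimeError(f"document {doc.get('id')} failed: {doc.get('error')}")
    return status is DocumentStatus.READY
```

Parsing into an enum raises `ValueError` for any status this page does not list. You may prefer to tolerate unknown values if the pipeline gains stages.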

Content addressing and deduplication

Every document is content-addressed by a BLAKE3 hash of its raw bytes, returned as `content_hash`. Re-uploading identical bytes returns the existing document with `deduplicated: true` and skips reprocessing.
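The deduplication rule can be modeled as a store keyed by a hash of the raw bytes. The following toy, in-memory sketch is not Spectron code. Python's standard library has no BLAKE3, so it swaps in BLAKE2b and labels the key `blake2b:` accordingly. The id format is also hypothetical.

```python
import hashlib
from typing import Dict, Tuple


def content_hash(data: bytes) -> str:
    # Stand-in: Spectron uses BLAKE3; hashlib only ships BLAKE2b/BLAKE2s.
    return "blake2b:" + hashlib.blake2b(data, digest_size=32).hexdigest()


class ContentAddressedStore:
    """Toy model of upload deduplication keyed on a hash of the raw bytes."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, dict] = {}
        self._next_id = 1

    def upload(self, data: bytes) -> Tuple[dict, bool]:
        """Return (document, deduplicated)."""
        key = content_hash(data)
        existing = self._by_hash.get(key)
        if existing is not None:
            # Identical bytes: hand back the existing record, no reprocessing.
            return existing, True
        doc = {"id": f"doc:{self._next_id}", "content_hash": key, "status": "queued"}
        self._next_id += 1
        self._by_hash[key] = doc
        return doc, False
```

Because the key is derived from the raw bytes, any byte-level change produces a new document. This includes a re-saved PDF whose visible text is unchanged.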

Scope assignment

Set scope on upload (for example, scope[org]=acme in the multipart form) so that only principals with matching grants can retrieve the document’s chunks during recall.
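Scope travels as bracketed form fields, following the `scope[org]=acme` pattern in the upload example. The following is a minimal sketch that flattens a mapping into those fields; the helper name is hypothetical. This page only shows the `org` key. Which other keys exist, and how grants match them, is outside the scope of this sketch.

```python
from typing import Dict, List, Tuple


def scope_fields(scope: Dict[str, str]) -> List[Tuple[str, str]]:
    """Flatten a scope mapping into bracketed multipart form fields,
    e.g. {"org": "acme"} -> [("scope[org]", "acme")]."""
    return [(f"scope[{key}]", value) for key, value in scope.items()]
```

Send the resulting pairs alongside `title` as ordinary form fields in the upload request.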

Document management

Paths below are relative to /api/v1/{context_id}.

GET /documents/{id} – status and metadata
GET /documents – list with filters
GET /documents/{id}/chunks – parsed segments
GET /documents/{id}/raw – original bytes
DELETE /documents/{id} – remove document, chunks, graph edges, and object-store bytes
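The management routes share one path shape. The following is a minimal sketch that builds, but does not send, requests for them with the standard library. The helper name is hypothetical. Two details are assumptions: that the routes sit under /api/v1/{context_id} like the upload and polling routes, and that ids are sent with their `doc:` prefix.

```python
import urllib.parse
import urllib.request


def document_request(
    base_url: str, context_id: str, method: str, doc_id: str = "", suffix: str = ""
) -> urllib.request.Request:
    """Build (but do not send) a request for a document-management route.

    Omit doc_id for the list route; pass suffix="chunks" or "raw" for the
    sub-resources.
    """
    path = f"/api/v1/{urllib.parse.quote(context_id, safe='')}/documents"
    if doc_id:
        # Percent-encode unsafe characters but keep ':', which is legal in a
        # path segment, so ids like "doc:01hx9..." pass through unchanged.
        path += "/" + urllib.parse.quote(doc_id, safe=":")
    if suffix:
        path += "/" + suffix
    return urllib.request.Request(base_url.rstrip("/") + path, method=method)
```

For example, `document_request(url, ctx, "DELETE", doc_id)` targets the route that removes the document together with its chunks, graph edges, and stored bytes.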

See REST API for request shapes.