Uploading documents

How to ingest documents into the Spectron knowledge layer.

The knowledge layer holds authoritative material – manuals, policies, product data, and files your agents should treat as curated sources. Documents enter through an asynchronous upload pipeline: bytes land in object storage, then Spectron extracts, chunks, embeds, and indexes structured state in SurrealDB.

These MIME types are accepted on POST /api/v1/{context_id}/documents and processed under the default TextOnly or StandardMultimodal profiles:

| Format | MIME type |
| --- | --- |
| Plain text | text/plain |
| Markdown | text/markdown |
| JSON | application/json |
| HTML | text/html |
| PDF | application/pdf |

The upload endpoint also accepts image, audio, and video MIME types when the Context uses StandardMultimodal or MultimodalFull. OCR, transcription, captioning, and modality-native embeddings run only when the selected profile enables them:

| Format | MIME types (examples) |
| --- | --- |
| Images | image/png, image/jpeg, image/webp, image/gif |
| Audio | audio/wav, audio/mpeg, audio/ogg, audio/flac, audio/aac |
| Video | video/mp4, video/webm, video/quicktime |

Under TextOnly, multimodal uploads may be accepted but image/audio/video processing stages are skipped. See Multimodal content for profile details.
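The profile rules above can be checked client-side before uploading. The following is a minimal sketch, not part of Spectron: the function name `preflight` and its return labels are hypothetical, and matching media by `image/`, `audio/`, or `video/` prefix is an assumption, since the media table lists examples only.

```python
# Text formats listed in the first table of this page.
TEXT_MIME_TYPES = {
    "text/plain",
    "text/markdown",
    "application/json",
    "text/html",
    "application/pdf",
}

# ASSUMPTION: any MIME type with one of these prefixes counts as media.
MEDIA_PREFIXES = ("image/", "audio/", "video/")

# Profiles this page names as accepting image, audio, and video uploads.
MULTIMODAL_PROFILES = {"StandardMultimodal", "MultimodalFull"}


def preflight(mime_type: str, profile: str) -> str:
    """Return a hypothetical label describing how an upload would be treated."""
    if mime_type in TEXT_MIME_TYPES:
        return "text"
    if mime_type.startswith(MEDIA_PREFIXES):
        if profile in MULTIMODAL_PROFILES:
            # Individual stages still depend on what the profile enables.
            return "media"
        # Under TextOnly the upload may be accepted, but media stages are skipped.
        return "media_skipped"
    # Not listed on this page; the server's behavior is not documented here.
    return "unlisted"
```

A caller might warn the user when the result is `media_skipped`, since the file would be stored but not transcribed, captioned, or embedded natively.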

The upload endpoint is asynchronous. It returns 202 Accepted with a document id and initial status; processing continues on the worker tier.

POST /api/v1/{context_id}/documents
Content-Type: multipart/form-data

file=<binary>
title=Returns Policy
scope[org]=acme

Response:

{
  "id": "doc:01hx9…",
  "status": "queued",
  "content_hash": "blake3:4f3c…",
  "deduplicated": false
}

The same upload from the CLI:

spectron documents upload ./returns-policy.pdf \
  --url "$SPECTRON_URL" \
  --api-key "$SPECTRON_API_KEY" \
  --context-id "$SPECTRON_CONTEXT_ID"

Use the generated OpenAPI clients (Python, TypeScript) for application code – method names follow the spec.
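Where a generated client is not available, the raw multipart request can be assembled with the Python standard library. This is a minimal sketch, not the official client: `build_upload_request` is a hypothetical helper, and the `Authorization: Bearer` header is an assumption, because this page does not state how the API key is transmitted. The form fields (`file`, `title`, `scope[org]`) mirror the request shown above.

```python
import urllib.request
import uuid


def build_upload_request(base_url, context_id, api_key, filename, data,
                         content_type, title=None, scope=None):
    """Build (but do not send) a multipart/form-data upload request."""
    boundary = uuid.uuid4().hex

    # Plain form fields: an optional title plus scope[<key>] entries.
    fields = {}
    if title is not None:
        fields["title"] = title
    for key, value in (scope or {}).items():
        fields[f"scope[{key}]"] = value

    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )

    # The file part carries the raw bytes and their MIME type.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    parts.append(file_header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/{context_id}/documents",
        data=b"".join(parts),
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            # ASSUMPTION: bearer-token auth; check the REST API reference
            # for the header your deployment actually expects.
            "Authorization": f"Bearer {api_key}",
        },
    )
```

Passing the result to `urllib.request.urlopen` sends the request; on success the endpoint is documented to answer 202 Accepted with the JSON body shown above, which can be parsed with `json.load`.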

Poll GET /api/v1/{context_id}/documents/{id} until status is ready or failed.

| Status | Description |
| --- | --- |
| queued | Waiting to enter the pipeline |
| extracting | Reading content from the uploaded bytes |
| chunking | Splitting content into overlapping segments |
| embedding | Generating dense vectors for chunks |
| rendering | Building document summaries and section metadata |
| transcribing | Transcribing audio or video (multimodal profiles) |
| captioning | Generating captions for images (multimodal profiles) |
| keywording | RAKE keyword extraction |
| ready | Fully indexed and available for retrieval |
| failed | Pipeline error; inspect error on the document record |
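The polling step can be sketched as a small loop that stops on either terminal status. This is an illustrative helper, not part of Spectron: `wait_for_document` and its parameters are hypothetical, and `fetch_document` stands for whatever callable performs GET /api/v1/{context_id}/documents/{id} and returns the parsed JSON record. The interval and timeout defaults are arbitrary choices.

```python
import time

# The two statuses after which the document no longer changes state.
TERMINAL_STATUSES = {"ready", "failed"}


def wait_for_document(fetch_document, document_id, interval=2.0,
                      timeout=300.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the document is ready or failed, or the timeout elapses.

    fetch_document(document_id) must return a dict with a "status" key.
    """
    deadline = clock() + timeout
    while True:
        doc = fetch_document(document_id)
        if doc["status"] in TERMINAL_STATUSES:
            return doc
        if clock() >= deadline:
            raise TimeoutError(
                f"{document_id} still {doc['status']!r} after {timeout}s"
            )
        # Intermediate statuses (extracting, chunking, ...) just mean "keep waiting".
        sleep(interval)
```

The caller should still check the returned status: a `failed` record ends the wait just like `ready` does, and its error field explains what went wrong.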

Every document is identified by a BLAKE3 hash of its raw bytes. Re-uploading identical content returns the existing document with deduplicated: true and skips reprocessing.
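The idea behind content-hash deduplication can be illustrated with a toy in-memory index. This is a conceptual sketch only, not Spectron's implementation: the `DocumentIndex` class and its id format are hypothetical, and because BLAKE3 is not in Python's standard library the sketch substitutes `hashlib.blake2b`, so its digests will not match the `content_hash` values Spectron returns.

```python
import hashlib


class DocumentIndex:
    """Toy content-addressed store: identical bytes map to one record."""

    def __init__(self):
        self._by_hash = {}

    def upload(self, data: bytes) -> dict:
        # STAND-IN: BLAKE2b from the standard library, not BLAKE3.
        digest = hashlib.blake2b(data).hexdigest()

        existing = self._by_hash.get(digest)
        if existing is not None:
            # Same bytes seen before: return the existing record, skip processing.
            return {**existing, "deduplicated": True}

        record = {
            "id": f"doc:{len(self._by_hash) + 1}",  # hypothetical id scheme
            "content_hash": f"blake2b:{digest}",
            "status": "queued",
        }
        self._by_hash[digest] = record
        return {**record, "deduplicated": False}
```

Because the hash covers the raw bytes, two files with the same text but different encodings or metadata would count as different documents in this sketch.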

Set scope on upload so only principals with matching grants can retrieve the document’s chunks during recall.

  • GET /documents/{id} – status and metadata

  • GET /documents – list with filters

  • GET /documents/{id}/chunks – parsed segments

  • GET /documents/{id}/raw – original bytes

  • DELETE /documents/{id} – remove document, chunks, graph edges, and object-store bytes

See REST API for request shapes.
