Importing large quantities of documents or knowledge nodes into Spectron.
When populating a new Context or migrating from an existing system, you typically need to ingest many documents or knowledge nodes at once. Spectron's ingestion pipeline is designed for concurrent usage, and both the document upload endpoint and the node batch upsert endpoint support high-throughput ingestion patterns.
Uploading many documents
POST /api/v1/{context_id}/documents accepts one document per request. For bulk uploads, issue multiple requests concurrently and track their status independently.
The recommended concurrency ceiling for standard deployments is 10–20 concurrent uploads. Self-hosted deployments can be tuned according to your infrastructure capacity.
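The concurrent pattern described above can be sketched with a bounded thread pool. This is a minimal illustration, not Spectron client code: `upload_all` and the `upload_one` callable are hypothetical names, and `upload_one` stands in for whatever function in your script issues a single POST to the documents endpoint. The default of 10 workers sits at the low end of the recommended ceiling.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_all(paths, upload_one, max_workers=10):
    """Run upload_one(path) for every path with bounded concurrency.

    upload_one is a hypothetical callable supplied by your script; it
    should perform one document upload and return the parsed response.
    Returns a dict mapping each path to its result, or to the exception
    that upload raised, so every file is tracked independently.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the path it was submitted for.
        futures = {pool.submit(upload_one, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:
                # Record the failure instead of aborting the whole import.
                results[path] = exc
    return results
```

Keeping failures in the result map, rather than letting the first exception stop the run, means one bad file does not block the rest of the import.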
Bulk structured facts
For structured catalogue or policy data you already trust, use POST /api/v1/{context_id}/facts with infer: "triples" (or batch multiple utterances via /facts/batch). The reconciler persists entities, attributes, and relations in the unified graph with source.kind = "document" or operator-provided provenance.
POST /api/v1/{context_id}/facts
Content-Type: application/json
API-KEY: <key>

{
  "text": "Product Widget A (sku_001) costs 29.99 and belongs to category widgets.",
  "infer": "triples",
  "scope": { "org": "acme" }
}
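When the facts come from a catalogue export, a request body of the shape shown above can be generated per record. The sketch below is an assumption-laden illustration: `fact_payload` is a hypothetical helper, and the `price` and `category` record keys are invented for the example (only `sku` and `name` appear elsewhere on this page). It only builds the JSON body and does not send it.

```python
def fact_payload(record, org):
    """Build one /facts request body from a catalogue record.

    The record keys "price" and "category" are assumptions made for
    this example; adapt the sentence template to your own data.
    """
    text = (
        f"Product {record['name']} ({record['sku']}) costs {record['price']} "
        f"and belongs to category {record['category']}."
    )
    return {"text": text, "infer": "triples", "scope": {"org": org}}
```

Writing each fact as a plain sentence keeps it in the same form as the example request, so the same `infer: "triples"` setting applies to every record.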
# Build nodes and relations from your data source
nodes = [
    {"kind": "product", "slug": record["sku"], "title": record["name"], "content": record}
    for record in product_catalogue
]
Document uploads are automatically deduplicated by content hash. If the same file is submitted multiple times during a bulk import – for example, because a script is re-run after a partial failure – each duplicate returns the existing document ID with deduplicated: true and no reprocessing occurs.
Node upserts are deduplicated by (kind, slug). Resubmitting a node with the same kind and slug updates its title and content in place.
These properties make bulk imports safe to re-run. A failed or interrupted import can be restarted from the beginning without creating duplicate records.
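The two deduplication rules above can be mirrored locally when reporting on a re-run. The helpers below are hypothetical and only a sketch. `summarize_uploads` assumes each upload response has been parsed into a dict and that a missing `deduplicated` field means the document was newly created. `collapse_nodes` applies the same last-write-wins rule as the `(kind, slug)` upsert to a list held in memory, before anything is sent.

```python
def summarize_uploads(responses):
    """Count new versus deduplicated uploads in a list of response dicts.

    Assumption: a response without a truthy "deduplicated" field refers
    to a newly created document.
    """
    summary = {"created": 0, "deduplicated": 0}
    for response in responses:
        key = "deduplicated" if response.get("deduplicated") else "created"
        summary[key] += 1
    return summary


def collapse_nodes(nodes):
    """Keep only the last node seen for each (kind, slug) pair.

    This mirrors the upsert behaviour locally: a later node with the
    same kind and slug replaces the earlier one.
    """
    latest = {}
    for node in nodes:
        latest[(node["kind"], node["slug"])] = node
    return list(latest.values())
```

On a clean re-run of a fully completed import, the summary would be expected to show every upload as deduplicated, which is a quick check that nothing was processed twice.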
Scope assignment on bulk imports
All documents uploaded in a bulk import share the same scope unless you specify it per-document. For mixed-scope imports – for example, some documents are org-level and others are user-level – structure your upload loop to set scope per file:
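One way to set scope per file is to derive it from where the file sits in the export directory. The sketch below rests on several assumptions: `scope_for` is a hypothetical helper, the `users/<user_id>/<file>` layout is invented for the example, and the `"user"` scope key is assumed by analogy with the `"org"` key used in the facts request. Check the scope keys your deployment accepts before relying on it.

```python
from pathlib import Path


def scope_for(path, org, user_dir_name="users"):
    """Choose a scope dict for one file based on its location.

    Files under <user_dir_name>/<user_id>/ get a user-level scope; all
    other files get an org-level scope. The "user" key is an assumption.
    """
    parts = Path(path).parts
    if user_dir_name in parts:
        idx = parts.index(user_dir_name)
        # Require a directory component between "users" and the file name.
        if idx + 1 < len(parts) - 1:
            return {"org": org, "user": parts[idx + 1]}
    return {"org": org}
```

In an upload loop, the result of `scope_for(path, org)` would be attached to each request, so org-level and user-level documents can be imported in a single pass.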
Failed documents should be inspected individually to determine whether the failure is transient (pipeline overload) or permanent (corrupt file, unsupported format):
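The transient versus permanent distinction can be handled with a small retry wrapper. This is a sketch under stated assumptions, not part of the Spectron API: `retry_transient` is a hypothetical helper, and the `is_transient` predicate is left to the caller because this page does not specify how the API reports each kind of failure. Transient failures are retried with exponential backoff; anything else is raised immediately for inspection.

```python
import time


def retry_transient(operation, is_transient, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying only failures judged to be transient.

    is_transient(exc) is a caller-supplied predicate. Delays double on
    each retry (base_delay, 2 * base_delay, ...). Permanent failures and
    the final failed attempt are re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:
            is_last_attempt = attempt == attempts - 1
            if is_last_attempt or not is_transient(exc):
                raise
            sleep(base_delay * 2 ** attempt)
```

A corrupt or unsupported file would fail the predicate and surface on the first attempt, while an overloaded pipeline gets a few spaced-out retries before the document is flagged for manual review.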