Object store
Spectron uses an object store to persist the raw bytes of uploaded documents. Text is extracted, chunked, and embedded asynchronously; only the processed content enters SurrealDB. The original bytes remain in the object store for re-processing if you change chunking or embedding configuration.
Supported backends
| Backend | URL format |
|---|---|
| Amazon S3 | s3://bucket/prefix?region=us-east-1 |
| Google Cloud Storage | gs://bucket/prefix |
| Azure Blob Storage | az://container/prefix |
| Local filesystem | file:///absolute/path |
| S3-compatible (MinIO, Tigris, Cloudflare R2, etc.) | s3://bucket/prefix?endpoint=https://minio.example.com&region=auto |
Set the backend via the SPECTRON_OBJECT_STORE_URL environment variable.
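To make the URL formats in the table concrete, here is a minimal Python sketch that splits a URL into its parts. It is an illustration only, not Spectron's parser: the function name `describe_object_store_url` and the returned dictionary layout are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Scheme-to-backend mapping taken from the table above.
BACKENDS = {
    "s3": "Amazon S3 or S3-compatible",
    "gs": "Google Cloud Storage",
    "az": "Azure Blob Storage",
    "file": "Local filesystem",
}


def describe_object_store_url(url: str) -> dict:
    """Split an object store URL into backend, bucket/container, prefix, and options."""
    parts = urlsplit(url)
    if parts.scheme not in BACKENDS:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if parts.scheme == "file":
        # file:///absolute/path has an empty host; the whole path is the location.
        bucket, prefix = "", parts.path
    else:
        bucket, prefix = parts.netloc, parts.path.lstrip("/")
    # parse_qs returns lists; keep the first value of each option.
    options = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "backend": BACKENDS[parts.scheme],
        "bucket": bucket,
        "prefix": prefix,
        "options": options,
    }
```

Running a candidate value for SPECTRON_OBJECT_STORE_URL through a check like this before deploying can catch a mistyped scheme or a missing query parameter early.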
Amazon S3
Credentials are resolved through the standard AWS credential chain: environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY), the shared credentials file, or IAM instance roles. On EC2, or on EKS with IRSA, Spectron picks up the instance or pod role automatically – no explicit credential configuration is required.
To use explicit credentials, set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in Spectron's environment.
Attach a bucket policy that grants Spectron s3:GetObject, s3:PutObject, and s3:DeleteObject on your bucket.
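As a rough sketch of such a policy, the Python snippet below generates a bucket policy document with the three actions listed above. The function name, the statement ID, and the principal ARN are assumptions for illustration; substitute the role or user that Spectron actually runs as.

```python
import json


def spectron_bucket_policy(bucket: str, principal_arn: str, prefix: str = "") -> str:
    """Return a bucket policy JSON document granting the three object actions."""
    # Object-level actions apply to object ARNs, so the resource ends in "*".
    key_pattern = f"{prefix.strip('/')}/*" if prefix.strip("/") else "*"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "SpectronObjectAccess",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{key_pattern}",
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Example with a placeholder account ID and role name.
print(spectron_bucket_policy("bucket", "arn:aws:iam::123456789012:role/spectron", "prefix"))
```

Scoping the resource to the prefix keeps the grant limited to the objects Spectron manages rather than the whole bucket.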
Google Cloud Storage
Credentials are resolved via Application Default Credentials (ADC). On GKE with Workload Identity, the pod service account is used automatically. For local development, run gcloud auth application-default login.
Azure Blob Storage
Local filesystem
Local filesystem storage is appropriate for development and single-server deployments. It is not suitable for multi-replica deployments because each replica would need access to the same filesystem path. Use a network filesystem (NFS, EFS) or switch to S3 or GCS before running more than one Spectron instance.
S3-compatible storage
MinIO, Tigris, Cloudflare R2, and other S3-compatible stores are supported via the endpoint query parameter, for example s3://bucket/prefix?endpoint=https://minio.example.com&region=auto.
Pair S3-compatible storage with explicit credential environment variables since these stores do not support IAM instance roles.
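The following Python sketch assembles a URL in the S3-compatible format from its parts. The helper name `s3_compatible_url` is hypothetical, and the default region of `auto` is taken from the example in the table; check which region value your store expects.

```python
from urllib.parse import urlencode


def s3_compatible_url(bucket: str, prefix: str, endpoint: str, region: str = "auto") -> str:
    """Build an s3:// URL that points at a custom S3-compatible endpoint."""
    # safe=":/" leaves the endpoint URL unescaped, matching the format in the table.
    query = urlencode({"endpoint": endpoint, "region": region}, safe=":/")
    return f"s3://{bucket}/{prefix.strip('/')}?{query}"


print(s3_compatible_url("bucket", "prefix", "https://minio.example.com"))
```

The result is the value to assign to SPECTRON_OBJECT_STORE_URL, alongside AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY for the store's access keys.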
Scaling Spectron
Horizontal scaling
Spectron replicas share one SurrealDB cluster and object store. To increase throughput, add more api or worker pods behind a load balancer – no sticky sessions required.
In Kubernetes, increase spec.replicas in the Deployment manifest or configure an HPA. On bare metal, run multiple instances on different ports behind Nginx or HAProxy.
Job queue concurrency
Each spectrond instance with the worker role runs an internal extraction worker. The worker dequeues and processes jobs (text extraction, embedding, entity extraction) from the SurrealDB-backed queue. SurrealDB's record-level locking ensures each job is processed exactly once across all replicas.
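The idea of claiming a job under a record-level lock can be illustrated with an in-memory analogy. This is not Spectron's or SurrealDB's implementation: the `JobQueue` class and `run_workers` helper are hypothetical, and a `threading.Lock` stands in for the database lock.

```python
import threading


class JobQueue:
    """In-memory stand-in for a queue where each job can be claimed only once."""

    def __init__(self, job_ids):
        self._status = {job_id: "pending" for job_id in job_ids}
        self._lock = threading.Lock()  # plays the role of the record lock

    def claim(self, job_id, worker):
        """Atomically flip a job from pending to claimed; False if already taken."""
        with self._lock:
            if self._status[job_id] != "pending":
                return False
            self._status[job_id] = f"claimed:{worker}"
            return True


def run_workers(queue, job_ids, worker_count):
    """Let every worker race for every job and record which worker won each one."""
    results = {f"worker-{i}": [] for i in range(worker_count)}

    def work(name):
        for job_id in job_ids:
            if queue.claim(job_id, name):
                results[name].append(job_id)  # only the claiming worker processes it

    threads = [threading.Thread(target=work, args=(name,)) for name in results]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

However the threads interleave, the check and the status update happen under the same lock, so every job ends up in exactly one worker's list.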
To increase extraction throughput without adding full Spectron replicas, set SPECTRON_WORKER_CONCURRENCY so that each instance processes multiple jobs in parallel.
Raising this value increases LLM API call frequency and memory usage proportionally.
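As a back-of-the-envelope check, the upper bound on simultaneously running jobs is the number of worker replicas multiplied by the per-instance concurrency. The helper below is a hypothetical illustration of that arithmetic, not part of Spectron.

```python
def max_in_flight_jobs(replicas: int, worker_concurrency: int) -> int:
    """Upper bound on jobs processed at the same time across the deployment."""
    if replicas < 1 or worker_concurrency < 1:
        raise ValueError("replicas and worker_concurrency must be at least 1")
    return replicas * worker_concurrency


# 3 worker replicas with SPECTRON_WORKER_CONCURRENCY=4 can run up to 12 jobs at once.
print(max_in_flight_jobs(3, 4))  # prints 12
```

Compare this figure against your LLM provider's rate limits before raising either value.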
Performance tuning
Token limits
Set a per-Context token budget with token_limit to avoid unbounded LLM spend.
The token_limit is a monthly cap across all model stages. When the cap is reached, Spectron returns 429 Too Many Requests for operations that would require LLM calls.
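The cap behaviour can be modelled with a small Python sketch. The `TokenBudget` class is hypothetical and only mirrors the rule described above. Whether Spectron lets the final call overshoot the cap, as this model does, is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    token_limit: int  # monthly cap for one Context
    used: int = 0     # tokens consumed so far this month

    def charge(self, tokens: int) -> int:
        """Return an HTTP-style status: 429 once the cap is reached, else 200."""
        if self.used >= self.token_limit:
            return 429  # Too Many Requests: no further LLM calls this month
        self.used += tokens
        return 200

    def reset(self) -> None:
        """Start a new monthly period."""
        self.used = 0
```

For example, with `TokenBudget(token_limit=1000)`, two calls of 600 and 400 tokens both succeed, and the next call returns 429 until `reset()` is called.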
Chunk size
The chunk size controls how documents are split before embedding. Smaller chunks improve retrieval precision; larger chunks reduce embedding cost. Chunk size and overlap are set at the Context level.
Values are in tokens. The default chunk size is 512 with an overlap of 64.
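To show how chunk size and overlap interact, here is a minimal sliding-window sketch in Python. It is a generic illustration, not Spectron's chunker: the function name is hypothetical, and a real chunker may also respect sentence or section boundaries.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - overlap  # 448 with the defaults
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


# A 1000-token document with the defaults yields windows starting at 0, 448, and 896.
print([len(chunk) for chunk in chunk_tokens(list(range(1000)))])  # prints [512, 512, 104]
```

Each chunk repeats the last 64 tokens of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.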
HNSW parameters
Spectron uses SurrealDB's HNSW vector index for semantic search. Two index parameters, ef_construction and m, affect recall accuracy and index build time.
Higher ef_construction improves recall but increases memory usage and index build time. The defaults (ef_construction: 200, m: 16) are suitable for most deployments.
Model routing for cost
Configure cheaper models for high-frequency extraction and reserve expensive models for reflection.
See Cost and rate limits for monitoring and controlling LLM spend.
Next steps
Architecture overview – component diagram
Kubernetes – production deployment with HPA
Cost and rate limits – token budgets and rate limiting