Storage and scaling

Object store

Spectron uses an object store to persist the raw bytes of uploaded documents. Text is extracted, chunked, and embedded asynchronously; only the processed content enters SurrealDB. The original bytes stay in the object store so that documents can be re-processed if you later change the chunking or embedding configuration.

Supported backends

Backend	URL format
Amazon S3	`s3://bucket/prefix?region=us-east-1`
Google Cloud Storage	`gs://bucket/prefix`
Azure Blob Storage	`az://container/prefix`
Local filesystem	`file:///absolute/path`
S3-compatible (MinIO, Tigris, Cloudflare R2, etc.)	`s3://bucket/prefix?endpoint=https://minio.example.com&region=auto`

Set the backend via the SPECTRON_OBJECT_STORE_URL environment variable.
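For illustration, the following Python sketch shows how a URL in these formats breaks down into a backend, a bucket or container, a key prefix, and query options such as region and endpoint. parse_object_store_url and ObjectStoreConfig are hypothetical names, not part of Spectron. Spectron's own parser may accept or validate more than this.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import parse_qs, urlsplit

# Scheme -> backend, following the "Supported backends" table.
BACKENDS = {"s3": "s3", "gs": "gcs", "az": "azure", "file": "local"}


@dataclass
class ObjectStoreConfig:
    backend: str
    bucket: str  # bucket or container name; empty for file:// URLs
    prefix: str  # key prefix, or the absolute directory for file:// URLs
    options: Dict[str, str] = field(default_factory=dict)


def parse_object_store_url(url: str) -> ObjectStoreConfig:
    """Hypothetical helper: split a SPECTRON_OBJECT_STORE_URL value into parts."""
    parts = urlsplit(url)
    if parts.scheme not in BACKENDS:
        raise ValueError(f"unsupported object store scheme: {parts.scheme!r}")
    # Query parameters such as region= and endpoint= become plain options;
    # parse_qs splits each pair on the first '=', so endpoint URLs survive intact.
    options = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    if parts.scheme == "file":
        return ObjectStoreConfig("local", "", parts.path, options)
    return ObjectStoreConfig(BACKENDS[parts.scheme], parts.netloc, parts.path.strip("/"), options)


cfg = parse_object_store_url("s3://docs/spectron/prod?endpoint=http://minio:9000&region=us-east-1")
print(cfg)
```

Parsing the MinIO URL from the S3-compatible section yields the endpoint (http://minio:9000) and region as separate options, with spectron/prod as the prefix inside the docs bucket.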

Amazon S3

SPECTRON_OBJECT_STORE_URL=s3://my-spectron-bucket/prod?region=eu-west-1

Credentials are resolved through the standard AWS credential chain: environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY), the shared credentials file, or an IAM role. On EC2 (via an instance profile) or on EKS with IAM Roles for Service Accounts (IRSA), Spectron picks up the instance or pod role automatically, so no explicit credential configuration is required.

To use explicit credentials:

AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
SPECTRON_OBJECT_STORE_URL=s3://my-spectron-bucket/prod?region=eu-west-1

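Grant the IAM identity that Spectron runs as the s3:GetObject, s3:PutObject, and s3:DeleteObject actions on objects in your bucket, either through an IAM policy attached to that role or user or through a bucket policy.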
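Because object-level actions are evaluated against object ARNs, the policy's resource should cover the keys under your prefix rather than the bucket ARN itself. The Python sketch below builds such a least-privilege policy document from a store URL. It assumes Spectron needs only the three actions listed above and only touches keys under the URL's prefix; spectron_s3_policy is a hypothetical helper. Attach the output as an identity policy on the role or user Spectron runs as; used as a bucket policy, it would also need a Principal element.

```python
import json
from urllib.parse import urlsplit


def spectron_s3_policy(store_url: str) -> dict:
    """Hypothetical helper: least-privilege S3 policy for a Spectron store URL."""
    parts = urlsplit(store_url)
    if parts.scheme != "s3":
        raise ValueError("expected an s3:// URL")
    bucket = parts.netloc
    prefix = parts.path.strip("/")
    # Object actions apply to object ARNs, so scope them to keys under the prefix.
    resource = f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": resource,
            }
        ],
    }


print(json.dumps(spectron_s3_policy("s3://my-spectron-bucket/prod?region=eu-west-1"), indent=2))
```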

Google Cloud Storage

SPECTRON_OBJECT_STORE_URL=gs://my-spectron-bucket/prod

Credentials are resolved via Application Default Credentials (ADC). On GKE with Workload Identity, the pod service account is used automatically. For local development, run gcloud auth application-default login.

Azure Blob Storage

SPECTRON_OBJECT_STORE_URL=az://my-spectron-container/prod
AZURE_STORAGE_ACCOUNT=mystorageaccount
AZURE_STORAGE_ACCESS_KEY=...

Local filesystem

SPECTRON_OBJECT_STORE_URL=file:///var/lib/docs/spectron/objects

Local filesystem storage is appropriate for development and single-server deployments. It is not suitable for multi-replica deployments, because every replica must read and write the same directory and a local disk is visible to only one host. Before running more than one Spectron instance, either mount a shared network filesystem (NFS, EFS) at the same path on every host or switch to an object store such as S3 or GCS.

S3-compatible storage

MinIO, Tigris, Cloudflare R2, and other S3-compatible stores are supported via the endpoint query parameter:

# MinIO
SPECTRON_OBJECT_STORE_URL=s3://docs/spectron/prod?endpoint=http://minio:9000&region=us-east-1

# Cloudflare R2
SPECTRON_OBJECT_STORE_URL=s3://spectron-bucket/prod?endpoint=https://account.r2.cloudflarestorage.com&region=auto

When using S3-compatible storage, set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY explicitly, because these stores do not support AWS IAM roles.

Scaling Spectron

Horizontal scaling

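All Spectron replicas share one SurrealDB cluster and one object store. To increase request throughput, add api pods behind a load balancer; no sticky sessions are required. To increase background processing throughput, add worker pods, which pull jobs from the shared queue and need no load balancer.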

In Kubernetes, increase spec.replicas in the Deployment manifest or configure a HorizontalPodAutoscaler (HPA). On bare metal, run multiple instances behind a reverse proxy such as Nginx or HAProxy, using different ports for instances that share a host.
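When sizing an HPA, it helps to know the rule it applies. Kubernetes documents the core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change while the ratio stays within a tolerance of 1.0 (0.1 by default). The Python sketch below models only that rule. It ignores stabilization windows, pod readiness, and multiple metrics; the metric you target (CPU, request rate, queue depth) and the min/max defaults here are illustrative choices, not Spectron recommendations.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                         min_replicas: int = 1, max_replicas: int = 10,
                         tolerance: float = 0.1) -> int:
    """Core HPA scaling rule, simplified.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0 and clamped to
    [min_replicas, max_replicas]. Stabilization windows, pod readiness and
    multi-metric handling are deliberately left out."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Three api pods averaging 90% CPU against a 60% target -> scale to 5.
print(hpa_desired_replicas(3, current_metric=90, target_metric=60))
```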

Job queue concurrency

Each spectrond process running the worker role includes an internal extraction worker. It dequeues jobs (text extraction, embedding, entity extraction) from the SurrealDB-backed queue and processes them. SurrealDB's record-level locking ensures that only one worker claims a given job, so replicas never process the same job concurrently.
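The claim step is what keeps replicas from doing the same work twice. As a minimal sketch of that pattern, the Python below uses an in-memory queue with a threading.Lock standing in for SurrealDB's record-level locking. Job, InMemoryQueue, and run_worker are hypothetical and do not reflect Spectron's internals or SurrealQL.

```python
import threading
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    id: str
    status: str = "queued"
    claimed_by: Optional[str] = None


class InMemoryQueue:
    """Stand-in for the SurrealDB-backed job queue (not Spectron's code)."""

    def __init__(self, jobs: List[Job]):
        self._jobs = {job.id: job for job in jobs}
        self._lock = threading.Lock()  # stands in for record-level locking

    def claim(self, worker: str) -> Optional[Job]:
        # Compare-and-set: a job moves from "queued" to "running" only while
        # the lock is held, so two workers can never claim the same job.
        with self._lock:
            for job in self._jobs.values():
                if job.status == "queued":
                    job.status = "running"
                    job.claimed_by = worker
                    return job
        return None

    def complete(self, job: Job) -> None:
        with self._lock:
            job.status = "done"


def run_worker(queue: InMemoryQueue, name: str, processed: List[str]) -> None:
    # Keep claiming until no job is left in the "queued" state.
    while (job := queue.claim(name)) is not None:
        processed.append(job.id)  # list.append is thread-safe in CPython
        queue.complete(job)


queue = InMemoryQueue([Job(f"job-{i}") for i in range(100)])
processed: List[str] = []
replicas = [threading.Thread(target=run_worker, args=(queue, f"replica-{r}", processed))
            for r in range(4)]
for t in replicas:
    t.start()
for t in replicas:
    t.join()
print(len(processed), len(set(processed)))
```

Four simulated replicas drain 100 jobs with no job claimed twice. A production queue also has to handle a worker that crashes after claiming a job, for example by re-queuing jobs whose claim has expired; the sketch leaves that out.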

To increase extraction throughput without adding full Spectron replicas, set SPECTRON_WORKER_CONCURRENCY to allow each instance to process multiple jobs in parallel:

SPECTRON_WORKER_CONCURRENCY=4  # default: 2

Raising this value increases the rate of LLM API calls and each instance's memory usage, roughly in proportion to the new value.
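A rough model of what the setting controls: each instance runs at most SPECTRON_WORKER_CONCURRENCY jobs at once, so the cluster-wide ceiling is worker replicas times concurrency. The Python sketch below uses a thread pool to enforce that bound and records the peak number of jobs in flight. It is an illustration under those assumptions, not Spectron's scheduler; process_jobs and max_jobs_in_flight are hypothetical names, and the sleep stands in for real extraction work.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Tuple


def worker_concurrency() -> int:
    """Read SPECTRON_WORKER_CONCURRENCY, falling back to the documented default of 2."""
    return int(os.environ.get("SPECTRON_WORKER_CONCURRENCY", "2"))


def process_jobs(jobs: Iterable[int], concurrency: int) -> Tuple[List[int], int]:
    """Process jobs with at most `concurrency` in flight; return results and the peak."""
    in_flight = 0
    peak = 0
    lock = threading.Lock()

    def handle(job: int) -> int:
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.01)  # placeholder for extraction, embedding and LLM calls
        with lock:
            in_flight -= 1
        return job

    # The pool never runs more than `concurrency` handlers at once.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(handle, jobs))
    return results, peak


def max_jobs_in_flight(worker_replicas: int, concurrency: int) -> int:
    """Cluster-wide ceiling on simultaneously running jobs."""
    return worker_replicas * concurrency


results, peak = process_jobs(range(20), worker_concurrency())
print(f"processed {len(results)} jobs, peak in flight: {peak}")
print(max_jobs_in_flight(worker_replicas=3, concurrency=4))
```

With three worker replicas at concurrency 4, up to 12 jobs can run at the same time, and each of those jobs may issue its own LLM calls.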

Performance tuning

Token limits

Set a per-Context monthly token budget to meter LLM spend:

curl -s -X PATCH https://spectron.example.com/api/v1/contexts/acme-prod \
  -H "Authorization: Bearer $SPECTRON_MGMT_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "config": { "token_limit": 5000000 } }'
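If you manage Contexts from a script rather than the shell, the same PATCH can be built with Python's standard library. This sketch constructs the request without sending it; build_token_limit_request is a hypothetical helper, and the URL, Context ID, and body are the ones from the curl example.

```python
import json
import os
import urllib.request


def build_token_limit_request(base_url: str, context_id: str, token_limit: int,
                              mgmt_key: str) -> urllib.request.Request:
    """Build (but do not send) the PATCH from the curl example above."""
    body = json.dumps({"config": {"token_limit": token_limit}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/contexts/{context_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {mgmt_key}",
            "Content-Type": "application/json",
        },
    )


req = build_token_limit_request(
    "https://spectron.example.com", "acme-prod", 5_000_000,
    os.environ.get("SPECTRON_MGMT_KEY", ""),
)
# Sending it: urllib.request.urlopen(req) raises HTTPError on 4xx/5xx responses.
print(req.get_method(), req.full_url)
```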

The token_limit is a soft monthly cap for metering. Requests return 429 only when enforcement_blocked is set on the Context (or when a per-minute rate limit is hit), not from exceeding the soft cap alone.
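This decision rule can be written out as two small functions. A minimal Python sketch, assuming usage is tracked as a monthly token count per Context; ContextUsage, over_soft_cap, and should_return_429 are hypothetical names, not Spectron's API.

```python
from dataclasses import dataclass


@dataclass
class ContextUsage:
    token_limit: int          # config.token_limit: soft monthly cap
    tokens_this_month: int    # metered usage so far
    enforcement_blocked: bool = False


def over_soft_cap(ctx: ContextUsage) -> bool:
    """Metering signal only: exceeding the cap does not by itself reject requests."""
    return ctx.tokens_this_month > ctx.token_limit


def should_return_429(ctx: ContextUsage, per_minute_limit_hit: bool) -> bool:
    """Reject only when the Context is blocked or a per-minute rate limit is hit."""
    return ctx.enforcement_blocked or per_minute_limit_hit


ctx = ContextUsage(token_limit=5_000_000, tokens_this_month=6_200_000)
print(over_soft_cap(ctx), should_return_429(ctx, per_minute_limit_hit=False))  # True False
```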

Chunk size

The chunk size controls how documents are split before embedding. Smaller chunks improve retrieval precision; larger chunks produce fewer embeddings to compute and store, which reduces embedding calls and index size. Set both values at the Context level:

{
  "config": {
    "ingestion": {
      "chunk_size": 512,
      "chunk_overlap": 64
    }
  }
}

Values are in tokens. The default chunk size is 512 with an overlap of 64.
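The interaction between chunk_size and chunk_overlap is easiest to see with a sliding window. The Python sketch below assumes the document is already tokenized and splits it into fixed windows; chunk_tokens is a hypothetical helper, and Spectron's actual tokenizer and splitter are not described here (real splitters often also respect sentence or paragraph boundaries).

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: Sequence[T], chunk_size: int = 512,
                 chunk_overlap: int = 64) -> List[Sequence[T]]:
    """Split tokens into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    stride = chunk_size - chunk_overlap  # 448 tokens with the defaults
    chunks: List[Sequence[T]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


doc = [f"tok{i}" for i in range(1000)]  # stand-in for a tokenized document
print([len(c) for c in chunk_tokens(doc)])  # [512, 512, 104]
```

With the defaults, each chunk starts 448 tokens (512 minus 64) after the previous one, so a 1,000-token document yields three chunks of 512, 512, and 104 tokens.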

HNSW parameters

Spectron uses SurrealDB's HNSW vector index for semantic search. The index parameters trade recall against memory usage and index build time:

{
  "config": {
    "vector": {
      "hnsw_ef_construction": 200,
      "hnsw_m": 16
    }
  }
}

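Higher hnsw_ef_construction improves recall but lengthens index builds. Higher hnsw_m also improves recall, at the cost of more memory per vector and slower builds, because each node stores more graph links. The defaults (hnsw_ef_construction: 200, hnsw_m: 16) are suitable for most deployments.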
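To get a feel for the memory cost of hnsw_m, the Python sketch below estimates the bottom-layer footprint of an HNSW index. It assumes an hnswlib-style layout (float32 vectors and up to 2 * M neighbor IDs of 4 bytes each per node on the bottom layer) and ignores upper layers and overhead. SurrealDB's storage layout may differ, hnsw_layer0_bytes is a hypothetical helper, and the 1,536-dimension figure is only an example embedding size.

```python
from typing import Dict


def hnsw_layer0_bytes(num_vectors: int, dimensions: int, m: int = 16,
                      bytes_per_float: int = 4, bytes_per_link: int = 4) -> Dict[str, int]:
    """Rough bottom-layer memory estimate for an HNSW index, in bytes.

    Assumes an hnswlib-style layout: float32 vectors plus up to 2 * M neighbor
    IDs per node. Upper layers, metadata and allocator overhead are ignored."""
    vectors = num_vectors * dimensions * bytes_per_float
    links = num_vectors * 2 * m * bytes_per_link
    return {"vectors": vectors, "links": links, "total": vectors + links}


for m in (16, 32):
    est = hnsw_layer0_bytes(1_000_000, 1536, m=m)
    print(f"m={m}: vectors {est['vectors'] / 1e9:.2f} GB, links {est['links'] / 1e6:.0f} MB")
```

For one million 1,536-dimensional vectors this gives about 6.1 GB for the vectors and 128 MB for the links at hnsw_m = 16, so doubling hnsw_m roughly doubles the comparatively small link overhead. hnsw_ef_construction does not appear in the estimate: it controls how many candidates are examined while building the graph, not how many links are stored.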

Model routing for cost

Route high-frequency extraction to a cheaper model and reserve more capable, more expensive models for reflection and responses:

{
  "config": {
    "models": {
      "extraction": "openai/gpt-4o-mini",
      "reflection": "openai/gpt-4o",
      "response": "openai/gpt-4o"
    }
  }
}
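The saving from routing depends on where your token volume goes. The Python sketch below resolves a model per task from a models block like the one above and compares total cost under two routings. The per-token prices and token volumes are hypothetical relative units, not real provider pricing, and model_for and routed_cost are illustrative names.

```python
from typing import Dict, Tuple

# Mirrors the "models" block above.
ROUTING: Dict[str, str] = {
    "extraction": "openai/gpt-4o-mini",
    "reflection": "openai/gpt-4o",
    "response": "openai/gpt-4o",
}


def model_for(task: str, routing: Dict[str, str] = ROUTING) -> Tuple[str, str]:
    """Return (provider, model) for a task by splitting on the first '/'."""
    provider, _, model = routing[task].partition("/")
    return provider, model


def routed_cost(tokens_by_task: Dict[str, int], price_per_token: Dict[str, float],
                routing: Dict[str, str] = ROUTING) -> float:
    """Sum tokens * price of the model each task is routed to."""
    return sum(tokens * price_per_token[routing[task]]
               for task, tokens in tokens_by_task.items())


# Hypothetical relative prices and volumes, for comparison only.
prices = {"openai/gpt-4o-mini": 1.0, "openai/gpt-4o": 10.0}
volume = {"extraction": 900, "reflection": 50, "response": 50}
all_large = {task: "openai/gpt-4o" for task in volume}

print(model_for("extraction"))                 # ('openai', 'gpt-4o-mini')
print(routed_cost(volume, prices))             # 1900.0
print(routed_cost(volume, prices, all_large))  # 10000.0
```

With extraction carrying 90% of the volume and the larger model priced at 10x the smaller one (both made-up numbers), routing extraction to the cheaper model cuts the total from 10,000 to 1,900 units.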

See Cost and rate limits for monitoring and controlling LLM spend.

Next steps

Architecture overview – component diagram
Kubernetes – production deployment with HPA
Cost and rate limits – token budgets and rate limiting