
JSONL Source Usage Guide

This page is a slimmed-down version of its counterpart in the README documentation for the surreal-sync tool.

Overview

The JSONL source in surreal-sync allows you to import JSON Lines (JSONL) files into SurrealDB. Each JSONL file becomes a table in SurrealDB, and each line in the file becomes a document in that table.

The JSONL source is particularly useful for:

  • Importing data from APIs that export JSON (jq can be used to convert JSON to JSONL; see the example after this list)
  • Migrating from document-based systems like Notion
  • Bulk loading structured JSON data, like configuration files or logs (even SurrealDB terminal output when set to JSON mode)
  • Converting references between documents into SurrealDB’s record links (RecordIds)
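
For example, assuming an API export saved as export.json containing a JSON array of objects (the file names here are purely illustrative), jq can emit one compact object per line:

# -c prints each array element as a single compact line (JSONL)
jq -c '.[]' export.json > export.jsonl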

Basic Usage

surreal-sync sync jsonl \
  --source-uri /path/to/jsonl/directory \
  --to-namespace myns \
  --to-database mydb
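
Each .jsonl file in that directory becomes a table named after the file, as noted above. For instance, a directory laid out like this (file names purely illustrative) would produce users and products tables:

/path/to/jsonl/directory/
  users.jsonl      -> imported as the users table
  products.jsonl   -> imported as the products table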

Prerequisites

Before using JSONL source, ensure you have:

  1. SurrealDB running locally or accessible via network
  2. surreal-sync built and available in your PATH

To start SurrealDB locally:

surreal start --user root --pass root

Example: Importing Notion-like Data

Let’s walk through an example using sample Notion-like data with pages, blocks, and databases.

Sample JSONL Files

Create a directory with the following JSONL files.

Note: each of these documents has an id field, which surreal-sync uses by default as the SurrealDB record ID. To use another field as the ID, pass the flag --id-field "field_name".

databases.jsonl:

{"id": "db1", "name": "Documentation", "description": "Main documentation database", "created_at": "2023-12-01T10:00:00Z", "properties": {"status": "active", "version": 1.0}}
{"id": "db2", "name": "API Docs", "description": "API reference documentation", "created_at": "2023-12-15T10:00:00Z", "properties": {"status": "beta", "version": 0.5}}

pages.jsonl:

{"id": "page1", "title": "Getting Started", "content": "Welcome to our documentation", "parent": {"type": "database_id", "database_id": "db1"}, "created_at": "2024-01-01T10:00:00Z"} {"id": "page2", "title": "Advanced Topics", "content": "Deep dive into advanced features", "parent": {"type": "page_id", "page_id": "page1"}, "created_at": "2024-01-02T10:00:00Z"} {"id": "page3", "title": "API Reference", "content": "Complete API documentation", "parent": {"type": "database_id", "database_id": "db2"}, "created_at": "2024-01-03T10:00:00Z"}

blocks.jsonl:

{"id": "block1", "type": "paragraph", "text": "This is a paragraph block", "parent": {"type": "page_id", "page_id": "page1"}, "order": 1} {"id": "block2", "type": "heading", "text": "Introduction", "level": 1, "parent": {"type": "page_id", "page_id": "page1"}, "order": 2} {"id": "block3", "type": "code", "text": "console.log('Hello World');", "language": "javascript", "parent": {"type": "page_id", "page_id": "page2"}, "order": 1} {"id": "block4", "type": "list", "items": ["Item 1", "Item 2", "Item 3"], "parent": {"type": "block_id", "block_id": "block2"}, "order": 3}

Running the Import

Use the following command to import the data with conversion rules for parent references:

surreal-sync sync jsonl \
  --source-uri /workspace/tests/test_data/jsonl \
  --to-namespace notion \
  --to-database docs \
  --surreal-endpoint http://surrealdb:8000 \
  --surreal-username root \
  --surreal-password root \
  --rule 'type="database_id",database_id databases:database_id' \
  --rule 'type="page_id",page_id pages:page_id' \
  --rule 'type="block_id",block_id blocks:block_id'

Note

Replace http://surrealdb:8000 with http://localhost:8000 if running SurrealDB locally on your machine.

If successful, the command will complete without output. You can verify that the import worked by querying the database, as sketched below.
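
For example, one way to inspect the imported records is with the surreal sql shell (flag names can vary between SurrealDB CLI versions, so adjust as needed):

surreal sql --endpoint http://localhost:8000 --user root --pass root --ns notion --db docs --pretty

Inside the shell, a query such as SELECT * FROM pages; should return the three imported pages, with each parent field stored as a record link (for example databases:db1).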

Understanding Conversion Rules

The --rule flag defines how to convert JSON objects into SurrealDB record links. The format is:

--rule 'type="TYPE_VALUE",ID_FIELD TARGET_TABLE:ID_FIELD'

For example:

  • type="page_id",page_id pages:page_id means:
    • When a JSON object has "type": "page_id"
    • Take the value from the page_id field
    • Convert it to a record link like pages:page1 if the page_id is page1.
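
Concretely, with that rule an input line such as this one from pages.jsonl above:

{"id": "page2", "title": "Advanced Topics", "content": "Deep dive into advanced features", "parent": {"type": "page_id", "page_id": "page1"}, "created_at": "2024-01-02T10:00:00Z"}

would be imported into the pages table with its parent field converted to the record link pages:page1 rather than stored as a nested object.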

Custom ID Fields

By default, surreal-sync looks for an id field in each JSON object. You can specify a different field name:

surreal-sync sync jsonl \
  --source-uri /path/to/jsonl \
  --to-namespace myns \
  --to-database mydb \
  --id-field "item_id"

Example JSONL with custom ID field:

{"item_id": "prod1", "name": "Widget", "price": 19.99}
{"item_id": "prod2", "name": "Gadget", "price": 29.99}

Advanced Options

Batch Size

Control how many records are processed at once:

--batch-size 500

Dry Run

Test the import without actually writing data:

--dry-run
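
Putting these options together with the basic command from earlier (paths and names reused from that example):

surreal-sync sync jsonl \
  --source-uri /path/to/jsonl/directory \
  --to-namespace myns \
  --to-database mydb \
  --batch-size 500 \
  --dry-run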

Environment Variables

You can also use environment variables for configuration:

export SURREAL_ENDPOINT=http://localhost:8000
export SURREAL_USERNAME=root
export SURREAL_PASSWORD=root
export SOURCE_URI=/path/to/jsonl

surreal-sync sync jsonl \
  --to-namespace myns \
  --to-database mydb

Tips and Best Practices

  1. File Naming: Name your JSONL files exactly as you want your SurrealDB tables to be named.

  2. ID Values: Ensure all documents have unique ID values within each file.

  3. Data Types: The JSONL source preserves JSON data types (see the sample line after this list):

    • Numbers remain as integers or floats
    • Strings remain as strings
    • Arrays and objects are preserved
    • Booleans remain as booleans
    • Null values are preserved
  4. References: Use conversion rules to maintain relationships between documents across different tables.

  5. Performance: For large datasets, adjust the batch size based on your system’s memory and SurrealDB’s capacity.
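
As an illustration of how types carry over, a single JSONL line can mix all of them (the field names here are arbitrary):

{"id": "example1", "count": 42, "ratio": 0.75, "name": "widget", "tags": ["a", "b", "c"], "meta": {"nested": true}, "active": false, "deleted_at": null}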

Troubleshooting

  1. Missing ID Field: If you see “Missing ID field” errors, ensure your JSON objects have the ID field (default: “id”) or specify the correct field with --id-field.

  2. Invalid Rule Format: Conversion rules must follow the exact format. Check for proper quoting and spacing.

  3. File Not Found: Ensure the source URI points to a directory containing .jsonl files, not individual files.

  4. Connection Issues: Verify SurrealDB is running and accessible at the specified endpoint.
