Using SurrealDB as a Full Text Search Database

A full-text search database is designed to index and retrieve text-based data (like articles, messages, or comments) based on the meaning and structure of the text itself, rather than exact, literal matches. This allows you to:

Find documents containing certain keywords.
Search for phrases or words with variants (e.g., “run,” “runs,” “running”).
Rank results by relevance, not just by literal string matches.

Traditional relational databases have some limited text search capabilities (with varying degrees of support for text indexes), but more specialized systems—like Elasticsearch or Lucene-based solutions—excel at performing complex, scalable full-text queries.

As a new multi-model database, SurrealDB has developed integrated full-text search capabilities so that you can store your data (as documents, graphs, or tables) and query it with advanced text search features. This guide will explain how to “think” in a full-text search model and show how SurrealDB helps you implement these concepts seamlessly.

Core Concepts of Full-Text Search

Analyzers: In a robust full-text search system, analyzers define how the text is processed. An analyzer typically includes tokenizers (which split text) and filters (which modify tokens). Different languages or data types need different analyzers.
Tokenization: Splits text into smaller units (“tokens”). Depending on your use case, tokens may be entire words, word stems, or even n-grams. For example, tokenizing the sentence “The quick brown fox.” might produce [“the”, “quick”, “brown”, “fox”].
Filtering: After splitting text into tokens, these tokens can be converted to lowercase, stripped of punctuation, or transformed (e.g., removing accents). Additional filters can include stemming or lemmatization, which reduce words to a base form (“running” -> “run”).
Indexing: Full-text search engines build specialized data structures (inverted indexes, suffix arrays, etc.) to let you quickly locate documents containing certain tokens. These indexes often store frequency, position, and other details that enable relevance scoring.
Ranking / Scoring: Once matches are found, an FTS engine ranks them to show the most relevant results first. Algorithms such as BM25 or TF-IDF look at how often terms appear in a document, or whether those terms appear in the title vs. the body, etc.
Highlighting : A good search experience shows where in the text the matches occur, often by wrapping matched terms in HTML tags or otherwise emphasizing them.

How Full-Text Search Differs from Keyword Matching

In traditional databases, you might do something like:

SELECT * FROM articles WHERE 'fox' IN title;

This approach:

Doesn’t rank results by relevance—just returns every article containing “fox.”
Ignores language variations, e.g., “Foxes,” “FoX,” or synonyms like “vixen.”
Tends to scan entire columns, making it slower for large datasets.

Full-text search, by contrast, uses an inverted index or other specialized structures for fast lookups and can handle a variety of linguistic transformations. It can highlight results and rank them by how relevant or frequent the terms are.

SurrealDB as a Full-Text Search Database

SurrealDB combines multi-model data storage (document, graph, vector, etc.) with an integrated full-text search engine. This means you can:

Store your text in a SurrealDB table or document.
Define analyzers that specify how SurrealDB will tokenize and filter your text.
Define indexes that use those analyzers for full-text search, optionally with advanced features like BM25 scoring and highlighting.

Advantages of SurrealDB for FTS

Unified Model: You can keep your data, relationships, and search logic in a single engine.
Flexible Schema: SurrealDB can be schemaless, so adding new fields or text columns doesn’t require schema migrations.
Powerful Query Language: SurrealQL blends SQL-like syntax with searching syntax (the @@ matching operator for FTS queries, advanced indexing features, etc.).
Real-Time Updates: SurrealDB can handle real-time changes, so newly inserted or updated text becomes searchable quickly.

Creating and Managing Full-Text Search in SurrealDB

Below is an example of how you might define a table for “articles,” define an analyzer that lowercases and removes accents, and create a full-text index on the “title” and “body” fields:

USE NAMESPACE myapp DB content;

-- Create a table for articles (schemaless or define fields explicitly)
CREATE TABLE articles SCHEMALESS;

-- Define a custom analyzer
DEFINE ANALYZER my_custom_analyzer
  TOKENIZERS class
  FILTERS lowercase, ascii;

-- Create a full-text search index
DEFINE INDEX ml_title 
  ON TABLE article 
  FIELDS title 
  SEARCH ANALYZER my_analyzer 
  HIGHLIGHTS BM25;

DEFINE INDEX ml_body 
  ON TABLE article 
  FIELDS body 
  SEARCH ANALYZER my_analyzer 
  HIGHLIGHTS BM25;

With this setup:

my_custom_analyzer splits text by Unicode class changes (letters, digits, punctuation) and then lowercases and removes accents, turning the string "Look! Full-Text-Search" for example into the final tokens ['look', '!', 'full', '-', 'text', '-', 'search' ].
SurrealDB uses these tokens from the analyzer to build an inverted index over both the title and body fields.
The BM25 scoring function is used to rank documents.
HIGHLIGHTS are enabled, which allows SurrealDB to highlight matched terms in queries.

Querying Full-Text Data

Assume you want to search for articles about “machine learning”, SurrealDB syntax might look like:

SELECT *,
  search::highlight("**", "**", 1) AS body,
  search::highlight("##", "", 0) AS title,
  search::score(0) + search::score(1) AS score
FROM article
WHERE title @0@ "machine"
   OR body @1@ "machine learning"
ORDER BY score DESC
LIMIT 10;

Explanation:

search::highlight(...): SurrealDB can highlight the matched terms in the text.
search::score() works with the BM25 ranking function to return the relevance score for the matched document.
@@ "machine learning": SurrealDB will check if the tokens “machine” and “learning” appear in the title field’s FTS index. The number in between the two characters of the operator lets the database know in the above functions which part of the query to highlight and calculate a score for. If no search functions are used, this operator will be used as @@ without a number (e.g. `WHERE title @@ “machine” OR body @@ “machine learning”).

Resources

Full Text Search reference guide, including comparison of Full Text Search to other methods
DEFINE INDEX
DEFINE ANALYZER
Search functions
- search::highlight: Highlights the matching keywords for the predicate reference number.
- search::offsets: Returns the position of the matching keywords for the predicate reference number.
- search::score: Helps with scoring and ranking the search results based on their relevance to the search terms.
- search::analyze: Used to test the output of a defined search analyzer.

Edit this page on GitHub

Vector Time Series

Using SurrealDB as a Full Text Search Database

Core Concepts of Full-Text Search

How Full-Text Search Differs from Keyword Matching

SurrealDB as a Full-Text Search Database

Advantages of SurrealDB for FTS

Creating and Managing Full-Text Search in SurrealDB

Querying Full-Text Data

Resources

On this page

Was this page helpful?

What did you like?

What went wrong?