A full-text search database is designed to index and retrieve text-based data (like articles, messages, or comments) based on the meaning and structure of the text itself, rather than exact, literal matches. This allows you to:
Traditional relational databases have some limited text search capabilities (with varying degrees of support for text indexes), but more specialized systems—like Elasticsearch or Lucene-based solutions—excel at performing complex, scalable full-text queries.
As a new multi-model database, SurrealDB has developed integrated full-text search capabilities so that you can store your data (as documents, graphs, or tables) and query it with advanced text search features. This guide will explain how to “think” in a full-text search model and show how SurrealDB helps you implement these concepts seamlessly.
Analyzers: In a robust full-text search system, analyzers define how the text is processed. An analyzer typically includes tokenizers (which split text) and filters (which modify tokens). Different languages or data types need different analyzers.
Tokenization: Splits text into smaller units (“tokens”). Depending on your use case, tokens may be entire words, word stems, or even n-grams. For example, tokenizing the sentence “The quick brown fox.” might produce [“the”, “quick”, “brown”, “fox”].
Filtering: After splitting text into tokens, these tokens can be converted to lowercase, stripped of punctuation, or transformed (e.g., removing accents). Additional filters can include stemming or lemmatization, which reduce words to a base form (“running” -> “run”).
Indexing: Full-text search engines build specialized data structures (inverted indexes, suffix arrays, etc.) to let you quickly locate documents containing certain tokens. These indexes often store frequency, position, and other details that enable relevance scoring.
Ranking / Scoring: Once matches are found, an FTS engine ranks them to show the most relevant results first. Algorithms such as BM25 or TF-IDF look at how often terms appear in a document, or whether those terms appear in the title vs. the body, etc.
Highlighting : A good search experience shows where in the text the matches occur, often by wrapping matched terms in HTML tags or otherwise emphasizing them.
In traditional databases, you might do something like:
SELECT * FROM articles WHERE 'fox' IN title;
This approach:
Full-text search, by contrast, uses an inverted index or other specialized structures for fast lookups and can handle a variety of linguistic transformations. It can highlight results and rank them by how relevant or frequent the terms are.
SurrealDB combines multi-model data storage (document, graph, vector, etc.) with an integrated full-text search engine. This means you can:
@@
matching operator for FTS queries, advanced indexing features, etc.).Below is an example of how you might define a table for “articles,” define an analyzer that lowercases and removes accents, and create a full-text index on the “title” and “body” fields:
USE NAMESPACE myapp DB content; -- Create a table for articles (schemaless or define fields explicitly) CREATE TABLE articles SCHEMALESS; -- Define a custom analyzer DEFINE ANALYZER my_custom_analyzer TOKENIZERS class FILTERS lowercase, ascii; -- Create a full-text search index DEFINE INDEX ml_title ON TABLE article FIELDS title SEARCH ANALYZER my_analyzer HIGHLIGHTS BM25; DEFINE INDEX ml_body ON TABLE article FIELDS body SEARCH ANALYZER my_analyzer HIGHLIGHTS BM25;
With this setup:
my_custom_analyzer
splits text by Unicode class changes (letters, digits, punctuation) and then lowercases and removes accents, turning the string "Look! Full-Text-Search"
for example into the final tokens ['look', '!', 'full', '-', 'text', '-', 'search' ]
.HIGHLIGHTS
are enabled, which allows SurrealDB to highlight matched terms in queries.Assume you want to search for articles about “machine learning”, SurrealDB syntax might look like:
SELECT *, search::highlight("**", "**", 1) AS body, search::highlight("##", "", 0) AS title, search::score(0) + search::score(1) AS score FROM article WHERE title @0@ "machine" OR body @1@ "machine learning" ORDER BY score DESC LIMIT 10;
Explanation:
search::highlight(...)
: SurrealDB can highlight the matched terms in the text.search::score()
works with the BM25
ranking function to return the relevance score for the matched document.@@ "machine learning"
: SurrealDB will check if the tokens “machine” and “learning” appear in the title field’s FTS index. The number in between the two characters of the operator lets the database know in the above functions which part of the query to highlight and calculate a score for. If no search functions are used, this operator will be used as @@
without a number (e.g. `WHERE title @@ “machine” OR body @@ “machine learning”).DEFINE INDEX
DEFINE ANALYZER
search::highlight
: Highlights the matching keywords for the predicate reference number.search::offsets
: Returns the position of the matching keywords for the predicate reference number.search::score
: Helps with scoring and ranking the search results based on their relevance to the search terms.search::analyze
: Used to test the output of a defined search analyzer.