Full-text search does not compare your query string to the document byte for byte. Instead, SurrealDB tokenizes text into terms, optionally filters those terms (case folding, stemming, and more), and indexes the result.
If you are new to full-text search (FTS) in SurrealDB, read the overview first. This guide builds up analyzers from the ground up. The exact grammar, every clause, and the syntax diagrams are in the DEFINE ANALYZER reference.
What an analyzer does
Roughly, processing flows like this:
FUNCTION (optional): transforms the raw input string once, for example normalising punctuation or stripping markup, via a user-defined function that accepts and returns a string.
Tokenizers: split the string into tokens (words, symbols, or other chunks) using one or more built-in tokenizers.
Filters: transform each token (lowercase, strip accents, stem, generate n-grams, and so on).
The same analyzer processes both the documents being indexed and the text of each query. Time spent tuning it therefore pays off in both relevance and performance.
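The sketch below puts all three stages together. The function name fn::strip_html and the analyzer name html_text are hypothetical. The sketch also assumes that your SurrealDB version accepts a regex as the search argument of string::replace; if it does not, adjust the function body.

```surql
-- Hypothetical user-defined function: removes anything that looks like an HTML tag.
-- It accepts a string and returns a string, as an analyzer FUNCTION must.
DEFINE FUNCTION fn::strip_html($text: string) {
    RETURN string::replace($text, /<[^>]*>/, "");
};

-- Hypothetical analyzer: the FUNCTION runs once on the raw string,
-- the TOKENIZERS then split it, and the FILTERS transform each token in order.
DEFINE ANALYZER html_text
    FUNCTION fn::strip_html
    TOKENIZERS blank, class
    FILTERS lowercase, ascii;
```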
See the tokens before you index
Use search::analyze() to return the array of tokens an analyzer would produce for a given string. This makes it ideal for experimentation.
Start with the simplest split, whitespace-only tokenization:
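Here is a minimal sketch using a hypothetical analyzer named blank_only. The blank tokenizer splits only on whitespace, so punctuation stays attached to neighbouring words.

```surql
-- Hypothetical analyzer that splits on whitespace only
DEFINE ANALYZER blank_only TOKENIZERS blank;

-- Returns the tokens this analyzer would index for the given text.
-- Expected: ["Hello", "world,", "SurrealDB!"]
RETURN search::analyze("blank_only", "Hello world, SurrealDB!");
```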
Once you are happy with the tokens, attach the analyzer to a full-text index by name and query with @@ (covered in Search indexes and Scoring and ranking).
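The following is a minimal end-to-end sketch. It assumes a hypothetical book table with a title field and an analyzer named simple. Because ascii folds "é" to "e" and lowercase removes capitals, the query "miserables" is intended to match "Les Misérables". No output is shown, since the result shape depends on your data.

```surql
-- Hypothetical analyzer: split on whitespace and character class, then lowercase and fold accents
DEFINE ANALYZER simple TOKENIZERS blank, class FILTERS lowercase, ascii;

-- Hypothetical full-text index on book.title using that analyzer, scored with BM25
DEFINE INDEX book_title ON book FIELDS title FULLTEXT ANALYZER simple BM25;

CREATE book SET title = "Les Misérables";

-- The query text is run through the same analyzer before matching
SELECT id, title FROM book WHERE title @@ "miserables";
```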
Step 1 — Choose how to split text (tokenizers)
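Tokenizers answer: where are the boundaries between tokens? Built-in examples include blank (splits on whitespace), camel (splits camelCase words where the letter case changes), and class (splits where the character class changes, for example between letters, digits, and punctuation).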
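To see how tokenizers differ, you can define one analyzer per combination and compare their output side by side. The analyzer names and sample text below are hypothetical. No output is shown, so run the query to inspect the results yourself.

```surql
-- Hypothetical analyzers, one per tokenizer combination
DEFINE ANALYZER by_blank TOKENIZERS blank;
DEFINE ANALYZER by_blank_class TOKENIZERS blank, class;
DEFINE ANALYZER by_blank_camel TOKENIZERS blank, camel;

-- Same input through each analyzer, returned as one object for easy comparison
LET $text = "parseJSON2024 data-loader";
RETURN {
    blank: search::analyze("by_blank", $text),
    blank_class: search::analyze("by_blank_class", $text),
    blank_camel: search::analyze("by_blank_camel", $text)
};
```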
Step 2 — Normalise and enrich tokens (filters)
Filters answer: what should each token look like before indexing?
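Built-in examples include ascii (folds accented characters to their ASCII equivalents), snowball (language-specific stemming), and ngram (splits each token into substrings within a given length range).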
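The sketch below chains several filters. They are applied in the order listed. The analyzer names are hypothetical, and outputs are not shown because the exact stems depend on the Snowball algorithm.

```surql
-- Lowercase, fold accents to ASCII, then apply the English Snowball stemmer
DEFINE ANALYZER english_stem
    TOKENIZERS blank, class
    FILTERS lowercase, ascii, snowball(english);

-- Lowercase, then emit character n-grams of length 2 to 3 (useful for partial matching)
DEFINE ANALYZER short_ngrams
    TOKENIZERS blank
    FILTERS lowercase, ngram(2,3);

RETURN search::analyze("english_stem", "Café runners running");
RETURN search::analyze("short_ngrams", "Search");
```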
Custom dictionaries with mapper(path)
The mapper(path) filter rewrites tokens using a tab-separated file, one pair per line, with the canonical form first and the variant second. This supports lemmatisation beyond what stemming alone catches. It can also normalise arbitrary phrases, for example by mapping multilingual error strings to a single code.
Point path at a file the server can read. Here is a very short example dictionary:
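For instance, a hypothetical dictionary that maps forms of "be" to their base form might look like the block below. Each line holds the canonical form, a tab character, and then the variant.

```text
be	am
be	are
be	is
be	was
be	were
```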
An analyzer making use of this dictionary can be defined as follows:
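The sketch below assumes the dictionary is saved on the server as lemmatization.txt, which is a hypothetical path. The analyzer name lemmas is also hypothetical. Filters run in the order listed, so lowercase comes before mapper; that way capitalised variants such as "Is" still match the lowercase entries in the file.

```surql
-- Hypothetical analyzer: lowercase first, then rewrite variants to their canonical form
DEFINE ANALYZER lemmas
    TOKENIZERS blank, class
    FILTERS lowercase, mapper('lemmatization.txt');

-- Inspect how the mapper rewrites the tokens
RETURN search::analyze("lemmas", "She is happy and they are too");
```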
Next steps
Search indexes: attach FULLTEXT ANALYZER to a field.
Scoring and ranking: @@, BM25, search::score, and highlights.
Reference: DEFINE ANALYZER, DEFINE INDEX, Search functions.
Updating or creating analyzers safely
To create an analyzer only if it does not already exist, use DEFINE ANALYZER IF NOT EXISTS. To replace an existing definition, use DEFINE ANALYZER OVERWRITE. Examples and caveats are in the statement reference.
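The two forms side by side are shown below, using a hypothetical analyzer name. Indexes already built with an old definition may not be re-analyzed automatically when you overwrite it. Treat that as an assumption and check the statement reference before overwriting an analyzer that live indexes depend on.

```surql
-- Creates the analyzer only if no analyzer with this name exists yet; otherwise does nothing
DEFINE ANALYZER IF NOT EXISTS english_stem
    TOKENIZERS blank, class
    FILTERS lowercase, snowball(english);

-- Replaces any existing definition of the analyzer with this one
DEFINE ANALYZER OVERWRITE english_stem
    TOKENIZERS blank, class
    FILTERS lowercase, ascii, snowball(english);
```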