
DEFINE ANALYZER statement

An analyzer controls how text is processed for full-text search. It is defined by a name, a set of tokenizers that split text into tokens, and a set of filters that transform those tokens.

Requirements

Statement syntax

DEFINE ANALYZER @name [ TOKENIZERS @tokenizers ] [ FILTERS @filters ]

Tokenizers

Tokenizers split the input text into individual tokens. Multiple tokenizers can be combined as a comma-separated list.

  • blank: creates a new token each time a space, tab, or newline character is encountered.
  • camel: creates a new token when the next character is uppercase.
  • class: creates a new token when the Unicode class of the next character changes (digit, letter, punctuation, blank).
  • punct: creates a new token each time a punctuation character is encountered.
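One way to compare the tokenizers is to define a few single-tokenizer analyzers and inspect their output with search::analyze, which returns the tokens an analyzer produces for a given string. This is a sketch that assumes a SurrealDB version providing search::analyze; the analyzer names and the input string are hypothetical.

```surql
-- Hypothetical analyzers, each using a single tokenizer and no filters
DEFINE ANALYZER by_blank TOKENIZERS blank;
DEFINE ANALYZER by_camel TOKENIZERS camel;
DEFINE ANALYZER by_class TOKENIZERS class;
DEFINE ANALYZER by_punct TOKENIZERS punct;

-- Run the same input through each analyzer and compare the resulting tokens
RETURN search::analyze("by_blank", "helloWorld, v2 ready");
RETURN search::analyze("by_camel", "helloWorld, v2 ready");
RETURN search::analyze("by_class", "helloWorld, v2 ready");
RETURN search::analyze("by_punct", "helloWorld, v2 ready");
```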

Filters

Filters transform the tokens produced by the tokenizers. Multiple filters can be combined as a comma-separated list.

  • ascii: replaces or removes diacritical marks.
  • edgengram(min,max): generates the prefixes (edge n-grams) of each token, from min to max characters long; useful for finding a term by its prefix.
  • lowercase: converts the token to lowercase.
  • snowball(language): applies Snowball stemming to the token. The following languages are supported: Arabic, Danish, Dutch, English, French, German, Greek, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil, Turkish.
  • uppercase: converts the token to uppercase.
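Filters can be chained so that differently written forms of a word reduce to the same token. The sketch below uses a hypothetical analyzer name and assumes a SurrealDB version that provides search::analyze (which returns the tokens an analyzer produces); it combines the blank tokenizer with the ascii and lowercase filters so that accented and mixed-case spellings can be compared.

```surql
-- Hypothetical analyzer: split on whitespace, fold diacritics, then lowercase
DEFINE ANALYZER folded TOKENIZERS blank FILTERS ascii,lowercase;

-- Both inputs should yield the same tokens once diacritics and case are normalized
RETURN search::analyze("folded", "Crème Brûlée");
RETURN search::analyze("folded", "creme brulee");
```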

Example usage

This example creates an analyzer that splits text wherever the character class changes, converts each token to lowercase, and removes diacritical marks.

-- Creates a simple analyzer that lowercases tokens and removes diacritical marks
DEFINE ANALYZER ascii TOKENIZERS class FILTERS lowercase,ascii;
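An analyzer takes effect when a full-text index refers to it; both the indexed text and the search terms then pass through it. The sketch below assumes the SEARCH index syntax of SurrealDB 1.x/2.x; the article table, title field, and title_search index are hypothetical.

```surql
-- Hypothetical table and field, indexed with the ascii analyzer
DEFINE INDEX title_search ON TABLE article FIELDS title SEARCH ANALYZER ascii BM25;

CREATE article SET title = "Résumé writing tips";

-- The search term goes through the same analyzer, so "resume" can match "Résumé"
SELECT title FROM article WHERE title @@ "resume";
```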

This statement creates an analyzer for English text, reducing each token to its stem with the English Snowball stemmer.

-- Creates an analyzer suitable for English text
DEFINE ANALYZER english TOKENIZERS class FILTERS snowball(english);
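Stemming lets different inflections of a word reduce to a common stem, so a search for one form can find the others. A quick check of the english analyzer, assuming a SurrealDB version that provides search::analyze (which returns the tokens an analyzer produces); the input words are arbitrary:

```surql
-- Related word forms should share the stem produced by the English Snowball stemmer
RETURN search::analyze("english", "connect connected connecting connection");
```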

This statement creates an analyzer for auto-completion: it lowercases the text and generates edge n-grams of 2 to 10 characters, so that a partially typed word can match.

-- Creates an analyzer suitable for auto-completion.
DEFINE ANALYZER autocomplete FILTERS lowercase,edgengram(2,10);
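For auto-completion, the analyzer is typically attached to a search index on the field being completed. This sketch assumes the SEARCH index syntax of SurrealDB 1.x/2.x; the product table, name field, and product_name index are hypothetical, and single-word names are used because this analyzer defines no tokenizer.

```surql
-- Hypothetical table and field, indexed with the autocomplete analyzer
DEFINE INDEX product_name ON TABLE product FIELDS name SEARCH ANALYZER autocomplete BM25;

CREATE product SET name = "Keyboard";
CREATE product SET name = "Keychain";

-- The partially typed term is compared against the stored edge n-grams
SELECT name FROM product WHERE name @@ "key";
```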

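This statement creates an analyzer for source code: the class and camel tokenizers together split identifiers such as parseHttpRequest into their component words, which are then lowercased and stripped of diacritical marks.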

-- Creates an analyzer suitable for source code analysis.
DEFINE ANALYZER code TOKENIZERS class,camel FILTERS lowercase,ascii;
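To check how identifiers are split, the code analyzer can be run on a snippet with search::analyze, which returns the tokens an analyzer produces. This assumes a SurrealDB version that provides search::analyze; the input snippet is arbitrary.

```surql
-- camel should split "parseHttpRequest" before each uppercase letter, class should
-- split where letters give way to punctuation, and lowercase/ascii then normalize each token
RETURN search::analyze("code", "parseHttpRequest(userId);");
```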