Search functions
These functions are used in conjunction with the @@ operator (the ‘matches’ operator) to either collect the relevance score or highlight the searched keywords within the content.
Before SurrealDB version 3.0.0-beta, the FULLTEXT ANALYZER clause used the syntax SEARCH ANALYZER.
The examples below assume the following queries:
CREATE book:1 SET title = "Rust Web Programming";
DEFINE ANALYZER book_analyzer TOKENIZERS blank, class, camel, punct FILTERS snowball(english);
DEFINE INDEX book_title ON book FIELDS title FULLTEXT ANALYZER book_analyzer BM25;
search::analyze
The search::analyze function returns the output of a defined search analyzer on an input string.
API DEFINITION
search::analyze($analyzer: string, $input: string) -> array<string>
First, define the analyzer using the DEFINE ANALYZER statement:
Define book analyzer
DEFINE ANALYZER book_analyzer TOKENIZERS blank, class, camel, punct FILTERS snowball(english);
Next, pass the analyzer to the search::analyze function. The following example shows this function, and its output, when used in a RETURN statement:
RETURN search::analyze("book_analyzer", "A hands-on guide to developing, packaging, and deploying fully functional Rust web applications");
Output
[
'a',
'hand',
'-',
'on',
'guid',
'to',
'develop',
',',
'packag',
',',
'and',
'deploy',
'fulli',
'function',
'rust',
'web',
'applic'
]
search::highlight
The search::highlight function highlights the matching keywords for the predicate reference number.
API DEFINITION
search::highlight($prepend: string, $append: string, $predicate: number, $highlight_all: option<bool>) -> string | string[]
The following example shows this function, and its output, when used in a SELECT statement:
SELECT id, search::highlight('<b>', '</b>', 1) AS title
FROM book WHERE title @1@ 'rust web';
Output
[
{
id: book:1,
title: [ '<b>Rust</b> <b>Web</b> Programming' ]
}
]
The optional Boolean parameter can be set to true to explicitly request that the whole found term be highlighted, or set to false to highlight only the sequence of characters we are looking for. This must be used with an edgengram or ngram filter. The default value is true.
search::linear
API DEFINITION
search::linear($lists: array, $weights: array, $limit: int, $norm: 'minmax' | 'zscore') -> array<object>
Notes on the arguments and output of this function:
- Input:
- lists: array of result arrays. Each inner array must be pre‑sorted most‑relevant‑first (BM25 score descending, distance ascending already inverted, etc.).
- weights: array of numeric weights, one per result list (must have the same length as lists).
- limit: maximum number of documents to return (must be ≥ 1).
- norm: normalization method, either 'minmax' for MinMax normalization or 'zscore' for Z-score normalization.
- Processing:
- Computes the union of all candidate ids.
- The function automatically extracts scores from documents using the following priority:
- distance field: converted using 1.0 / (1.0 + distance) (lower distance = higher score)
- ft_score field: used directly (full-text search scores)
- score field: used directly (generic scores)
- Rank-based fallback: 1.0 / (1.0 + rank) if no score field is found
- Normalization Methods:
- MinMax: Scales scores to [0,1] range using (score - min) / (max - min)
- Z-score: Standardizes scores using (score - mean) / std_dev
- When merging field data from the per‑list rows, keeps the first non‑null value encountered in the order the lists were supplied, or the last one if there are several fields with the same key.
- Sorts by linear_score descending and truncates to limit.
- Output:
- Array of merged result objects, each containing original fields and an added linear_score.
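The processing steps above can be sketched in Python to make the arithmetic concrete. This is a minimal illustration under stated assumptions, not SurrealDB's implementation: the helper names (extract_score, normalize, linear) are hypothetical, the fallback rank is assumed to start at 0, the standard deviation is assumed to be the population one, and a list whose scores are all equal is assumed to normalize to 0. The input rows assume the vector query returns test:1 followed by test:3.

```python
from statistics import mean, pstdev


def extract_score(row, rank):
    """Pick a raw score using the priority order listed above."""
    if row.get("distance") is not None:
        return 1.0 / (1.0 + row["distance"])
    if row.get("ft_score") is not None:
        return row["ft_score"]
    if row.get("score") is not None:
        return row["score"]
    # Rank-based fallback; a 0-based rank is an assumption of this sketch.
    return 1.0 / (1.0 + rank)


def normalize(scores, norm):
    """Normalize one list of raw scores with 'minmax' or 'zscore'."""
    if norm == "minmax":
        lo, hi = min(scores), max(scores)
        span = hi - lo
        # Assumption: identical scores (span == 0) normalize to 0.
        return [(s - lo) / span if span else 0.0 for s in scores]
    if norm == "zscore":
        mu, sd = mean(scores), pstdev(scores)
        # Assumption: zero standard deviation normalizes to 0.
        return [(s - mu) / sd if sd else 0.0 for s in scores]
    raise ValueError("norm must be 'minmax' or 'zscore'")


def linear(lists, weights, limit, norm):
    """Weighted sum of per-list normalized scores, keyed by document id."""
    if len(lists) != len(weights):
        raise ValueError("weights must have the same length as lists")
    if limit < 1:
        raise ValueError("limit must be >= 1")
    totals, merged = {}, {}
    for results, weight in zip(lists, weights):
        if not results:
            continue
        raw = [extract_score(row, rank) for rank, row in enumerate(results)]
        for row, score in zip(results, normalize(raw, norm)):
            doc_id = row["id"]
            totals[doc_id] = totals.get(doc_id, 0.0) + weight * score
            fields = merged.setdefault(doc_id, {})
            for key, value in row.items():
                # Keep the first non-null value seen for each field.
                if value is not None and key not in fields:
                    fields[key] = value
    fused = [{**merged[d], "linear_score": s} for d, s in totals.items()]
    fused.sort(key=lambda r: r["linear_score"], reverse=True)
    return fused[:limit]


# Assumed inputs: vector results carry ids only, full-text results carry a score.
vs = [{"id": "test:1"}, {"id": "test:3"}]
ft = [{"id": "test:1", "score": 0.5366538763046265}]
mm = linear([vs, ft], [2, 1], 2, "minmax")
zs = linear([vs, ft], [2, 1], 2, "zscore")
print(mm)
print(zs)
```

Under these assumptions the sketch ranks test:1 ahead of test:3 for both methods, with linear scores of 2 and 0 for 'minmax' and approximately 2 and -2 for 'zscore'.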
CREATE test:1 SET text = "Graph databases are great.", embedding = [0.10, 0.20, 0.30];
CREATE test:2 SET text = "Relational databases store tables.", embedding = [0.05, 0.10, 0.00];
CREATE test:3 SET text = "This document mentions graphs and networks.", embedding = [0.20, 0.10, 0.25];
DEFINE ANALYZER simple TOKENIZERS class, punct FILTERS lowercase, ascii;
DEFINE INDEX idx_text ON TABLE test FIELDS text FULLTEXT ANALYZER simple BM25;
DEFINE INDEX idx_embedding ON TABLE test FIELDS embedding HNSW DIMENSION 3 DIST COSINE;
LET $qvec = [0.12, 0.18, 0.27];
LET $vs = SELECT id FROM test WHERE embedding <|2,100|> $qvec;
LET $ft = SELECT id, search::score(1) as score FROM test
WHERE text @1@ 'graph' ORDER BY score DESC LIMIT 2;
search::linear([$vs, $ft], [2, 1], 2, 'minmax');
search::linear([$vs, $ft], [2, 1], 2, 'zscore');
Output of the final search::linear() queries:
[
{
score: 0.5366538763046265f,
id: test:1,
linear_score: 2
},
{
id: test:3,
linear_score: 0
}
]
[
{
score: 0.5366538763046265f,
id: test:1,
linear_score: 1.9999999999999956f
},
{
id: test:3,
linear_score: -2.0000000000000044f
}
]
search::offsets
The search::offsets function returns the position of the matching keywords for the predicate reference number.
API DEFINITION
search::offsets($predicate: number, $highlight_all: option<bool>) -> object
The following example shows this function, and its output, when used in a SELECT statement:
SELECT id, title, search::offsets(1) AS title_offsets
FROM book WHERE title @1@ 'rust web';
Output
[
{
id: book:1,
title: [ 'Rust Web Programming' ],
title_offsets: {
0: [
{ e: 4, s: 0 },
{ e: 8, s: 5 }
]
}
}
]
The output returns the start s and end e positions of each matched term found within the original field.
The full-text index is capable of indexing both single strings and arrays of strings. In this example, the key 0 indicates that we’re highlighting the first string within the title field, which contains an array of strings.
The optional boolean parameter can be set to true to explicitly request that the whole found term be highlighted, or set to false to highlight only the sequence of characters we are looking for. This must be used with an edgengram or ngram filter.
The default value is true.
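To show how the s and e positions relate to the original text, the following Python sketch wraps each reported span in a prefix and suffix on the client side. It is an illustration only: the highlight helper is hypothetical, and it assumes that e is an exclusive end position and that positions count characters, which matches the output above for this ASCII title.

```python
def highlight(text, offsets, prefix, suffix):
    """Wrap each span from s (inclusive) to e (assumed exclusive) in prefix/suffix."""
    out = []
    cursor = 0
    for span in sorted(offsets, key=lambda o: o["s"]):
        out.append(text[cursor:span["s"]])  # untouched text before the match
        out.append(prefix + text[span["s"]:span["e"]] + suffix)
        cursor = span["e"]
    out.append(text[cursor:])  # remaining text after the last match
    return "".join(out)


# Values copied from the search::offsets output above.
title = ["Rust Web Programming"]
title_offsets = {0: [{"e": 4, "s": 0}, {"e": 8, "s": 5}]}

highlighted = [
    highlight(title[index], spans, "<b>", "</b>")
    for index, spans in title_offsets.items()
]
print(highlighted)  # ['<b>Rust</b> <b>Web</b> Programming']
```

The result has the same shape as the search::highlight output shown earlier, which is why offsets are useful when the highlighting markup has to be produced outside the database.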
search::rrf
API DEFINITION
search::rrf($lists: array, $limit: int, $k: option<int>) -> array<object>
Notes on the arguments and output of this function:
- Input:
- lists: array of result arrays. Each inner array must be pre‑sorted most‑relevant‑first (BM25 score descending, distance ascending already inverted, etc.).
- limit: maximum number of fused results to return.
- k (optional): RRF constant; defaults to 60.
See the original RRF paper (Cormack, Clarke and Büttcher, 2009) for why 60 tends to be the default k value:
Our intuition in choosing this formula derived from fact that while highly-ranked documents are more important, the importance of lower-ranked documents does not vanish as it would were, say, an exponential function used. The constant k mitigates the impact of high rankings by outlier systems.
- Processing:
- Computes the union of all candidate ids.
- For each candidate, derives its rank in each list and computes rrf_score = Σ 1/(k + rank).
- When merging field data from the per‑list rows, keeps the first non‑null value encountered in the order the lists were supplied, or the last one if there are several fields with the same key.
- Sorts by rrf_score descending and truncates to limit.
- Output:
- Array of merged result objects, each containing original fields and an added rrf_score.
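The fusion steps above can be sketched in a few lines of Python. This is a minimal illustration rather than SurrealDB's implementation: the rrf helper is hypothetical, ranks are assumed to start at 1 (the first row of each list has rank 1), and the input rows assume the vector query returns test:1 followed by test:3 while the full-text query returns only test:1.

```python
from collections import defaultdict


def rrf(lists, limit, k=60):
    """Fuse ranked result lists with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    merged = {}
    for results in lists:
        # Assumption: ranks are 1-based within each list.
        for rank, row in enumerate(results, start=1):
            doc_id = row["id"]
            scores[doc_id] += 1.0 / (k + rank)
            fields = merged.setdefault(doc_id, {})
            for key, value in row.items():
                # Keep the first non-null value seen for each field.
                if value is not None and key not in fields:
                    fields[key] = value
    fused = [{**merged[d], "rrf_score": s} for d, s in scores.items()]
    fused.sort(key=lambda r: r["rrf_score"], reverse=True)
    return fused[:limit]


# Assumed inputs: vector results carry ids only, full-text results carry a score.
vs = [{"id": "test:1"}, {"id": "test:3"}]
ft = [{"id": "test:1", "score": 0.5366538763046265}]
result = rrf([vs, ft], 2, 60)
print(result)
```

With k = 60, test:1 appears first in both lists and scores 1/61 + 1/61, while test:3 appears second in one list and scores 1/62.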
CREATE test:1 SET text = "Graph databases are great.", embedding = [0.10, 0.20, 0.30];
CREATE test:2 SET text = "Relational databases store tables.", embedding = [0.05, 0.10, 0.00];
CREATE test:3 SET text = "This document mentions graphs.", embedding = [0.20, 0.10, 0.25];
DEFINE ANALYZER simple TOKENIZERS class, punct FILTERS lowercase, ascii;
DEFINE INDEX idx_text ON TABLE test FIELDS text FULLTEXT ANALYZER simple BM25;
DEFINE INDEX idx_embedding ON TABLE test FIELDS embedding HNSW DIMENSION 3 DIST COSINE;
LET $qvec = [0.12, 0.18, 0.27];
LET $vs = SELECT id FROM test WHERE embedding <|2,100|> $qvec;
LET $ft = SELECT id, search::score(1) as score FROM test
WHERE text @1@ 'graph' ORDER BY score DESC LIMIT 2;
search::rrf([$vs, $ft], 2, 60);
Output of the final search::rrf() query:
[
{
score: 0.5366538763046265f,
id: test:1,
rrf_score: 0.03278688524590164f
},
{
id: test:3,
rrf_score: 0.016129032258064516f
}
]
search::score
The search::score function returns the relevance score corresponding to the given ‘matches’ predicate reference number.
API DEFINITION
search::score(number) -> number
The following example shows this function, and its output, when used in a SELECT statement:
SELECT id, title, search::score(1) AS score FROM book
WHERE title @1@ 'rust web'
ORDER BY score DESC;
Output
[
{
id: book:1,
score: 0.9227996468544006,
title: [ 'Rust Web Programming' ],
}
]