Get tutorials, AI agent recipes, webinars, and early product updates in your inbox every two weeks
import VideoPlayer from '@src/components/shared/VideoPlayer.astro';
SurrealDB: shiny newcomer, low-level database, or both?
SurrealDB is admittedly a pretty shiny database. It's built in Rust, a programming language that has just barely hit the 10-year mark since it hit version 1.0. A good deal of attention is paid to the website and brand design, and the official Surrealist app is not only slick but even capable of visualising data as an interconnected net of records.
This blog post goes into further detail on how this graph visualisation works, leading to output like the following that lets you see which European countries belong to which intranational groups.
But the other side of being a database written in a low-level language like Rust is that you can compile it to machine code with a single command and do everything you want with it on the command line, and thus on small devices on the edge as well. The main way to work with SurrealDB on the terminal is with the surreal sql command to open up a REPL, but building a TUI (terminal user interface) is another way to make it happen. That's what this post is about.
This post features the same AI-native demo app as in a previous post, except that this time it is built using Ratatui, the main way to build a TUI using Rust.
You can read more about the demo app functionality in the previous post, but here is a quick recap of what it does. It lets you:
Automatically go through the article summary to try to link to other documents (e.g. if it sees the word "Canada" in the summary, it will try to link to a document:Canada)
See the linked articles from a single article recursively down to a depth of 3
Perform semantic (vector) search via OpenAI or Mistral embeddings to find the closest articles to a sample of text (requires a key to use as it must call in to OpenAI and/or Mistral to first retrieve the embeddings)
Query the database.
You can see what the final product will look like in the following video.
It also comes with a new feature that is so new that it was merged into SurrealDB just two days ago and is only available in nightly, called hybrid search. We'll look at how that works in the next section.
Designing a TUI
Ratatui has a showcase here that demonstrates the range of design that can be done in a terminal UI. With a good amount of work you can generate graphics with them, such as this Minesweeper game. One core developer on the Ratatui project has even made a guitar tuner and realtime fretboard display out of it! But most of the time a terminal UI will be a relatively simple app made up of a few rectangles and some code to react to the user's keystrokes.
This is the case with this app, with the exception of a bit of some official SurrealDB colours.
Ratatui allows you to generate your own colours if you know their labellings such as RGB or hex. You can see the names of the SurrealDB brand colours and their values on this page. However, not all terminals are able to display True Colour, and since colour is the way this app shows which window is currently selected, we don't want to take a chance by going with Colour::Rgb(u8, u8, u8).
Fortunately, Ratatui allows 8-bit colour values to be specified and these are usually close enough. That gives us the following values, but you can change them to Color::Rgb if your terminal allows it.
constSURREAL_PINK: Color=Color::Indexed(199); constSURREAL_PURPLE: Color=Color::Indexed(93); constLAVENDER_MIST: Color=Color::Indexed(235); constMOONLIT_WHITE: Color=Color::Indexed(15); // These two are similar to each other so their // 8-bit values end up being the same constDEEP_AMETHYST: Color=Color::Indexed(233); constOBSIDIAN_VIOLET: Color=Color::Indexed(233);
That gives us a selection of colours a bit different than the True Colour values, but close enough for a terminal app.
How does hybrid search work?
The PR that adds hybrid search to SurrealDB does a great job at explaining how it works. While vector search works well for lexical similarity, and full-text (lexical) search is optimised for precise results, sometimes you might want to combine the two to obtain a single similarity score.
The question then arises: how should these be weighted? If you have a certain vector similarity score, how could this be combined with the similarity or distance results from a full-text search?
This was precisely the reason why the previous blog post didn't have hybrid search, as I couldn't think of a way to combine the two that wasn't arbitrary. Fortunately, it turns out that algorithms already exist to combine the two. Let's read the description from the PR on what hybrid search does:
Users frequently need “hybrid” retrieval: lexical search (BM25) gives precise keyword matches, while vector search captures semantic similarity even when keywords don’t overlap.
Today SurrealDB exposes both capabilities separately, but there is no built‑in way to combine them, forcing application code to stitch results together.
Adding an RRF‑based fusion primitive lets users get higher recall and better relevance with a single query, without manual score normalization.
Reciprocal Rank Fusion is simple, robust across heterogeneous scorers, and can be implemented on top of existing full‑text and vector index mechanisms, minimizing complexity while unlocking a common production use case.
Hybrid search in SurrealDB is done using three main ways:
search::rrf(): reciprocal rank fusion
search::linear() using 'minmax' normalisation
seach::linear() using 'zscore' normalisation
The search::rrf() function is the simplest method, because all it looks for is an array with the results of vector search and full-text search to stitch together, followed by the number of results to return, and an optional k constant that is used to smooth the ranking defaults to 60.
search::rrf([$vs, $ft], 4, 60)
The other two ways to use hybrid search have an extra parameter to set a weighting such as [2, 1] (giving vector search twice the weight of FTS) along with the choice between 'minmax' and 'zscore'. We won't use linear weighting in this post, but here is what these functions look like for reference.
-- Fuse with Linear / minmax search::linear([$vs, $ft], [2, 1], 4, 'minmax');
-- Fuse with Linear / zscore search::linear([$vs, $ft], [2, 1], 4, 'zscore');
Now, there is a good amount of setup required before using search::rrf(): one query to find the four nearest semantic neighbours, and another to get all of the full-text query matches. For more on these queries, see this section in the previous blog.
letvs=self.surreal.query("SELECT id, extract, title, vector::distance::knn() AS distance FROM document WHERE {field} <|4,COSINE|> $embeds ORDER BY distance; ").bind(("embeds",embeds)) .bind(("field",field)) .await.unwrap().take::<Value>(0).unwrap();
letft=self.surreal.query("SELECT id, search::highlight('**', '**', 0) AS title, search::highlight('**', '**', 1) AS extract, (search::score(0) * 3) + search::score(1) AS score FROM document WHERE title @0@ $input OR extract @1@ $input ORDER BY score DESC;") .bind(("input",fts_query)) .await.unwrap().take::<Value>(0).unwrap();
Ok(self.surreal .query("SELECT id, rrf_score FROM search::rrf([$vs, $ft], 4, 60) ORDER BY rrf_score DESC;") .bind(("vs",vs)) .bind(("ft",ft)).await.unwrap().take::<Value>(0).unwrap().to_string())
Two more points to note about these functions are:
They require the inputs to have an id field, as this is used to stitch them together. So make sure to include id in both the vector query and full-text query.
They won't return an error if you don't have
Let's give this a try in the app by doing the following:
First add some articles to the database: the article for Canada, its two largest provinces, and their two largest cities.
Then search for Canada,popul city. That will do a hybrid search for documents that are a close neighbour to document:Canada and also have a match for "populous city" on the full-text index. Using edgengram(3,10) in the search index means that popul will match a large number of words such as "populous" and "population".
The interesting thing about the output is that it will usually be the same, whether OpenAI or Mistral embeddings are used. The FTS here in addition to the hybrid search output truly does flatten the results.
In contrast, a vector-only search returns more varying results including not just distance metrics but also the order in which they are returned. OpenAI considers Ottawa to be the second-most important article after Ontario when it comes to the keyword "Canada":
[{ distance: 0.09790307035147372f, extract: "Ontario is the southernmost province of Canada...", title: 'Ontario' }, { distance: 0.11906508835503704f, extract: 'Ottawa is the capital city of Canada....', title: 'Ottawa' }]
While Mistral has put Quebec in second place.
[{ distance: 0.36376299190450023f, extract: "Ontario is the southernmost province of Canada...", title: 'Ontario' }, { distance: 0.36566584443638417f, extract: "Quebec is Canada's largest province by area...", title: 'Quebec' }]
As you can see, working with raw embedding output tends to be more dynamic, while a hybrid search tends to be flatter and with less variance.
To see more of a difference when using an RFF query, try using A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,Alphabet,Latin alphabet to insert a bunch of articles about the alphabet. If you do a search for something vague like A,alphabet inside the app then you will finally see some variation in the results.
The document:U part is interesting. Let's try a raw query to see why it's showing up. This is all we need to see the content:
document:U.extract
There it is! U just happens to be the twenty-<u>first</u> letter of the alphabet.
'U, or u, is the twenty-first letter and the fifth vowel letter of the Latin alphabet, used in the modern English alphabet and the alphabets of other western European languages and others worldwide. Its name in English is u, plural ues.'
By the way, u's is a perfectly acceptable way to pluralize the letter u. Don't believe everything you read online!
Plurals almost never take an apostrophe. Chicago style uses an apostrophe for the plural of lowercase single letters (x’s and o’s), but for little else (for instance, we write “dos and don’ts”).
As you can see, playing around with an app of this sort can take you into some pretty interesting corners of the world.
On that note, let's finish up with the source code for the app and how to run it.
Final notes and app source code
Assuming you have Rust installed, the app can be run with a single command. Without an OPENAI_API_KEY and MISTRAL_API_KEY embedded retrieval and search won't work, but everything else will.
OPENAI_API_KEY=yourkeyhere MISTRAL_API_KEY=yourkeyhere cargo run
To use the search::rrf() function until the next version of SurrealDB is out, you can download the nightly version of SurrealDB, run that and change connect("memory") to connect("ws://localhost:8000"), or just be lazy by cloning the SurrealDB repo and adding it to Cargo.toml as follows.
Esc: delete text in current square Enter: Run command, or copy Output content to clipboard Shift + up/down: Scroll through Output content Ctrl+C: Close app";
fninit()->&'staticstr{ r#" DEFINE NAMESPACE ns; DEFINE DATABASE db; USE NS ns; USE DB db; DEFINE FIELD extract ON document TYPE string; DEFINE FIELD title ON document TYPE string; DEFINE FIELD mistral_embedding ON document TYPE option<array<float>> DEFAULT []; DEFINE FIELD openai_embedding ON document TYPE option<array<float>> DEFAULT []; DEFINE ANALYZER en_analyzer TOKENIZERS class FILTERS lowercase,edgengram(3,10); DEFINE INDEX en_extract ON document FIELDS extract SEARCH ANALYZER en_analyzer BM25 HIGHLIGHTS; DEFINE INDEX en_title ON document FIELDS title SEARCH ANALYZER en_analyzer BM25 HIGHLIGHTS;
DEFINE TABLE link TYPE RELATION IN document OUT document ENFORCED;
DEFINE INDEX only_one_link ON link FIELDS in,out UNIQUE;"# }
letmutcurrent_doc=self .surreal // Grab just the embeds field from a document .query(format!("type::thing('document', '{doc}').{field_name};")) .await?; letembeds: Value=current_doc.take(0)?;
letmutsimilar=self .surreal .query(format!( "(SELECT extract, title, vector::distance::knn() AS distance FROM document WHERE {field_name} <|4,COSINE|> $embeds ORDER BY distance).filter(|$t| $t.distance > 0.0001);", )) .bind(("embeds",embeds)) .await?; similar.take::<Value>(0) }
asyncfnrrf(&self,input: String,field: &str)->Result<String,String> { letinput=input.trim().to_string(); letfield=field.to_string(); letSome((doc,fts_query))=input.split_once(',')else{ returnErr("Must enter a document title followed by a comma and text to perform full-text search on".to_string()); };
letfts_query=fts_query.to_string(); letdoc=RecordId::from_table_key("document",doc); letmutcurrent_doc=self .surreal // Grab just the embeds field from a document .query(format!("SELECT VALUE {field} FROM ONLY {doc}")) .bind(("doc",doc.clone())) .await .map_err(|e| e.to_string())?;
letembeds: Value=current_doc.take(0).unwrap();
letvs=self .surreal .query( "SELECT id, extract, title, vector::distance::knn() AS distance FROM document WHERE {field} <|4,COSINE|> $embeds ORDER BY distance; ", ) .bind(("embeds",embeds)) .bind(("field",field)) .await .map_err(|e| e.to_string())? .take::<Value>(0) .map_err(|e| e.to_string())?;
letft=self .surreal .query( "SELECT id, search::highlight('**', '**', 0) AS title, search::highlight('**', '**', 1) AS extract, (search::score(0) * 3) + search::score(1) AS score FROM document WHERE title @0@ $input OR extract @1@ $input ORDER BY score DESC;", ) .bind(("input",fts_query)) .await .map_err(|e| e.to_string())? .take::<Value>(0) .map_err(|e| e.to_string())?;
Ok(self.surreal .query("(SELECT id, rrf_score FROM search::rrf([$vs, $ft], 5, 60) ORDER BY rrf_score DESC).filter(|$v| $v.id != {doc})") .bind(("vs",vs)) .bind(("ft",ft)) .bind(("doc",doc)) .await.map_err(|e| e.to_string())?.take::<Value>(0).map_err(|e| e.to_string())?.to_string()) }
asyncfninsert_documents(&self,page_names: String)->Result<String,String> { // Get each page name separated by a comma and add _ in between words letpage_names=page_names .trim() .split(",") .map(|p| p.split_whitespace().collect::<Vec<&str>>().join("_")); letmutresult=String::new();
forresinresults{ lets=res.join().unwrap_or("Couldn't join thread".to_string()); letmutcontent: PageContent=matchserde_json::from_str(&s){ Ok(content)=>content, Err(_)=>returnErr(s), }; // Add an underscore again as response from Wikipedia won't have it content.title=content .title .split_whitespace() .collect::<Vec<&str>>() .join("_"); result.push('\n'); result.push_str(&self.add_document(content).await); } Ok(result) }
letres=self .surreal .query("CREATE ONLY $doc SET title = $title, extract = $extract;") .bind(("doc",doc)) .bind(("title",content.title)) .bind(("extract",content.extract)); matchres.await{ Ok(mutr)=>matchr.take::<Option<PageContent>>(0){ Ok(Some(good))=>format!("{good:?}"), Ok(None)=>"No PageContent found".to_string(), Err(e)=>e.to_string(), }, Err(e)=>e.to_string(), } }
asyncfnlink_documents(&self,documents: String)->String{ letSome((one,two))=documents.split_once(",")else{ return"Please insert two document names separated by a comma".to_string(); }; letone=RecordId::from_table_key("document",one); lettwo=RecordId::from_table_key("document",two);
asyncfnsee_all_document_titles(&self)->String{ letres=self .raw_query("(SELECT VALUE title FROM document).sort()") .await; format!("All database article titles: {res}") }
asyncfnfull_text_search(&self,input: String)->Result<String,String> { matchself .surreal .query( "SELECT search::highlight('**', '**', 0) AS title, search::highlight('**', '**', 1) AS extract, (search::score(0) * 3) + search::score(1) AS score FROM document WHERE title @0@ $input OR extract @1@ $input ORDER BY score DESC;", ) .bind(("input",input)) .await { Ok(mutres)=>Ok(res.take::<Value>(0).map_err(|e| e.to_string())?.to_string()), Err(e)=>Err(e.to_string()), } }
terminal .draw(|frame| { // First split into four: buttons you don't type anything in, top buttons, query button, output let[non_type,top,run_query,output]=Layout::vertical([ Percentage(12), Percentage(38), Percentage(12), Percentage(38), ]) .areas(frame.area()); let[top_left,top_right]=Layout::horizontal([Min(3);2]).areas(top);