Fastembed is a library that allows you to generate vector embeddings locally, without needing an API key or calling into an external service.
Fastembed uses the bundled ONNX runtime to run its embedding models. Each model is downloaded once, the first time it is used.
Fastembed libraries are available for the following languages:
The following is an overview of most of the models available for Fastembed. General use cases are:
Fast general-purpose embeddings. Choose the L6 models for speed and the L12 models for quality. Ideal for semantic search, clustering, and similarity tasks.
A “quantized” model is one that has been optimised for faster inference and lower memory usage, often with minimal quality loss.
Model name | Embedding size | Description |
---|---|---|
AllMiniLML6V2 | 384 | Sentence Transformer model, MiniLM-L6-v2 |
AllMiniLML6V2Q | 384 | Quantized Sentence Transformer model, MiniLM-L6-v2 |
AllMiniLML12V2 | 384 | Sentence Transformer model, MiniLM-L12-v2 |
AllMiniLML12V2Q | 384 | Quantized Sentence Transformer model, MiniLM-L12-v2 |
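To make the term “quantized” more concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Rust. It only illustrates the general idea of storing `f32` values as `i8` plus a shared scale factor. It is an assumption made for illustration, not a description of how the Fastembed models were actually quantized, and the `quantize` and `dequantize` names are hypothetical.

```rust
/// Quantizes f32 values to i8 using a single shared scale factor.
/// Returns the quantized values and the scale needed to restore them.
fn quantize(values: &[f32]) -> (Vec<i8>, f32) {
    // The largest magnitude is mapped to 127, the largest positive i8 value.
    let max_abs = values.iter().fold(0.0_f32, |acc, v| acc.max(v.abs()));
    // Avoid dividing by zero when every value is 0.0.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quantized = values
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (quantized, scale)
}

/// Restores approximate f32 values from the quantized form.
fn dequantize(quantized: &[i8], scale: f32) -> Vec<f32> {
    quantized.iter().map(|q| f32::from(*q) * scale).collect()
}
```

In this sketch each value takes one byte instead of four, and the round trip through `dequantize` recovers each value to within roughly half a quantization step. That trade of a small error for less memory is what "minimal quality loss" refers to.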
Used for dense retrieval and semantic similarity. BGESmallENV15 is optimized for speed and tends to be the default choice for many applications.
Model name | Embedding size | Description |
---|---|---|
BGEBaseENV15 | 768 | v1.5 release of the base English model |
BGEBaseENV15Q | 768 | Quantized v1.5 release of the base English model |
BGELargeENV15 | 1024 | v1.5 release of the large English model |
BGELargeENV15Q | 1024 | Quantized v1.5 release of the large English model |
BGESmallENV15 | 384 | v1.5 release of the fast and default English model |
BGESmallENV15Q | 384 | Quantized v1.5 release of the fast and default English model |
Used for large context window embeddings.
Optimized for long-context English text (8K tokens). v1.5 improves quality over v1.
Model name | Embedding size | Description |
---|---|---|
NomicEmbedTextV1 | 768 | 8192 context length English model |
NomicEmbedTextV15 | 768 | v1.5 release of the 8192 context length English model |
NomicEmbedTextV15Q | 768 | Quantized v1.5 release of the 8192 context length English model |
Used for paraphrase detection and multilingual similarity. Ideal for sentence equivalence and semantic matching tasks.
Model name | Embedding size | Description |
---|---|---|
ParaphraseMLMiniLML12V2 | 384 | Multi-lingual model |
ParaphraseMLMiniLML12V2Q | 384 | Quantized Multi-lingual model |
ParaphraseMLMpnetBaseV2 | 768 | Sentence-transformers model for tasks like clustering or semantic search, based on the MPNet architecture. |
Used for Chinese-language embeddings.
Model name | Embedding size | Description |
---|---|---|
BGESmallZHV15 | 512 | v1.5 release of the small Chinese model |
BGELargeZHV15 | 1024 | v1.5 release of the large Chinese model |
Used for context-rich multilingual embeddings. Great for cross-language retrieval and nuanced contextual understanding.
Model name | Embedding size | Description |
---|---|---|
ModernBertEmbedLarge | 1024 | Large model of ModernBert Text Embeddings |
MultilingualE5Small | 384 | Small model of multilingual E5 Text Embeddings |
MultilingualE5Base | 768 | Base model of multilingual E5 Text Embeddings |
MultilingualE5Large | 1024 | Large model of multilingual E5 Text Embeddings |
Used for high-quality English/multilingual embeddings.
Model name | Embedding size | Description |
---|---|---|
MxbaiEmbedLargeV1 | 1024 | Large English embedding model from Mixedbread |
MxbaiEmbedLargeV1Q | 1024 | Quantized Large English embedding model from Mixedbread |
GTEBaseENV15 | 768 | Base multilingual embedding model from Alibaba |
GTEBaseENV15Q | 768 | Quantized base multilingual embedding model from Alibaba |
GTELargeENV15 | 1024 | Large multilingual embedding model from Alibaba |
GTELargeENV15Q | 1024 | Quantized large multilingual embedding model from Alibaba |
Use CLIP for image-text matching, Jina for code search and retrieval. JinaEmbeddingsV2BaseCode is optimised for embedding code snippets.
Model name | Embedding size | Description |
---|---|---|
ClipVitB32 | 512 | CLIP text encoder based on ViT-B/32 |
JinaEmbeddingsV2BaseCode | 768 | Jina embeddings v2 base code |
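The embedding size matters when a vector index is defined, because the index dimension has to match the length of the vectors the model produces. The sketch below builds a `DEFINE INDEX` statement from an embedding size taken from the tables above. The `embedding_size` and `hnsw_index_statement` helpers are hypothetical and the lookup covers only a handful of models; the statement itself follows the commented-out index definition in the Rust example further down this page.

```rust
/// Embedding sizes for a few models, copied from the tables above.
/// Returns None for any model this sketch does not list.
fn embedding_size(model_name: &str) -> Option<usize> {
    match model_name {
        "AllMiniLML6V2" | "BGESmallENV15" | "MultilingualE5Small" => Some(384),
        "BGEBaseENV15" | "NomicEmbedTextV15" | "JinaEmbeddingsV2BaseCode" => Some(768),
        "BGELargeENV15" | "MxbaiEmbedLargeV1" => Some(1024),
        _ => None,
    }
}

/// Builds an HNSW index definition whose DIMENSION matches the model.
fn hnsw_index_statement(model_name: &str) -> Option<String> {
    let size = embedding_size(model_name)?;
    Some(format!(
        "DEFINE INDEX hnsw_embed ON document FIELDS embedding HNSW DIMENSION {size} DIST COSINE"
    ))
}
```

For instance, `hnsw_index_statement("BGEBaseENV15")` yields a statement with `DIMENSION 768`, while an unlisted model name returns `None` rather than guessing a size.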
The following Rust example shows how SurrealDB can store the embeddings that the default model produces for a number of phrases, and then be queried for the three results closest to a given input.
First add a few crates to Cargo.toml with the following command:
```bash
cargo add anyhow fastembed serde tokio surrealdb --features surrealdb/kv-mem
```
Then use the following code.
```rust
use anyhow::Error;
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};
use serde::Serialize;
use surrealdb::{
    Surreal, Value,
    engine::any::{Any, connect},
};

const SCHEMA: &str = "DEFINE TABLE document;
DEFINE FIELD text ON document TYPE string;
DEFINE FIELD embedding ON document TYPE array<float>;
// Uncomment this to use HNSW index, ensure that number after DIMENSION matches size of embedding
// DEFINE INDEX hnsw_embed ON document FIELDS embedding HNSW DIMENSION 384 DIST COSINE";

const INSERT_QUERY: &str = "INSERT INTO document $docs";

const VECTOR_QUERY: &str = "SELECT text, vector::distance::knn() AS distance FROM document WHERE embedding <|3,COSINE|> $embeds ORDER BY distance";

// One row of the `document` table: the original text and its embedding.
#[derive(Serialize)]
struct DocumentInput {
    text: String,
    embedding: Vec<f32>,
}

// Embeds every phrase and inserts the results in a single query.
async fn store_docs(
    input: Vec<&str>,
    db: &Surreal<Any>,
    model: &mut TextEmbedding,
) -> Result<(), Error> {
    let docs = model
        .embed(input.clone(), None)?
        .into_iter()
        .zip(input.into_iter())
        .map(|(embedding, text)| DocumentInput {
            text: text.to_string(),
            embedding,
        })
        .collect::<Vec<DocumentInput>>();
    db.query(INSERT_QUERY).bind(("docs", docs)).await?;
    Ok(())
}

// Embeds a single input and prints the three closest stored documents.
async fn test_embed(
    input: &str,
    db: &Surreal<Any>,
    model: &mut TextEmbedding,
) -> Result<(), Error> {
    let Some(embeds) = model.embed(vec![input], None)?.into_iter().next() else {
        return Err(anyhow::anyhow!("Nothing found at index 0"));
    };
    let val = db
        .query(VECTOR_QUERY)
        .bind(("embeds", embeds.clone()))
        .await?
        .take::<Value>(0)?;
    println!("{val}\n");
    Ok(())
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    // Default model
    let mut model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::BGESmallENV15))?;

    // In-memory database, enabled by the `kv-mem` feature
    let db = connect("memory").await?;
    db.use_ns("ns").use_db("db").await?;
    db.query(SCHEMA).await?;

    let input = vec![
        // Cities
        "Calgary is a city in the Canadian province of Alberta.",
        "Ljubljana is the capital and largest city of Slovenia.",
        // Historical / mythological figures
        "Xenophon of Athens was a Greek military leader, philosopher, and historian.",
        "King Arthur was a mythical king in the mythology of Great Britain.",
        // Planets
        "Venus is the second planet from the Sun.",
        "Ceres is a dwarf planet in the middle main asteroid belt between the orbits of Mars and Jupiter.",
        // Languages
        "Manx is a Gaelic language of the insular Celtic branch of the Celtic language family",
        "Interlingue, originally Occidental, is an international auxiliary language created in 1922.",
        // Sea animals
        "Octopuses have a complex nervous system and are among the most intelligent and behaviourally diverse invertebrates.",
        "Clams have no central nervous system at all and are near to plants in intelligence.",
    ];
    store_docs(input, &db, &mut model).await?;

    println!("Edmonton is closest to:");
    test_embed("Edmonton", &db, &mut model).await?;
    println!("Merlin is closest to:");
    test_embed("Merlin", &db, &mut model).await?;
    println!("Earth is closest to:");
    test_embed("Earth", &db, &mut model).await?;
    println!("Irish is closest to:");
    test_embed("Irish language", &db, &mut model).await?;
    println!("Squid are closest to:");
    test_embed("Squid", &db, &mut model).await?;
    Ok(())
}
```
Output of the example with the default model:
```
Edmonton is closest to:
[
  { distance: 0.2596421358215669f, text: 'Calgary is a city in the Canadian province of Alberta.' },
  { distance: 0.5010449624435647f, text: 'Ljubljana is the capital and largest city of Slovenia.' },
  { distance: 0.5242241576926254f, text: 'Interlingue, originally Occidental, is an international auxiliary language created in 1922.' }
]

Merlin is closest to:
[
  { distance: 0.3653307924860497f, text: 'King Arthur was a mythical king in the mythology of Great Britain.' },
  { distance: 0.4515194174120666f, text: 'Manx is a Gaelic language of the insular Celtic branch of the Celtic language family' },
  { distance: 0.5317039966149415f, text: 'Calgary is a city in the Canadian province of Alberta.' }
]

Earth is closest to:
[
  { distance: 0.3380429615054925f, text: 'Venus is the second planet from the Sun.' },
  { distance: 0.3764237673020161f, text: 'Ceres is a dwarf planet in the middle main asteroid belt between the orbits of Mars and Jupiter.' },
  { distance: 0.444087039462282f, text: 'Calgary is a city in the Canadian province of Alberta.' }
]

Irish is closest to:
[
  { distance: 0.27517683002655635f, text: 'Manx is a Gaelic language of the insular Celtic branch of the Celtic language family' },
  { distance: 0.34080671701374754f, text: 'Interlingue, originally Occidental, is an international auxiliary language created in 1922.' },
  { distance: 0.5113325799682362f, text: 'King Arthur was a mythical king in the mythology of Great Britain.' }
]

Squid are closest to:
[
  { distance: 0.3439891425642231f, text: 'Octopuses have a complex nervous system and are among the most intelligent and behaviourally diverse invertebrates.' },
  { distance: 0.4707156750207915f, text: 'Manx is a Gaelic language of the insular Celtic branch of the Celtic language family' },
  { distance: 0.517311424260043f, text: 'Clams have no central nervous system at all and are near to plants in intelligence.' }
]
```
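The `distance` values in this output are cosine distances, so a smaller number means a closer match. As a rough illustration of what a query such as `<|3,COSINE|>` computes, here is a brute-force sketch in plain Rust. It assumes cosine distance is defined as one minus cosine similarity, and the `cosine_distance` and `nearest` names are hypothetical. It is not how SurrealDB implements the operator.

```rust
/// Cosine distance, assumed here to be 1 - cosine similarity.
/// Returns None if the lengths differ or either vector has zero magnitude.
fn cosine_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(1.0 - dot / (norm_a * norm_b))
}

/// Returns the k documents closest to the query, nearest first.
/// Documents whose distance cannot be computed are skipped.
fn nearest<'a>(
    query: &[f32],
    docs: &'a [(&'a str, Vec<f32>)],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = docs
        .iter()
        .filter_map(|(text, embedding)| {
            cosine_distance(query, embedding).map(|d| (*text, d))
        })
        .collect();
    // total_cmp gives a total order over f32, so the sort cannot panic.
    scored.sort_by(|x, y| x.1.total_cmp(&y.1));
    scored.truncate(k);
    scored
}
```

Calling `nearest(&query, &docs, 3)` with real embeddings would compare the query against every stored vector, which is fine for ten documents. An index such as HNSW exists to avoid that full scan on larger tables.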