
Make a medical chatbot using GraphRAG with SurrealDB + LangChain

featured engineering

Jul 9, 2025

Dave Macleod

Using LangChain

This post is a follow-up to this one from two weeks ago, which detailed how to build a medical chatbot in Python using SurrealDB and LangChain.

Rust developers can do the same, thanks to a crate called langchain_rust which, as of last year, includes support for SurrealDB as a vector store. This implementation doesn’t (yet!) include graph queries, but we can still use classic vector search to find treatment recommendations for a patient.

To start off, use a command like cargo new medical_bot to create a new Cargo project, go into the project directory, and add the following under [dependencies].

anyhow = "1.0.98"
langchain-rust = { version = "4.6.0", features = ["surrealdb", "mistralai"] }
serde = "1.0.219"
serde_json = "1.0.140"
serde_yaml = "0.9.34"
surrealdb = { version = "2.0.2", features = ["kv-mem"] }
tokio = "1.45.0"

The langchain-rust crate comes with OpenAI as a default, and includes a large number of features. We will add mistralai to show how easy it is to switch from one platform to another with only about two lines of different code.

The original post assumes that we have a big YAML document with a number of symptoms along with their possible treatments, which is what the serde_yaml dependency will let us work with.

- category: General Symptoms
  symptoms:
    - name: Fever
      description: Elevated body temperature, usually above 100.4°F (38°C).
      medical_practice: General Practice, Internal Medicine, Pediatrics
      possible_treatments:
        - Antipyretics (e.g., ibuprofen, acetaminophen)
        - Rest
        - Hydration
        - Treating the underlying cause

To keep the logic simple, we will take only the description of each symptom and its possible_treatments, giving us two structs that look like this.

#[derive(Debug, Deserialize)]
pub struct SymptomCategory {
    pub symptoms: Vec<Symptom>,
}

#[derive(Debug, Deserialize)]
pub struct Symptom {
    pub description: String,
    pub possible_treatments: Vec<String>,
}

Then for each symptom, we will look through the possible treatments to create a document for each with text that looks like the following:

  • ‘Elevated body temperature, usually above 100.4°F (38°C).’ can be treated by ‘Antipyretics (e.g., ibuprofen, acetaminophen)’
  • ‘Elevated body temperature, usually above 100.4°F (38°C).’ can be treated by ‘Rest’
  • ‘Elevated body temperature, usually above 100.4°F (38°C).’ can be treated by ‘Hydration’
  • ‘Elevated body temperature, usually above 100.4°F (38°C).’ can be treated by ‘Treating the underlying cause’
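Building those strings can be sketched in plain Rust with no crates at all; the description and treatments below come from the YAML snippet shown earlier:

```rust
// Build one "can be treated by" sentence per treatment for a symptom.
fn make_sentences(description: &str, treatments: &[&str]) -> Vec<String> {
    treatments
        .iter()
        .map(|treat| format!("'{description}' can be treated by '{treat}'."))
        .collect()
}

fn main() {
    let description = "Elevated body temperature, usually above 100.4°F (38°C).";
    let treatments = [
        "Antipyretics (e.g., ibuprofen, acetaminophen)",
        "Rest",
        "Hydration",
        "Treating the underlying cause",
    ];
    for sentence in make_sentences(description, &treatments) {
        println!("{sentence}");
    }
}
```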

This needs to be turned into a Document struct on the langchain-rust side, which looks like this.

pub struct Document {
    pub page_content: String,
    pub metadata: HashMap<String, Value>,
    pub score: f64,
}

A Document is created via Document::new(), which takes a String for the page_content; an optional HashMap of metadata can then be attached with .with_metadata(). The score will be 0.0 when inserting and is only used later on, when a similarity search returns a Document.
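As a rough sketch of that builder pattern, here is a hypothetical MiniDocument standing in for the real langchain-rust type (with String metadata values instead of serde_json values, to keep it crate-free):

```rust
use std::collections::HashMap;

// A simplified stand-in for langchain-rust's Document type.
#[derive(Debug, Clone)]
pub struct MiniDocument {
    pub page_content: String,
    pub metadata: HashMap<String, String>,
    pub score: f64,
}

impl MiniDocument {
    // Like Document::new(): score starts at 0.0, metadata starts empty.
    pub fn new(page_content: impl Into<String>) -> Self {
        Self {
            page_content: page_content.into(),
            metadata: HashMap::new(),
            score: 0.0,
        }
    }

    // Like .with_metadata(): attach a metadata map after construction.
    pub fn with_metadata(mut self, metadata: HashMap<String, String>) -> Self {
        self.metadata = metadata;
        self
    }
}

fn main() {
    let doc = MiniDocument::new("'Fever' can be treated by 'Rest'.")
        .with_metadata(HashMap::from([(
            "source".to_string(),
            "symptoms.yaml".to_string(),
        )]));
    println!("{} (score: {})", doc.page_content, doc.score);
}
```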

For the metadata, we will add the other possible treatments so that any user will be able to first see a recommended treatment for a symptom, followed by all possible treatments for reference.

The Value part of the Document struct is a serde_json Value, which is why we have serde_json inside our Cargo.toml as well.

All in all, the logic to grab the YAML file and turn it into a Vec of Documents looks like this.

fn get_docs() -> Result<Vec<Document>, Error> {
    let yaml_str = std::fs::read_to_string("symptoms.yaml")?;
    let categories: Vec<SymptomCategory> = serde_yaml::from_str(&yaml_str)?;
    let symptoms = categories
        .into_iter()
        .flat_map(|cat| cat.symptoms)
        .collect::<Vec<Symptom>>();
    Ok(symptoms
        .into_iter()
        .flat_map(|symptom| {
            let metadata = HashMap::from([(
                "possible treatments".to_string(),
                Value::from(symptom.possible_treatments.clone()),
            )]);
            symptom
                .possible_treatments
                .into_iter()
                .map(|treat| {
                    Document::new(format!(
                        "'{}' can be treated by '{treat}'.",
                        symptom.description.clone()
                    ))
                    .with_metadata(metadata.clone())
                })
                .collect::<Vec<Document>>()
        })
        .collect::<Vec<Document>>())
}

With this taken care of, it’s time to do some setup inside main(). First we need to start the database, which can run in memory or at some other path, such as the address of a Surreal Cloud or locally running instance.

let database_url = std::env::var("DATABASE_URL").unwrap_or("memory".to_string());
let db = surrealdb::engine::any::connect(database_url).await?;
db.query("DEFINE NAMESPACE test; USE NAMESPACE test; DEFINE DATABASE test;")
    .await?;
// Uncomment the following lines to authenticate if necessary
// .user(surrealdb::opt::auth::Root {
//     username: "root".into(),
//     password: "secret".into(),
// });
db.use_ns("test").await?;
db.use_db("test").await?;

The next step is to initialize an embedder from the langchain-rust crate. Here we have the choice of an OpenAiEmbedder or MistralAIEmbedder thanks to the added feature flag.

After that comes a StoreBuilder struct used to initialize a SurrealDB Store, which takes an embedder, a database, and a number of dimensions: 1536 in this case for OpenAI. If using Mistral, the dimensions would be 1024.

Note that we are wrapping this in an Arc so that the store can start adding the documents on startup inside a separate task without making the user wait to see any CLI output.
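The Arc-plus-background-task pattern can be sketched with just the standard library (a Vec behind a Mutex standing in for the vector store, and std::thread standing in for the tokio task):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // A shared "store" that a background thread can fill
    // while the main thread carries on.
    let store = Arc::new(Mutex::new(Vec::<String>::new()));

    // Clone the Arc so the background thread gets its own handle.
    let arced = Arc::clone(&store);
    let handle = thread::spawn(move || {
        // Simulate slow document loading in the background.
        arced.lock().unwrap().push("doc 1".to_string());
        arced.lock().unwrap().push("doc 2".to_string());
    });

    // The main thread is free to show a CLI prompt immediately here.
    handle.join().unwrap();
    println!("loaded {} documents", store.lock().unwrap().len());
}
```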

At the very end is an .initialize() method which defines some tables and fields which will be used when doing similarity searches.

// Initialize Embedder
let embedder = OpenAiEmbedder::default();
// Embedding size is 1024 in this case
// let embedder = MistralAIEmbedder::try_new()?;

// Initialize the SurrealDB Vector Store
let store = Arc::new(
    StoreBuilder::new()
        .embedder(embedder)
        .db(db)
        .vector_dimensions(1536)
        .build()
        .await
        .map_err(|e| anyhow!(e.to_string()))?,
);

store
    .initialize()
    .await
    .map_err(|e| anyhow!(e.to_string()))?;

Then we will clone the Arc to allow the store to be passed into a new blocking task that adds the Vec<Document> returned by our get_docs() function. Inside this is a method called add_documents(), which is a built-in method from the langchain-rust crate.

let arced = Arc::clone(&store);
tokio::task::spawn_blocking(move || {
    let docs = get_docs()?;
    Handle::current()
        .block_on(arced.add_documents(&docs, &VecStoreOptions::default()))
        .map_err(|e| anyhow::anyhow!("{e}"))?;
    Ok::<(), Error>(())
});

While the store adds these documents in its own task, we will start a simple CLI that asks the user for a query and passes it into the built-in .similarity_search() method. This method lets us specify the number of documents to return and a minimum similarity score; we will go with 2 and 0.6.
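Under the hood, a similarity search of this kind compares embedding vectors and keeps only matches above the threshold. The real scoring happens inside SurrealDB, so the following is only an illustrative std-only sketch of cosine similarity with a limit and score threshold:

```rust
// Cosine similarity between two equal-length embedding vectors.
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f64 = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b: f64 = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    dot / (norm_a * norm_b)
}

// Keep at most `limit` documents whose score clears `threshold`,
// best matches first.
fn top_matches(
    query: &[f64],
    docs: &[Vec<f64>],
    limit: usize,
    threshold: f64,
) -> Vec<(usize, f64)> {
    let mut scored: Vec<(usize, f64)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine_similarity(query, d)))
        .filter(|(_, score)| *score >= threshold)
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(limit);
    scored
}

fn main() {
    let query = vec![1.0, 0.0];
    let docs = vec![vec![1.0, 0.1], vec![0.0, 1.0], vec![0.9, 0.2]];
    println!("{:?}", top_matches(&query, &docs, 2, 0.6));
}
```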

The rest of the code just involves a simple loop that handles user input and prints the results of the .similarity_search() method.

loop {
    // Ask for user input
    print!("Query> ");
    stdout().flush()?;
    let mut query = String::new();
    stdin().read_line(&mut query)?;

    let results = store
        .similarity_search(
            &query,
            2,
            &VecStoreOptions::default().with_score_threshold(0.6),
        )
        .await
        .map_err(|e| anyhow!(e.to_string()))?;

    if results.is_empty() {
        println!("No results found.");
    } else {
        println!("Possible symptoms:");
        results.iter().for_each(|r| {
            println!("{}\n  All possible treatments: ", r.page_content);
            if let Some(Value::Array(array)) = r.metadata.get("possible treatments") {
                for val in array {
                    if let Value::String(s) = val {
                        println!("    {s}");
                    }
                }
            };
            println!();
        });
    };
}

As the output shows, our bot is capable of returning meaningful results despite only having access to data from 236 lines of YAML!

Query> I've been exercising a lot outside and it's really hot.
Possible symptoms:
'Elevated body temperature, usually above 100.4°F (38°C).' can be treated by 'Hydration'.
  All possible treatments:
    Antipyretics (e.g., ibuprofen, acetaminophen)
    Rest
    Hydration
    Treating the underlying cause
'Elevated body temperature, usually above 100.4°F (38°C).' can be treated by 'Rest'.
  All possible treatments:
    Antipyretics (e.g., ibuprofen, acetaminophen)
    Rest
    Hydration
    Treating the underlying cause

Query> I get dizzy sometimes
Possible symptoms:
'Feeling lightheaded, unsteady, or experiencing a sensation that the room is spinning.' can be treated by 'Medications to reduce nausea or dizziness'.
  All possible treatments:
    Addressing underlying cause (e.g., inner ear issues, low blood pressure)
    Vestibular rehabilitation
    Medications to reduce nausea or dizziness
    Hydration
'Feeling lightheaded, unsteady, or experiencing a sensation that the room is spinning.' can be treated by 'Addressing underlying cause (e.g., inner ear issues, low blood pressure)'.
  All possible treatments:
    Addressing underlying cause (e.g., inner ear issues, low blood pressure)
    Vestibular rehabilitation
    Medications to reduce nausea or dizziness
    Hydration

Query> What should I do in life?
Possible symptoms:
'Feeling unusually drained, lacking energy, or experiencing persistent exhaustion.' can be treated by 'Lifestyle modifications (diet, exercise)'.
  All possible treatments:
    Rest and adequate sleep
    Lifestyle modifications (diet, exercise)
    Addressing underlying medical conditions (e.g., anemia, thyroid disorders)
    Stress management
'Feeling unusually drained, lacking energy, or experiencing persistent exhaustion.' can be treated by 'Stress management'.
  All possible treatments:
    Rest and adequate sleep
    Lifestyle modifications (diet, exercise)
    Addressing underlying medical conditions (e.g., anemia, thyroid disorders)
    Stress management

Want to give it a try yourself? Save the content at this link to a file named symptoms.yaml, copy the following code into your Cargo project, set the OPENAI_API_KEY or MISTRAL_API_KEY env var, and run cargo run.

You can also give a crate called archiver a try, which has its own command-line interface to use SurrealDB with Ollama via the same crate we used in this post.

// To run this example execute: `cargo run` in the folder.
// Be sure to have an OpenAI key set to the OPENAI_API_KEY env var,
// or MISTRAL_API_KEY if using Mistral.
use anyhow::{Error, anyhow};
use langchain_rust::{
    embedding::{MistralAIEmbedder, openai::openai_embedder::OpenAiEmbedder},
    schemas::Document,
    vectorstore::{VecStoreOptions, VectorStore, surrealdb::StoreBuilder},
};
use serde::Deserialize;
use serde_json::Value;
use std::{
    collections::HashMap,
    io::{Write, stdin, stdout},
    sync::Arc,
};
use tokio::runtime::Handle;

#[derive(Debug, Deserialize)]
pub struct SymptomCategory {
    pub category: String,
    pub symptoms: Vec<Symptom>,
}

#[derive(Debug, Deserialize)]
pub struct Symptom {
    pub name: String,
    pub description: String,
    pub possible_treatments: Vec<String>,
}

fn get_docs() -> Result<Vec<Document>, Error> {
    let yaml_str = std::fs::read_to_string("symptoms.yaml")?;
    let categories: Vec<SymptomCategory> = serde_yaml::from_str(&yaml_str)?;
    let symptoms = categories
        .into_iter()
        .flat_map(|cat| cat.symptoms)
        .collect::<Vec<Symptom>>();
    Ok(symptoms
        .into_iter()
        .flat_map(|symptom| {
            let metadata = HashMap::from([(
                "possible treatments".to_string(),
                Value::from(symptom.possible_treatments.clone()),
            )]);
            symptom
                .possible_treatments
                .into_iter()
                .map(|treat| {
                    Document::new(format!(
                        "'{}' can be treated by '{treat}'.",
                        symptom.description.clone()
                    ))
                    .with_metadata(metadata.clone())
                })
                .collect::<Vec<Document>>()
        })
        .collect::<Vec<Document>>())
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    let database_url = std::env::var("DATABASE_URL").unwrap_or("memory".to_string());
    let db = surrealdb::engine::any::connect(database_url).await?;
    db.query("DEFINE NAMESPACE test; USE NAMESPACE test; DEFINE DATABASE test;")
        .await?;
    // Uncomment the following lines to authenticate if necessary
    // .user(surrealdb::opt::auth::Root {
    //     username: "root".into(),
    //     password: "secret".into(),
    // });
    db.use_ns("test").await?;
    db.use_db("test").await?;

    // Initialize Embedder
    let embedder = OpenAiEmbedder::default();
    // Embedding size is 1024 in this case
    // let embedder = MistralAIEmbedder::try_new()?;

    // Initialize the SurrealDB Vector Store
    let store = Arc::new(
        StoreBuilder::new()
            .embedder(embedder)
            .db(db)
            .vector_dimensions(1536)
            .build()
            .await
            .map_err(|e| anyhow!(e.to_string()))?,
    );

    // Initialize the tables in the database. This only needs to be done once.
    store
        .initialize()
        .await
        .map_err(|e| anyhow!(e.to_string()))?;

    let arced = Arc::clone(&store);
    tokio::task::spawn_blocking(move || {
        let docs = get_docs()?;
        Handle::current()
            .block_on(arced.add_documents(&docs, &VecStoreOptions::default()))
            .map_err(|e| anyhow::anyhow!("{e}"))?;
        Ok::<(), Error>(())
    });

    loop {
        // Ask for user input
        print!("Query> ");
        stdout().flush()?;
        let mut query = String::new();
        stdin().read_line(&mut query)?;

        let results = store
            .similarity_search(
                &query,
                2,
                &VecStoreOptions::default().with_score_threshold(0.6),
            )
            .await
            .map_err(|e| anyhow!(e.to_string()))?;

        if results.is_empty() {
            println!("No results found.");
        } else {
            println!("Possible symptoms:");
            results.iter().for_each(|r| {
                println!("{}\n  All possible treatments: ", r.page_content);
                if let Some(Value::Array(array)) = r.metadata.get("possible treatments") {
                    for val in array {
                        if let Value::String(s) = val {
                            println!("    {s}");
                        }
                    }
                };
                println!();
            });
        };
    }
}