
Build a Clinical Knowledge Base in 20 Minutes

Model: voyage-4-large · From clinical guidelines to searchable AI, using your own infrastructure.

Problem

Clinical documentation is overwhelming. Guidelines update quarterly, drug interaction databases span thousands of pages, and protocols vary by department. When a clinician searches for "diabetes kidney treatment," they need to find documents about "glycemic management in chronic kidney disease" — but keyword search won't make that connection.

The stakes are high. Missed information in a clinical context isn't just inconvenient — it affects patient outcomes.

Solution

The vai pipeline command, run with voyage-4-large (Voyage AI's highest-accuracy general-purpose model), splits clinical documents into semantically searchable chunks, embeds them, and stores them in MongoDB Atlas, where Atlas Vector Search can query them. Voyage AI does not offer a healthcare-specific embedding model, so voyage-4-large is the recommended choice for its accuracy on complex, domain-specific text.

Sample Documents

We provide 15 sample clinical documents (~34KB total) representing a realistic clinical knowledge base:

| Document | Description |
|---|---|
| diabetes-management | Diabetes management guidelines |
| diabetes-renal | Glycemic management in chronic kidney disease |
| metformin-reference | Metformin prescribing reference |
| sglt2-inhibitors | SGLT2 inhibitor class overview |
| hypertension-guidelines | Hypertension treatment guidelines |
| ace-inhibitor-reference | ACE inhibitor prescribing reference |
| heart-failure-protocol | Heart failure management protocol |
| anticoagulation-guide | Anticoagulation therapy guide |
| sepsis-bundle | Sepsis recognition and treatment bundle |
| pain-management | Pain management protocols |
| drug-interactions-cardiac | Cardiac drug interactions |
| ckd-staging | Chronic kidney disease staging criteria |
| insulin-protocols | Insulin dosing protocols |
| discharge-checklist | Patient discharge checklist |
| falls-prevention | Falls prevention protocol |

Download sample documents

Walkthrough

1. Install vai

npm install -g voyageai-cli

2. Configure credentials

vai configure

3. Download and extract sample docs

Download the sample documents and extract them into a sample-docs/ directory in your working directory; the pipeline command in the next step reads from ./sample-docs/.

4. Run the pipeline

vai pipeline ./sample-docs/ \
--model voyage-4-large \
--db healthcare_demo \
--collection clinical_knowledge \
--create-index

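This command splits the 15 documents into 118 chunks, generates a voyage-4-large embedding for each chunk, stores the chunks and their embeddings in the clinical_knowledge collection of the healthcare_demo database in MongoDB Atlas, and, because of --create-index, creates a vector search index on that collection.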
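Chunking matters because each chunk gets its own embedding, so a search can land on the relevant passage of a long guideline instead of matching the whole file. vai's exact chunking strategy and settings aren't described here, so the Python sketch below shows one common approach, fixed-size character windows with overlap, purely as an illustration. The chunk_text helper and its default sizes are assumptions, not vai's API or defaults.

```python
from pathlib import Path


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters.

    Illustrative only: this is NOT vai's chunking algorithm, and both
    defaults are arbitrary example values.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


if __name__ == "__main__":
    docs = Path("sample-docs")
    if docs.is_dir():
        for path in sorted(p for p in docs.rglob("*") if p.is_file()):
            text = path.read_text(encoding="utf-8", errors="replace")
            print(f"{path.name}: {len(chunk_text(text))} chunks")
```

Run as a script, it prints a per-file chunk count for ./sample-docs/. The count depends entirely on chunk_size and overlap, so it will not necessarily match the 118 chunks vai reports. The overlap gives text near a window boundary a second chance to appear together with its surrounding context.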

5. Search your knowledge base

vai search "medications to avoid with kidney problems" \
--db healthcare_demo \
--collection clinical_knowledge
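Semantic search compares embeddings rather than words: the query is embedded with the same model, and stored chunks are ranked by how close their vectors are to the query vector. The sketch below ranks chunks by cosine similarity, a common metric for text embeddings. Whether vai's index uses cosine, and how it turns raw similarities into the percentage scores in the examples that follow, are assumptions not covered here. The three-dimensional vectors are made-up toy values, not real voyage-4-large embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 unrelated."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 3):
    """Return the k (source, similarity) pairs closest to the query, best first."""
    scored = [(source, cosine_similarity(query_vec, vec)) for source, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 3-dimensional toy vectors; real embeddings have far more dimensions.
toy_chunks = {
    "metformin-reference": [0.9, 0.3, 0.1],
    "ckd-staging": [0.8, 0.1, 0.2],
    "falls-prevention": [0.0, 0.1, 1.0],
}
toy_query = [1.0, 0.2, 0.0]

for source, score in rank_chunks(toy_query, toy_chunks):
    print(f"{source}: {score:.3f}")
```

The toy query points in nearly the same direction as the two kidney-related vectors and almost at right angles to falls-prevention, so those two rank far above it. A query and a chunk can score highly without sharing any words, which is how "kidney problems" can match text about renal dosing.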

6. Explore in the playground

vai playground --db healthcare_demo --collection clinical_knowledge

Example Queries

"What medications should I avoid in a patient with kidney problems?"

| Source | Score |
|---|---|
| metformin-reference | 94% |
| ckd-staging | 91% |
| ace-inhibitor-reference | 87% |

Although the query says "kidney problems" rather than "renal" or "CKD," the search connects it to renal function, CKD staging, and renal dose adjustment. It surfaces the metformin reference (metformin requires renal dose adjustment) first, followed by the CKD staging criteria and the ACE inhibitor reference.

"How do I manage blood sugar in someone who cannot take metformin?"

| Source | Score |
|---|---|
| diabetes-management | 93% |
| diabetes-renal | 90% |
| sglt2-inhibitors | 86% |

The query never mentions "SGLT2 inhibitors" or "glycemic management," but semantic search correctly identifies alternative diabetes treatments and renal-specific glycemic guidelines.

Model Comparison

| Model | Relevance score | Notes |
|---|---|---|
| voyage-4-large | 95% | Recommended; highest-accuracy general-purpose model |
| voyage-4-lite | 84% | Lower cost, reduced accuracy on clinical terminology |
| voyage-code-3 | 72% | Optimized for code, not clinical text |

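Since Voyage AI offers no healthcare-specific model, voyage-4-large is the clear choice. It handles clinical terminology better than the alternatives: in this comparison it scored 95%, compared with 84% for voyage-4-lite and 72% for voyage-code-3, which is optimized for code rather than clinical text.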
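To run a comparison like this on your own documents, you need labeled queries and a metric. One simple option, not necessarily the one behind the scores above, is hit rate at k: the fraction of queries whose top k results include at least one source a reviewer marked as relevant. In the sketch, the top-3 lists are copied from the example queries above, while the relevance labels are hypothetical.

```python
def hit_rate_at_k(results: dict[str, list[str]], relevant: dict[str, set[str]], k: int = 3) -> float:
    """Fraction of labeled queries whose top-k results contain at least one relevant source."""
    if not relevant:
        raise ValueError("need at least one labeled query")
    hits = 0
    for query, expected in relevant.items():
        top_k = set(results.get(query, [])[:k])  # a query with no results counts as a miss
        if top_k & expected:
            hits += 1
    return hits / len(relevant)


Q1 = "What medications should I avoid in a patient with kidney problems?"
Q2 = "How do I manage blood sugar in someone who cannot take metformin?"

# Top-3 sources as listed in the example queries.
results = {
    Q1: ["metformin-reference", "ckd-staging", "ace-inhibitor-reference"],
    Q2: ["diabetes-management", "diabetes-renal", "sglt2-inhibitors"],
}
# Hypothetical relevance labels; in practice a clinician would supply these.
relevant = {
    Q1: {"metformin-reference"},
    Q2: {"sglt2-inhibitors"},
}

for k in (1, 3):
    print(f"hit rate @ {k}: {hit_rate_at_k(results, relevant, k):.2f}")
```

To compare models, index the same documents once per model (for example, by running vai pipeline with a different --model and --collection each time), run the same labeled queries against each collection, and compare the hit rates. With k=1 the second query counts as a miss, because diabetes-management ranks above the labeled source sglt2-inhibitors.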

Scaling to Production

HIPAA considerations

MongoDB Atlas offers HIPAA-eligible clusters with a Business Associate Agreement (BAA). When working with protected health information (PHI), deploy your Atlas cluster on a HIPAA-eligible tier and ensure your Voyage AI usage complies with your organization's data handling policies.

Document volume

Clinical knowledge bases grow quickly: a typical hospital system may maintain thousands of guidelines, protocols, and formulary documents. You can point vai pipeline at an entire directory tree, or split a large corpus into batches and run the pipeline once per batch.
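If you process a large corpus in batches, one approach is to stage each batch in its own directory and point the step 4 command at that directory instead of ./sample-docs/. The stage_batches helper below is a hypothetical sketch, not part of vai; the batch size of 200 and the clinical-corpus and clinical-batches directory names are arbitrary assumptions.

```python
import shutil
from pathlib import Path


def stage_batches(source: Path, staging: Path, batch_size: int = 200) -> list[Path]:
    """Copy files under `source` into staging/batch_000, batch_001, ... of at most `batch_size` files."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if not source.is_dir():
        raise NotADirectoryError(source)
    # Snapshot the file list first; keep `staging` outside `source` so copies are never re-read.
    files = sorted(p for p in source.rglob("*") if p.is_file())
    batch_dirs = []
    for start in range(0, len(files), batch_size):
        batch_dir = staging / f"batch_{start // batch_size:03d}"
        for path in files[start:start + batch_size]:
            # Preserve subfolders so same-named files from different departments don't collide.
            target = batch_dir / path.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
        batch_dirs.append(batch_dir)
    return batch_dirs


if __name__ == "__main__":
    corpus = Path("clinical-corpus")  # hypothetical source directory
    if corpus.is_dir():
        for batch_dir in stage_batches(corpus, Path("clinical-batches")):
            print(batch_dir)  # run the step 4 vai pipeline command against each directory
```

Keep the staging directory outside the source tree; otherwise a later run would pick up the staged copies as new documents. Copying preserves each file's subfolder path, so two files named protocol.md in different department folders don't overwrite each other.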

Keeping guidelines current

Clinical guidelines update frequently. Automate re-indexing when source documents change to ensure your knowledge base reflects the latest evidence-based recommendations.
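One way to automate re-indexing is to fingerprint every source file and re-run the pipeline only when something has changed. The sketch below keeps a JSON manifest of SHA-256 digests (the manifest file name and location are assumptions) and reports new, modified, and deleted files since the last successful run. It does not call vai itself: how a re-run treats documents that are already indexed is not covered here, so check whether you need to delete a document's old chunks before re-ingesting it.

```python
import hashlib
import json
from pathlib import Path


def scan(source: Path) -> dict[str, str]:
    """Map each file's path (relative to `source`) to the SHA-256 digest of its contents."""
    return {
        p.relative_to(source).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(source.rglob("*"))
        if p.is_file()
    }


def detect_changes(source: Path, manifest: Path):
    """Compare current files with the manifest saved after the last successful run."""
    previous = json.loads(manifest.read_text(encoding="utf-8")) if manifest.exists() else {}
    current = scan(source)
    changed = sorted(rel for rel, digest in current.items() if previous.get(rel) != digest)
    removed = sorted(set(previous) - set(current))
    return changed, removed, current


def save_manifest(manifest: Path, digests: dict[str, str]) -> None:
    """Record digests only after re-indexing succeeds, so a failed run is retried next time."""
    manifest.write_text(json.dumps(digests, indent=2, sort_keys=True), encoding="utf-8")


if __name__ == "__main__":
    source = Path("sample-docs")
    manifest = Path("sample-docs.manifest.json")  # assumed location, outside the source tree
    changed, removed, current = detect_changes(source, manifest)
    if changed or removed:
        print("New or modified:", changed)
        print("Deleted:", removed)
        # Re-run your indexing step here, then record the new state with:
        # save_manifest(manifest, current)
```

Schedule the script (for example with cron) and call save_manifest only after re-indexing succeeds. Deleted files matter as much as changed ones: a withdrawn guideline should be removed from the collection, not just left unrefreshed.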

Metadata filtering

Use metadata filters in Atlas Vector Search to scope searches by department, document type, or effective date. This is especially useful for excluding superseded versions of a guideline.
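In Atlas Vector Search, a pre-filter goes in the filter field of the $vectorSearch aggregation stage, and every field it references must be indexed with type "filter" in the vector search index definition. The sketch below builds such a pipeline as plain Python dictionaries. The index name and the embedding, department, status, text, and source field names are hypothetical; inspect a document in clinical_knowledge to see which fields vai actually writes, and add your own metadata fields if it doesn't store them.

```python
def build_filtered_search(
    query_vector: list[float],
    department: str,
    index_name: str = "vector_index",   # hypothetical: use the name of the index vai created
    embedding_path: str = "embedding",  # hypothetical: the field holding each chunk's vector
    limit: int = 5,
) -> list[dict]:
    """Build an Atlas Vector Search aggregation pipeline with a metadata pre-filter."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": embedding_path,
                "queryVector": query_vector,
                "numCandidates": limit * 20,  # candidates examined before the top `limit` are kept
                "limit": limit,
                # Hypothetical metadata fields; each must be indexed with type "filter".
                "filter": {
                    "$and": [
                        {"department": {"$eq": department}},
                        {"status": {"$ne": "superseded"}},
                    ]
                },
            }
        },
        # Hypothetical text/source fields, plus the similarity score Atlas attaches to each hit.
        {"$project": {"_id": 0, "text": 1, "source": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


# Usage sketch (requires the pymongo and voyageai packages and VOYAGE_API_KEY;
# ATLAS_URI is a placeholder for your connection string):
# import pymongo, voyageai
# vector = voyageai.Client().embed(["anticoagulation in renal impairment"],
#                                  model="voyage-4-large", input_type="query").embeddings[0]
# coll = pymongo.MongoClient(ATLAS_URI)["healthcare_demo"]["clinical_knowledge"]
# for doc in coll.aggregate(build_filtered_search(vector, "cardiology")):
#     print(doc)
```

Because the filter is applied inside $vectorSearch rather than in a later $match stage, Atlas returns up to `limit` results that already satisfy the filter, instead of filtering a top-k list down to fewer hits.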

Conversational interface

Use vai chat for a conversational interface over your clinical knowledge base:

vai chat --db healthcare_demo --collection clinical_knowledge

Next Steps

  • vai playground — Interactive web UI for exploring your indexed documents
  • vai chat — Conversational interface over your knowledge base
  • Developer Documentation — Engineering docs with voyage-code-3
  • Legal & Compliance — Contract search with voyage-law-2
  • Financial Services — Financial document search with voyage-finance-2