
Turn Your Contract Library Into a Searchable Knowledge Base

Model: voyage-law-2 · Semantic search across legal documents, powered by a model trained on legal text.

Problem

Legal professionals spend 20–40% of their time searching for information. The challenge isn't that the information doesn't exist — it's that legal language is full of synonyms that keyword search can't bridge. "Indemnification," "hold harmless," and "defense and indemnity" all mean similar things, but a keyword search for one misses the others.

Multiply this across hundreds of contracts, policies, and compliance documents, and finding the right clause becomes a significant bottleneck.

Solution

vai pipeline with voyage-law-2, a Voyage AI embedding model trained specifically on legal text, splits your contract library into chunks, embeds each chunk, and stores the results in MongoDB Atlas, where Atlas Vector Search indexes them for semantic search. Because the model was trained on legal language, clauses with related meanings map to nearby vectors, so search finds relevant clauses even when their wording differs from your query.
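To make the contrast with keyword search concrete, here is a minimal sketch that ranks chunks by cosine similarity between vectors. The chunk ids, clause text, and 3-dimensional vectors are all made up for illustration; real voyage-law-2 embeddings come from the Voyage AI API and are much longer. The point is only the mechanism: an exact-phrase match for "hold harmless" misses the "defend and indemnify" clause, while a vector ranking can still surface it.

```python
import math

# Made-up 3-dimensional vectors for illustration only. Real voyage-law-2
# embeddings are much longer and come from the Voyage AI API.
CHUNKS = {
    "msa-indemnity": (
        "Supplier shall indemnify and hold harmless Customer from third-party claims.",
        [0.90, 0.10, 0.05],
    ),
    "contractor-defense": (
        "Contractor shall defend and indemnify Company against any losses.",
        [0.85, 0.20, 0.10],
    ),
    "force-majeure": (
        "Neither party is liable for delays caused by events beyond its control.",
        [0.05, 0.15, 0.95],
    ),
}


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def keyword_search(phrase):
    """Return chunk ids whose text contains the exact phrase (case-insensitive)."""
    return [cid for cid, (text, _) in CHUNKS.items() if phrase.lower() in text.lower()]


def semantic_search(query_vector, top_k=2):
    """Return the top_k chunk ids ranked by cosine similarity to the query vector."""
    ranked = sorted(CHUNKS, key=lambda cid: cosine(query_vector, CHUNKS[cid][1]), reverse=True)
    return ranked[:top_k]


# Pretend embedding of the query "hold harmless obligations" (also made up).
query_vector = [0.88, 0.15, 0.05]
print(keyword_search("hold harmless"))  # ['msa-indemnity']
print(semantic_search(query_vector))    # ['msa-indemnity', 'contractor-defense']
```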

Sample Documents

We provide 15 sample legal documents (~39KB total) representing a realistic contract and compliance library:

Document                       Description
master-services-agreement      MSA with standard commercial terms
saas-subscription-agreement    SaaS subscription terms
data-processing-addendum       DPA for data processing obligations
nda-mutual                     Mutual non-disclosure agreement
nda-unilateral                 One-way non-disclosure agreement
employment-agreement           Standard employment agreement
independent-contractor         Independent contractor agreement
privacy-policy                 Company privacy policy
acceptable-use-policy          Acceptable use policy
ip-assignment-agreement        Intellectual property assignment
gdpr-compliance-summary        GDPR compliance overview
ccpa-compliance-summary        CCPA compliance overview
soc2-policy-overview           SOC 2 policy summary
limitation-of-liability        Limitation of liability provisions
force-majeure-clauses          Force majeure clause collection

Download sample documents

Walkthrough

1. Install vai

npm install -g voyageai-cli

2. Configure credentials

vai configure

3. Download and extract sample docs

Download the sample documents and extract them to a sample-docs/ directory.

4. Run the pipeline

vai pipeline ./sample-docs/ \
--model voyage-law-2 \
--db legal_demo \
--collection legal_knowledge \
--create-index

This processes all 15 documents into 142 chunks, generates embeddings, stores them in MongoDB Atlas, and creates a vector search index.
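This page does not describe how vai pipeline splits documents, so the following is only a generic sketch of one common approach: fixed-size character windows with overlap, so a clause cut at a boundary still appears whole in a neighboring chunk. The window sizes and the record field names (source, chunk_index, text, embedding) are hypothetical and are not vai's actual settings or schema. In a real pipeline, each chunk's text would then be embedded with voyage-law-2 before the records are written to MongoDB.

```python
def chunk_text(text, max_chars=800, overlap=100):
    """Split text into windows of at most max_chars characters.

    Each window starts (max_chars - overlap) characters after the previous
    one, so neighboring chunks share `overlap` characters of context.
    """
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")
    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def build_records(source, text, **chunk_options):
    """Attach provenance to each chunk (field names are hypothetical)."""
    return [
        # "embedding" is a placeholder, filled in after calling the embedding model.
        {"source": source, "chunk_index": i, "text": chunk, "embedding": None}
        for i, chunk in enumerate(chunk_text(text, **chunk_options))
    ]
```

Overlap trades a little storage for recall: a sentence that straddles a boundary is fully contained in at least one chunk as long as it is shorter than the overlap.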

5. Search your contracts

vai search "data deletion obligations" \
--db legal_demo \
--collection legal_knowledge
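If you want to query the same collection from your own application instead of the CLI, Atlas Vector Search exposes the $vectorSearch aggregation stage. The sketch below only builds the pipeline; the index name, vector field path, and projected fields are hypothetical and must match whatever vai actually created in your collection. You would pass the result to pymongo's collection.aggregate(), with a query vector produced by the same model (for example via the voyageai Python client's embed() with model="voyage-law-2" and input_type="query").

```python
def build_vector_search_pipeline(query_vector, index="vector_index", path="embedding",
                                 limit=5, num_candidates=100):
    """Build an Atlas Vector Search aggregation pipeline.

    index, path, and the projected field names are hypothetical; use the names
    your vai collection actually contains.
    """
    if num_candidates < limit:
        raise ValueError("numCandidates must be at least limit")
    return [
        {
            "$vectorSearch": {
                "index": index,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,  # candidates examined by the approximate search
                "limit": limit,                   # documents returned
            }
        },
        {
            "$project": {
                "_id": 0,
                "source": 1,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},  # similarity score for each hit
            }
        },
    ]
```

A larger numCandidates generally improves recall at the cost of latency, which is why it is kept separate from limit.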

6. Explore in the playground

vai playground --db legal_demo --collection legal_knowledge

Example Queries

"What are our obligations if a customer requests deletion of their data?"

Source                       Score
gdpr-compliance-summary      95%
ccpa-compliance-summary      91%
data-processing-addendum     88%

The search connects "deletion of their data" to GDPR's right to erasure, CCPA's right to delete, and the deletion obligations in the data processing addendum, even though each document uses different terminology.

"Compare the indemnification provisions across our contracts"

Source                         Score
independent-contractor         93%
master-services-agreement      90%
saas-subscription-agreement    85%

The model correctly surfaces all contracts containing indemnification clauses, regardless of whether they use "indemnify," "hold harmless," or "defend and indemnify."

Model Comparison

Model            Relevance score   Notes
voyage-law-2     95%               Recommended: trained specifically on legal text
voyage-4-large   87%               Good general-purpose alternative
voyage-4-lite    78%               Misses nuanced legal terminology

On this sample library, voyage-law-2 outperforms the general-purpose models by 8 to 17 points because its training on legal text captures the semantic relationships between legal terms, clause structures, and regulatory concepts.
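To run a comparison like this on your own contracts, one simple metric is top-k hit rate: for each query in a hand-labeled set, check whether any source a reviewer marked as relevant appears in a model's top k results. This is a generic sketch, not the method used to produce the scores above; the rankings in the example are invented for illustration.

```python
def hit_rate_at_k(results, expected, k=3):
    """Fraction of labeled queries whose top-k results include a relevant source.

    results:  query -> ranked list of source names returned by one model
    expected: query -> set of source names a reviewer marked as relevant
    """
    if not expected:
        raise ValueError("expected must contain at least one labeled query")
    hits = sum(
        1 for query, relevant in expected.items()
        if any(source in relevant for source in results.get(query, [])[:k])
    )
    return hits / len(expected)


# Hand-labeled relevance judgments and invented rankings, for illustration only.
expected = {
    "data deletion obligations": {"gdpr-compliance-summary", "ccpa-compliance-summary"},
    "indemnification provisions": {"master-services-agreement", "independent-contractor"},
}
model_a_results = {
    "data deletion obligations": ["gdpr-compliance-summary", "privacy-policy", "nda-mutual"],
    "indemnification provisions": ["nda-mutual", "employment-agreement", "privacy-policy"],
}
print(hit_rate_at_k(model_a_results, expected))  # 0.5
```

Running the same labeled queries against collections built with each model gives a like-for-like comparison on your own documents.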

Scaling to Production

Privilege and confidentiality

Legal documents are sensitive. Ensure your MongoDB Atlas deployment meets your organization's security and access control requirements. Use Atlas's role-based access control to restrict who can query the collection.

Contract volume

Large organizations manage thousands of contracts. vai pipeline handles bulk processing efficiently. Consider organizing documents by type or client and using separate collections or metadata filtering to scope searches.

Metadata filtering

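Use Atlas Vector Search pre-filters (the filter option of the $vectorSearch stage) to narrow searches by contract type, counterparty, effective date, or jurisdiction. Each field you filter on must be declared as a filter field in the vector search index. Filtering is critical for large contract libraries, where a broad semantic search can return similar-sounding clauses from the wrong contracts.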
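A minimal sketch of both halves follows, assuming you have added metadata fields to each chunk document yourself. The field names contract_type, jurisdiction, and effective_date, the index name, and the vector path are hypothetical; this page does not say which metadata vai stores. The index definition declares the vector field (1024 dimensions, the output size of voyage-law-2) plus one filter field per metadata path, and the helper combines whichever criteria are supplied into a $vectorSearch filter.

```python
from datetime import datetime


def vector_index_definition(filter_paths, path="embedding", num_dimensions=1024):
    """Atlas Vector Search index definition with extra filter fields.

    1024 is the output dimension of voyage-law-2; the field names are hypothetical.
    """
    fields = [{"type": "vector", "path": path,
               "numDimensions": num_dimensions, "similarity": "cosine"}]
    fields += [{"type": "filter", "path": p} for p in filter_paths]
    return {"fields": fields}


def scoped_filter(contract_type=None, jurisdiction=None, effective_on_or_after=None):
    """Build a $vectorSearch pre-filter from whichever criteria are given."""
    clauses = []
    if contract_type is not None:
        clauses.append({"contract_type": {"$eq": contract_type}})
    if jurisdiction is not None:
        clauses.append({"jurisdiction": {"$eq": jurisdiction}})
    if effective_on_or_after is not None:
        clauses.append({"effective_date": {"$gte": effective_on_or_after}})
    if not clauses:
        return None
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}


index_def = vector_index_definition(["contract_type", "jurisdiction", "effective_date"])
stage = {
    "$vectorSearch": {
        "index": "vector_index",
        "path": "embedding",
        "queryVector": [0.0] * 1024,  # placeholder; use the embedded query here
        "numCandidates": 100,
        "limit": 5,
        "filter": scoped_filter(contract_type="nda",
                                effective_on_or_after=datetime(2023, 1, 1)),
    }
}
```

Because the filter is applied before the similarity ranking, the limit slots are spent only on chunks from the contracts you scoped to.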

Keeping docs current

Re-run the pipeline when contracts are amended or new agreements are executed. Automate this as part of your contract lifecycle management workflow.
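One way to automate this is a scheduled job that hashes every file in the contract directory, compares the hashes with a manifest saved on the previous run, and re-runs vai pipeline only when something was added, modified, or deleted. The sketch below assumes the directory and flags from the walkthrough; the manifest location and helper names are hypothetical. This page does not say whether re-running vai pipeline replaces existing chunks for a document or appends new ones, so check that behavior (and how deleted contracts should be purged from the collection) before scheduling it.

```python
import hashlib
import json
import subprocess
from pathlib import Path


def file_digests(directory):
    """Map each file path (relative to directory) to the SHA-256 of its contents."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def detect_changes(directory, manifest_path):
    """Compare current files with the saved manifest (kept outside `directory`).

    Returns (changed, removed, current): new or modified files, deleted files,
    and the current digests to save once the pipeline has succeeded.
    """
    manifest = Path(manifest_path)
    previous = json.loads(manifest.read_text()) if manifest.exists() else {}
    current = file_digests(directory)
    changed = sorted(f for f, digest in current.items() if previous.get(f) != digest)
    removed = sorted(set(previous) - set(current))
    return changed, removed, current


def save_manifest(manifest_path, digests):
    """Persist digests so the next run only reacts to newer changes."""
    Path(manifest_path).write_text(json.dumps(digests, indent=2))


def rerun_pipeline(directory):
    """Re-run vai pipeline with the flags used in the walkthrough."""
    subprocess.run(
        ["vai", "pipeline", directory, "--model", "voyage-law-2",
         "--db", "legal_demo", "--collection", "legal_knowledge"],
        check=True,  # raise if vai exits with an error, so the manifest is not saved
    )
```

A scheduler would call detect_changes, then rerun_pipeline and save_manifest only if changed or removed is non-empty. Saving the manifest after the pipeline succeeds means a failed run is retried on the next schedule.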

Conversational interface

Use vai chat for natural language interaction with your contract library:

vai chat --db legal_demo --collection legal_knowledge

Next Steps

  • vai playground — Interactive web UI for exploring your indexed documents
  • vai chat — Conversational interface over your knowledge base
  • Developer Documentation — Engineering docs with voyage-code-3
  • Healthcare & Clinical — Clinical knowledge base with voyage-4-large
  • Financial Services — Financial document search with voyage-finance-2