Semantic Search Across Financial Documents, In Minutes

Model: voyage-finance-2 · Earnings calls, risk reports, and policy docs, searchable with a model trained on financial text.

Problem

Financial analysis requires synthesizing information across earnings call transcripts, 10-K filings, risk reports, and internal policy documents. The challenge is compounded by financial jargon — when management mentions "headwinds," they mean challenges; "color" means additional detail; "constructive" means cautiously optimistic.

Keyword search can't decode this language. Searching for "challenges" won't find paragraphs about "headwinds," and searching for "risk" matches almost every financial document, burying the passages that matter under thousands of irrelevant hits.

Solution

vai pipeline with voyage-finance-2 — a Voyage AI model specifically trained on financial text — processes your financial documents into semantically searchable chunks stored in MongoDB Atlas Vector Search. The model understands financial terminology, earnings call conventions, and regulatory language.

Sample Documents

We provide 15 sample financial documents (~39 KB total) that together form a realistic document set:

Document | Description
q3-2025-earnings-call | Q3 2025 earnings call transcript
q4-2025-earnings-call | Q4 2025 earnings call transcript
q3-2025-10q-summary | Q3 2025 10-Q filing summary
annual-report-summary | Annual report summary
risk-committee-report | Risk committee report
credit-policy | Credit risk policy
market-risk-framework | Market risk framework
interest-rate-analysis | Interest rate sensitivity analysis
liquidity-policy | Liquidity management policy
compliance-aml-summary | AML compliance summary
vendor-risk-assessment | Vendor risk assessment
capital-allocation-memo | Capital allocation memo
esg-report-summary | ESG report summary
fintech-partnership-memo | Fintech partnership evaluation
regulatory-change-tracker | Regulatory change tracker

Download sample documents

Walkthrough

1. Install vai

npm install -g voyageai-cli

2. Configure credentials

vai configure

3. Download and extract sample docs

Download the sample documents and extract them to a sample-docs/ directory.

4. Run the pipeline

vai pipeline ./sample-docs/ \
--model voyage-finance-2 \
--db finance_demo \
--collection financial_knowledge \
--create-index

This processes all 15 documents into 156 chunks, generates embeddings, stores them in MongoDB Atlas, and creates a vector search index.
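The stages named above (chunk, embed, store) can be sketched roughly as follows. This is an illustration only, not vai's actual implementation: the fixed-size character chunking, the chunk size, the record field names, and the fake_embed stand-in are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# An embedder maps a list of texts to one vector per text.
Embedder = Callable[[List[str]], List[List[float]]]


def chunk_text(text: str, size: int = 250, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical strategy; the chunking vai actually uses is not documented here.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece.strip():
            chunks.append(piece)
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def build_records(source: str, text: str, embed: Embedder) -> List[Dict]:
    """Chunk one document, embed each chunk, and shape records for storage."""
    chunks = chunk_text(text)
    vectors = embed(chunks) if chunks else []
    return [
        # Field names are assumptions, not vai's schema.
        {"source": source, "chunk_index": i, "text": c, "embedding": v}
        for i, (c, v) in enumerate(zip(chunks, vectors))
    ]


def fake_embed(texts: List[str]) -> List[List[float]]:
    """Stand-in for a real embedding call; returns a tiny 3-value vector per text."""
    return [[float(len(t)), 0.0, 1.0] for t in texts]


records = build_records(
    "q3-2025-earnings-call",
    "Net interest margin faced headwinds. " * 20,
    fake_embed,
)
print(len(records), "records; first chunk index:", records[0]["chunk_index"])
```

In a real run, fake_embed would be replaced by a call to the embedding model, and the records would be inserted into the MongoDB collection before the vector index is built.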

5. Search your documents

vai search "margin compression outlook" \
--db finance_demo \
--collection financial_knowledge
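
A search like this typically embeds the query and runs an Atlas $vectorSearch aggregation. The sketch below shows that pattern with the voyageai and pymongo Python clients. The index name, the "embedding" path, the projected fields, and the MONGODB_URI variable are assumptions; check what vai actually created in your collection before reusing them.

```python
import os
from typing import Any, Dict, List, Sequence


def build_search_pipeline(
    query_vector: Sequence[float],
    *,
    index_name: str = "vector_index",  # assumed index name
    path: str = "embedding",           # assumed vector field
    limit: int = 5,
    num_candidates: int = 100,
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline for Atlas Vector Search."""
    if limit > num_candidates:
        raise ValueError("limit must not exceed num_candidates")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "source": 1,  # assumed field names
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


def run_search(query: str) -> List[Dict[str, Any]]:
    """Embed the query and run the pipeline. Needs network access and credentials."""
    import voyageai                  # reads VOYAGE_API_KEY from the environment
    from pymongo import MongoClient

    vector = voyageai.Client().embed(
        [query], model="voyage-finance-2", input_type="query"
    ).embeddings[0]
    collection = MongoClient(os.environ["MONGODB_URI"])["finance_demo"]["financial_knowledge"]
    return list(collection.aggregate(build_search_pipeline(vector)))
```

Calling run_search("margin compression outlook") would return the top matches with their similarity scores, provided the assumed index and field names match your collection.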

6. Explore in the playground

vai playground --db finance_demo --collection financial_knowledge

Example Queries

"What did management say about margin compression?"

Source | Score
q3-2025-earnings-call | 94%
q4-2025-earnings-call | 89%
annual-report-summary | 82%

The model understands that "margin compression" relates to discussions of profitability pressure, cost headwinds, and net interest margin — surfacing the relevant earnings call sections even when management used different phrasing.

"What are our biggest risk exposures right now?"

Source | Score
risk-committee-report | 93%
market-risk-framework | 87%
credit-policy | 81%

The search connects "risk exposures" to specific risk categories discussed across multiple documents, rather than just matching the word "risk."

Model Comparison

Model | Relevance Score | Notes
voyage-finance-2 | 94% | Recommended; trained specifically on financial text
voyage-4-large | 86% | Decent general-purpose alternative
voyage-4-lite | 77% | Misses financial jargon and context

voyage-finance-2 outperforms general-purpose models because it understands the semantic relationships in financial language — connecting "headwinds" to "challenges," "constructive" to "cautiously optimistic," and "color" to "additional detail."

Scaling to Production

Data sensitivity (MNPI)

Financial documents often contain material non-public information (MNPI). Ensure your MongoDB Atlas deployment has appropriate access controls, encryption at rest, and audit logging. Restrict collection access to authorized personnel only.

Scale projections

A mid-size financial institution might index thousands of documents — quarterly filings, board reports, risk assessments, and compliance memos. vai pipeline handles this volume efficiently. Expect roughly 10 chunks per page of content.
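
The 10-chunks-per-page rule of thumb can be turned into a rough capacity estimate. The sketch below assumes 1024-dimension embeddings stored as 4-byte floats; both are assumptions to verify against the model documentation and your storage format, and the result excludes chunk text, metadata, and index overhead.

```python
from typing import Tuple


def estimate_index(
    pages: int,
    chunks_per_page: int = 10,  # rule of thumb from this guide
    dimensions: int = 1024,     # assumed embedding size
    bytes_per_value: int = 4,   # assumed float32 storage
) -> Tuple[int, int]:
    """Return (chunk count, raw vector bytes) for a corpus of the given page count."""
    chunks = pages * chunks_per_page
    vector_bytes = chunks * dimensions * bytes_per_value
    return chunks, vector_bytes


chunks, vector_bytes = estimate_index(50_000)
print(f"{chunks:,} chunks, ~{vector_bytes / 1e9:.1f} GB of raw vectors")
# prints: 500,000 chunks, ~2.0 GB of raw vectors
```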

Metadata filtering

Use MongoDB Atlas metadata filters to scope searches by document type, reporting period, business unit, or classification level. This is essential when analysts need to search within a specific quarter or document category.
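
A scoped search can be expressed with the filter option of the $vectorSearch stage. In the sketch below, the doc_type and period fields, the index name, and the "embedding" path are hypothetical. Note that Atlas only accepts filters on fields declared with type "filter" in the vector search index definition.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_filtered_pipeline(
    query_vector: Sequence[float],
    *,
    doc_type: Optional[str] = None,  # hypothetical metadata field
    period: Optional[str] = None,    # hypothetical metadata field
    index_name: str = "vector_index",
    path: str = "embedding",
    limit: int = 5,
    num_candidates: int = 100,
) -> List[Dict[str, Any]]:
    """Build a $vectorSearch pipeline restricted by optional metadata filters."""
    clauses: List[Dict[str, Any]] = []
    if doc_type is not None:
        clauses.append({"doc_type": {"$eq": doc_type}})
    if period is not None:
        clauses.append({"period": {"$eq": period}})

    stage: Dict[str, Any] = {
        "index": index_name,
        "path": path,
        "queryVector": list(query_vector),
        "numCandidates": num_candidates,
        "limit": limit,
    }
    if len(clauses) == 1:
        stage["filter"] = clauses[0]
    elif clauses:
        stage["filter"] = {"$and": clauses}  # all conditions must hold

    return [
        {"$vectorSearch": stage},
        {"$project": {"_id": 0, "source": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


# Example: restrict a search to Q3 2025 earnings calls.
pipeline = build_filtered_pipeline([0.0] * 4, doc_type="earnings_call", period="2025-Q3")
print(pipeline[0]["$vectorSearch"]["filter"])
```

Filtering inside the $vectorSearch stage narrows the candidate set before ranking, which usually gives better results than discarding out-of-scope hits after the search.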

Real-time ingestion

For time-sensitive documents like earnings call transcripts, integrate vai pipeline into your document ingestion workflow so new content is searchable within minutes of publication.
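
One way to wire this into an ingestion workflow is a small poller that stages new files into a batch directory and invokes vai pipeline on it. The sketch reuses only the flags shown in the walkthrough. It assumes the vector index already exists (so --create-index is omitted) and that re-running the pipeline adds to the existing collection; verify both against vai's behavior before relying on this.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List, Set


def build_pipeline_command(batch_dir: str, db: str, collection: str) -> List[str]:
    """Assemble the vai pipeline invocation using the flags from the walkthrough."""
    return [
        "vai", "pipeline", batch_dir,
        "--model", "voyage-finance-2",
        "--db", db,
        "--collection", collection,
    ]


def ingest_new_files(
    inbox: Path,
    seen: Set[str],
    db: str,
    collection: str,
    run: Callable = subprocess.run,  # injectable so the logic can be tested
) -> List[str]:
    """Run the pipeline on files in inbox that have not been ingested yet."""
    new_files = sorted(p for p in inbox.iterdir() if p.is_file() and p.name not in seen)
    if not new_files:
        return []
    with tempfile.TemporaryDirectory() as batch:
        for p in new_files:
            shutil.copy2(p, Path(batch) / p.name)
        # check=True raises if the command fails, so files are not marked as seen.
        run(build_pipeline_command(batch, db, collection), check=True)
    seen.update(p.name for p in new_files)
    return [p.name for p in new_files]
```

Calling ingest_new_files on a schedule (for example from cron or a workflow trigger) with a persistent seen set keeps each transcript from being embedded twice.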

Conversational interface

Use vai chat for natural language interaction with your financial knowledge base:

vai chat --db finance_demo --collection financial_knowledge

Next Steps

  • vai playground — Interactive web UI for exploring your indexed documents
  • vai chat — Conversational interface over your knowledge base
  • Developer Documentation — Engineering docs with voyage-code-3
  • Healthcare & Clinical — Clinical knowledge base with voyage-4-large
  • Legal & Compliance — Contract search with voyage-law-2