
Make Your Engineering Docs Actually Searchable

Model: voyage-code-3 · Internal docs, API references, and runbooks: semantic search in minutes.

Problem

Engineering documentation is scattered across Confluence, Notion, GitHub wikis, README files, and Slack threads. When a new engineer needs to set up their local environment or understand the deployment process, they spend hours piecing together information from a dozen sources.

Keyword search makes this worse. One team writes "authentication," another writes "auth," and a third writes "login flow." Searching for any single term misses the others. The information exists — it's just unfindable.
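Here is the vocabulary problem in concrete form. This is a toy exact-word keyword search over three made-up snippets, one for each team's phrasing. It is a Python illustration only, not how any particular search tool works. Each query term finds exactly one of the three documents, even though all three describe the same thing.

```python
# Three hypothetical snippets describing the same concept in different words.
DOCS = {
    "team-a": "Our authentication service issues short-lived tokens.",
    "team-b": "Auth uses OAuth 2.0 with refresh tokens.",
    "team-c": "The login flow redirects users to the identity provider.",
}


def keyword_search(term: str, docs: dict[str, str]) -> list[str]:
    """Return names of documents containing `term` as a whole word (case-insensitive)."""
    term = term.lower()
    hits = []
    for name, text in docs.items():
        # Tokenize on whitespace and strip trailing punctuation from each word.
        words = {w.strip(".,:;()").lower() for w in text.split()}
        if term in words:
            hits.append(name)
    return hits


for query in ("authentication", "auth", "login"):
    print(query, "->", keyword_search(query, DOCS))
```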

Solution

vai pipeline takes a directory of documents, chunks them intelligently, generates embeddings with Voyage AI's voyage-code-3 model (optimized for code and technical content), and indexes everything in MongoDB Atlas Vector Search. The result is a semantic search layer over your engineering knowledge that understands what you mean, not just what you typed.
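This page doesn't say how vai chunks documents. To illustrate the general idea, here is a minimal Python sketch that packs blank-line-separated paragraphs into chunks of at most max_chars characters. Any paragraph that is too long on its own gets split into fixed-size pieces. The max_chars parameter and the character-based limit are assumptions for illustration. Real chunkers often count tokens instead of characters and add overlap between chunks.

```python
def chunk_text(text: str, max_chars: int = 800) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of <= max_chars.

    A paragraph longer than max_chars is first split into fixed-size pieces,
    so no chunk ever exceeds the limit.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # piece fits alongside what we have
            else:
                chunks.append(current)  # close the full chunk, start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("alpha\n\nbeta\n\ngamma", max_chars=11))
```

Keeping paragraphs together means each chunk tends to cover a single idea. That usually makes its embedding a cleaner match for a focused question.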

Sample Documents

We provide 16 sample engineering documents (~40KB total) that represent a realistic internal documentation set:

Document                Description
architecture-overview   System architecture and component diagram
api-authentication      Auth flows and token management
api-endpoints-users     User API reference
api-endpoints-orders    Orders API reference
local-dev-setup         Local development environment setup
deployment-guide        Deployment procedures and checklists
database-schema         Schema design and migration notes
monitoring-runbook      Monitoring, alerting, and escalation
incident-response       Incident response procedures
onboarding-checklist    New engineer onboarding
testing-strategy        Testing approach and tooling
feature-flags           Feature flag system and usage
error-handling          Error handling patterns and conventions
caching-strategy        Caching layers and invalidation
adr-001-event-sourcing  Architecture Decision Record: event sourcing
adr-002-graphql         Architecture Decision Record: GraphQL adoption

Download sample documents

Walkthrough

1. Install vai

npm install -g voyageai-cli

2. Configure credentials

You need a Voyage AI API key and a MongoDB Atlas connection string.

vai configure

3. Download and extract sample docs

Download the sample documents and extract them to a sample-docs/ directory.

4. Run the pipeline

vai pipeline ./sample-docs/ \
--model voyage-code-3 \
--db devdocs_demo \
--collection engineering_knowledge \
--create-index

This processes all 16 documents into 127 chunks, generates embeddings, stores them in MongoDB Atlas, and creates a vector search index.
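The --create-index flag creates an Atlas Vector Search index over the stored embeddings. This page doesn't show the field names or index settings vai uses. The sketch below therefore assumes a chunk document with hypothetical text, source, and embedding fields, and shows a matching index definition in Atlas's JSON format, built as a Python dict. The 1,024-dimension figure assumes voyage-code-3's default output size. Whatever value you use, numDimensions must equal the length of the vectors you store.

```python
import json

EMBEDDING_DIMS = 1024  # assumed: voyage-code-3's default output dimension

# Hypothetical shape of one stored chunk; vai's actual field names may differ.
example_chunk = {
    "text": "Install Node.js, then run npm install in the repo root.",  # made-up content
    "source": "local-dev-setup",
    "embedding": [0.0] * EMBEDDING_DIMS,  # placeholder for the real Voyage embedding
}

# Atlas Vector Search index definition over the assumed "embedding" field.
index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": EMBEDDING_DIMS,  # must equal the stored vector length
            "similarity": "cosine",
        }
    ]
}

# The stored vectors and the index must agree on dimensionality.
assert len(example_chunk["embedding"]) == index_definition["fields"][0]["numDimensions"]
print(json.dumps(index_definition, indent=2))
```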

5. Search your docs

vai search "How do I set up my local dev environment?" \
--db devdocs_demo \
--collection engineering_knowledge
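Under the hood, a semantic search against Atlas is an aggregation with a $vectorSearch stage. This page doesn't show the query vai actually runs, so treat the following as a sketch. The index name vector_index and the field names embedding, source, and text are assumptions. The function only builds the pipeline and doesn't connect to a database.

```python
def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str = "vector_index",  # assumed index name
    path: str = "embedding",  # assumed embedding field
    limit: int = 5,
    num_candidates: int = 100,
) -> list[dict]:
    """Build an Atlas aggregation pipeline returning the top `limit` chunks with scores."""
    if num_candidates < limit:
        # Atlas requires numCandidates to be at least limit.
        raise ValueError("num_candidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,  # nearest-neighbor candidates to consider
                "limit": limit,  # results returned after ranking
            }
        },
        {
            # Keep the readable fields plus the similarity score Atlas computed.
            "$project": {
                "_id": 0,
                "source": 1,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = build_vector_search_pipeline([0.12, -0.03, 0.88], limit=3)
print(pipeline[0]["$vectorSearch"]["limit"])
```

With pymongo, you would pass the result to collection.aggregate(...). First embed the question with the same model used at index time, for example with the voyageai Python client: voyageai.Client().embed([question], model="voyage-code-3", input_type="query").embeddings[0].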

6. Explore in the playground

vai playground --db devdocs_demo --collection engineering_knowledge

Example Queries

"How do I get the development environment running on my laptop?"

Source                  Score
local-dev-setup         94%
onboarding-checklist    87%
deployment-guide        72%

The search correctly identifies the local dev setup guide as the primary match, surfaces the onboarding checklist (which references environment setup), and includes the deployment guide for additional context.
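The page doesn't say exactly how these percentage scores are computed. Vector search generally ranks chunks by a similarity measure between the query embedding and each chunk embedding. When an Atlas Vector Search index uses cosine similarity, Atlas reports a normalized score of (1 + cosine) / 2. The sketch below ranks three chunks against a query by cosine similarity. It uses made-up 3-dimensional vectors just to show the mechanics; these are not real voyage-code-3 embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank(query: list[float], chunks: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (source, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine(query, vec)) for name, vec in chunks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up vectors; real embeddings have hundreds or thousands of dimensions.
toy_chunks = {
    "local-dev-setup": [0.9, 0.1, 0.0],
    "onboarding-checklist": [0.7, 0.5, 0.1],
    "deployment-guide": [0.3, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]

for name, score in rank(toy_query, toy_chunks):
    print(f"{name}: {score:.2f}")
```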

"What happens when an API request fails?"

Source                  Score
error-handling          93%
api-authentication      82%
monitoring-runbook      76%

Even though the query doesn't mention "error handling" by name, the semantic search understands the intent and surfaces the right documents.

Model Comparison

Model            Relevance Score    Notes
voyage-code-3    94%                Recommended; trained on code and technical content
voyage-4-large   89%                Strong general-purpose alternative
voyage-4-lite    82%                Lower cost, lower accuracy

voyage-code-3 is the recommended model for engineering documentation because it understands code snippets, technical terminology, and the relationship between concepts like "deployment" and "CI/CD pipeline."

Scaling to Production

Source diversity

In practice, your docs live in multiple systems. You can run vai pipeline against different directories and merge everything into a single collection, or use separate collections with metadata filtering.
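If you merge several sources into one collection, a metadata field lets you scope a search to particular systems. This sketch assumes each chunk carries a hypothetical source_system field, for example "confluence" or "github-wiki". This page doesn't say whether vai records such a field, so you may need to add it yourself. In Atlas Vector Search, any field used in a $vectorSearch filter must also be declared in the index as type filter.

```python
# Hypothetical metadata field recording which system each chunk came from.
SOURCE_FIELD = "source_system"

index_definition = {
    "fields": [
        # numDimensions assumes voyage-code-3's default 1,024-dimension output.
        {"type": "vector", "path": "embedding", "numDimensions": 1024, "similarity": "cosine"},
        # Fields used in $vectorSearch filters must be indexed as type "filter".
        {"type": "filter", "path": SOURCE_FIELD},
    ]
}


def filtered_search_stage(query_vector: list[float], sources: list[str], limit: int = 5) -> dict:
    """Build a $vectorSearch stage restricted to chunks from the given source systems."""
    return {
        "$vectorSearch": {
            "index": "vector_index",  # assumed index name
            "path": "embedding",  # assumed embedding field
            "queryVector": query_vector,
            "numCandidates": limit * 20,  # ~20x limit is a commonly suggested starting point
            "limit": limit,
            "filter": {SOURCE_FIELD: {"$in": sources}},
        }
    }


stage = filtered_search_stage([0.1, 0.2, 0.3], ["confluence", "github-wiki"])
print(stage["$vectorSearch"]["filter"])
```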

Keeping docs current

Re-run the pipeline when documents change. For frequently updated docs, consider automating the pipeline as part of your CI/CD process or a cron job.
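Re-running the whole pipeline on every cron tick wastes embedding calls when nothing has changed. One way to gate a scheduled job is to keep a manifest of SHA-256 hashes for the docs directory. You then run vai pipeline only when a file was added, removed, or edited. This is an assumption, not a vai feature. The manifest path is hypothetical; keep it outside the docs directory so it doesn't hash itself.

```python
import hashlib
import json
from pathlib import Path


def file_hashes(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: Path) -> list[str]:
    """Return paths added, removed, or edited since the manifest was last saved."""
    current = file_hashes(root)
    previous = json.loads(manifest.read_text()) if manifest.exists() else {}
    return sorted(
        name
        for name in current.keys() | previous.keys()
        if current.get(name) != previous.get(name)
    )


def save_manifest(root: Path, manifest: Path) -> None:
    """Record the current hashes; call only after the pipeline run succeeds."""
    manifest.write_text(json.dumps(file_hashes(root), indent=2))
```

In a cron job, run vai pipeline only when changed_files(...) returns a non-empty list. Call save_manifest(...) after the run succeeds, so that a failed run is retried on the next tick.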

Cost at scale

voyage-code-3 pricing scales with token volume. For most engineering doc sets (hundreds of documents), embedding costs are negligible — typically under $1 for the initial run.
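To sanity-check this for your own corpus, a back-of-the-envelope estimate is enough: approximate the token count from the character count, then multiply by the per-million-token price. Both inputs below are assumptions. The four-characters-per-token ratio is a common rule of thumb for English text, and code can tokenize differently. The price is a placeholder, so check Voyage AI's current pricing for the real rate.

```python
def estimate_embedding_cost(
    total_chars: int, usd_per_million_tokens: float, chars_per_token: float = 4.0
) -> tuple[int, float]:
    """Rough (token count, USD cost) estimate for embedding a corpus once."""
    tokens = round(total_chars / chars_per_token)
    return tokens, tokens / 1_000_000 * usd_per_million_tokens


HYPOTHETICAL_PRICE = 0.20  # USD per million tokens; a placeholder, not Voyage AI's rate

# ~40KB of mostly ASCII text is ~40,000 characters; the second set is hypothetical.
for label, chars in [("sample docs", 40_000), ("500 docs at ~10KB", 500 * 10_000)]:
    tokens, cost = estimate_embedding_cost(chars, HYPOTHETICAL_PRICE)
    print(f"{label}: ~{tokens:,} tokens, ~${cost:.4f}")
```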

MCP server integration

Use vai mcp-server to expose your indexed documents as a tool for AI coding assistants, enabling them to search your internal docs directly.

Conversational interface

Once your documents are indexed, use vai chat to have a conversation with your engineering knowledge base:

vai chat --db devdocs_demo --collection engineering_knowledge

Next Steps

  • vai playground — Interactive web UI for exploring your indexed documents
  • vai chat — Conversational interface over your knowledge base
  • Healthcare & Clinical — Clinical knowledge base with voyage-4-large
  • Legal & Compliance — Contract search with voyage-law-2
  • Financial Services — Financial document search with voyage-finance-2