Local Inference Overview

vai v1.31.0 adds local inference with voyage-4-nano, giving you a zero-API-key path into Voyage AI embeddings.

Instead of sending every embedding request to the Voyage AI API, vai can now run voyage-4-nano on your machine through a lightweight Python bridge. The CLI stays the same. You install the model once with vai nano setup, then use --local on the commands that support local embedding.

Local-first

Local inference is the robot's best fit for thoughtful first-run guidance

This page takes the same “thinking” tone as the CLI, because it is where users pause to understand what is happening: Python under the hood, a shared embedding space across the Voyage 4 family, and a clean path from local experiments to hosted scale.


Why This Matters

Before nano, getting started with vai usually meant:

install the CLI
configure a Voyage API key
configure MongoDB Atlas
start embedding

With nano, the first run can be much simpler:

npm install -g voyageai-cli
vai nano setup
vai embed "What is vector search?" --local

That makes local inference the easiest way to try the product, test a workflow, or teach the Voyage 4 model family without requiring a hosted API credential on day one.

What the Python Bridge Does

vai is a Node.js CLI. voyage-4-nano runs in Python. The Python bridge connects those two pieces without changing the overall CLI experience.

The bridge:

creates and manages an isolated Python environment
installs the local inference dependencies
downloads and caches the voyage-4-nano model
receives embedding requests from vai
returns embeddings back to the CLI

From the user's point of view, the important detail is simple: local inference is built into the normal vai workflow, but it relies on Python under the hood.
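To make the request/response role of the bridge concrete, here is a minimal sketch of what a bridge process of this kind could look like. It is not vai's actual implementation or wire format: the line-delimited JSON protocol, the `id`, `texts`, and `embeddings` field names, the `serve` function, and the `fake_embed` placeholder (which stands in for the real voyage-4-nano call) are all assumptions made for illustration.

```python
import hashlib
import io
import json
from typing import List, TextIO

DIM = 8  # illustrative size only; NOT the real model's output dimension


def fake_embed(text: str) -> List[float]:
    """Hypothetical stand-in for the model call: a deterministic pseudo-vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:DIM]]


def serve(stdin: TextIO, stdout: TextIO) -> None:
    """Read one JSON request per line and write one JSON response per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        try:
            request = json.loads(line)
            vectors = [fake_embed(text) for text in request["texts"]]
            response = {"id": request.get("id"), "embeddings": vectors}
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            # Report malformed requests instead of crashing the bridge.
            response = {"id": None, "error": str(exc)}
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()


# Simulate the CLI side with in-memory streams instead of real pipes.
requests = io.StringIO('{"id": 1, "texts": ["What is vector search?"]}\n')
responses = io.StringIO()
serve(requests, responses)
reply = json.loads(responses.getvalue())
print(reply["id"], len(reply["embeddings"][0]))
```

In a real bridge the streams would be the child process's standard input and output, and the placeholder would be replaced by the cached model. The sketch only shows how the CLI and Python can exchange embedding requests without the user touching Python directly.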

Shared Embedding Space

voyage-4-nano shares embedding space with the rest of the Voyage 4 family:

voyage-4-large
voyage-4
voyage-4-lite

That means local-first does not trap you in a separate workflow. You can start with nano, then move to API-backed Voyage 4 models later as your workload grows.
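One way to picture a shared embedding space: vectors produced by different models in the family can be compared with the same similarity function. The sketch below uses small made-up vectors, not real model output, and the file names and the `cosine` helper are hypothetical. It assumes documents were embedded locally and the query was embedded by a hosted model.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy vectors standing in for document embeddings produced locally.
docs_from_nano: Dict[str, List[float]] = {
    "vector-search.md": [0.9, 0.1, 0.0],
    "billing.md": [0.0, 0.2, 0.9],
}

# Toy vector standing in for a query embedded by a hosted family model.
query_from_hosted: List[float] = [0.8, 0.2, 0.1]

# Rank documents by similarity to the query, best match first.
ranked = sorted(
    docs_from_nano,
    key=lambda name: cosine(query_from_hosted, docs_from_nano[name]),
    reverse=True,
)
print(ranked)
```

If the models did not share a space, a comparison like this would not be meaningful and you would need to re-embed everything when switching models. Check that the output dimensions match before mixing vectors from different sources.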

Common patterns:

prototype locally with voyage-4-nano
move to voyage-4-lite for lower-cost hosted scale
move to voyage-4-large for best retrieval quality

What Local Inference Changes

Local inference changes the embedding step. It does not replace the rest of the system.

Local mode gives you:

local embeddings with no Voyage API key
a strong onboarding path for development and experimentation
embeddings that are compatible with the rest of the Voyage 4 family

You may still want hosted services for:

Voyage AI reranking
API-backed production embedding throughput
MongoDB Atlas storage and vector search
chat workflows that depend on remote providers

Core Commands

# One-time setup
vai nano setup

# Check local readiness
vai nano status

# Smoke test local inference
vai nano test

# Embed locally
vai embed "What is vector search?" --local

# Run the ingestion pipeline with local embedding
vai pipeline ./docs/ --local --db myapp --collection knowledge --create-index
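If you want to drive these commands from a script, a thin wrapper is usually enough. The sketch below is an assumption-laden example rather than an official integration: `local_embed_command` and `run_local_embed` are hypothetical helper names, only the `vai embed "<text>" --local` form shown above is used, and the output is returned as raw text because its format is not described here.

```python
import shutil
import subprocess
from typing import List, Optional


def local_embed_command(text: str) -> List[str]:
    """Build the argument list for a local embed call (no shell quoting needed)."""
    return ["vai", "embed", text, "--local"]


def run_local_embed(text: str) -> Optional[str]:
    """Run the command and return its raw stdout, or None if vai is not installed."""
    if shutil.which("vai") is None:
        return None
    result = subprocess.run(
        local_embed_command(text),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


print(local_embed_command("What is vector search?"))
```

Passing the arguments as a list avoids shell quoting problems with text that contains spaces or quotes. Running vai nano status first is a reasonable way to confirm the local model is ready before scripting embed calls.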

When to Use Nano vs API Models

Use voyage-4-nano when:

you want the fastest path to a working embedding
you want to avoid API setup at the start
you are prototyping, teaching, or iterating locally
you want to understand the Voyage 4 family before moving to hosted scale

Use API-backed Voyage 4 models when:

you want hosted production throughput
you want the strongest retrieval quality with voyage-4-large
you need a fully hosted retrieval stack
your workflow depends on API-only capabilities
