Local Inference Setup and Usage
This guide walks through the practical local inference flow in vai v1.31.0.
Prerequisites
- vai installed
- Python 3.9+
- enough disk space for the model cache and Python environment
- MongoDB Atlas, only if you want to store and search embeddings
Step 1: Install vai
npm install -g voyageai-cli
Step 2: Run Nano Setup
vai nano setup
This setup flow:
- checks for a compatible Python version
- creates an isolated virtual environment
- installs the local inference dependencies
- downloads and caches voyage-4-nano
- runs a smoke test
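The compatibility check in the first bullet amounts to a version gate. A minimal sketch of that idea (illustrative only, not vai's actual setup code):

```python
import sys

MIN_VERSION = (3, 9)  # minimum Python version this guide assumes

def python_is_compatible(version_info=sys.version_info):
    """Return True when the interpreter meets the minimum major.minor version."""
    return tuple(version_info[:2]) >= MIN_VERSION

print(python_is_compatible())
```

If this returns False on your machine, installing a newer Python before running setup saves a failed attempt.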
Step 3: Check Local Status
vai nano status
This tells you whether Python, dependencies, model cache, and device detection are ready for local inference.
Step 4: Run a Smoke Test
vai nano test
Use this when you want to confirm that the bridge, model, and embedding path all work before running larger workflows.
Step 5: Embed Locally
vai embed "What is vector search?" --local
You can also choose dimensions explicitly:
vai embed "What is vector search?" --local --model voyage-4-nano --dimensions 512
And use local precision controls when you want smaller vectors:
vai embed "What is vector search?" --local --precision int8
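To see why int8 precision produces smaller vectors, here is a rough sketch of symmetric int8 quantization. This is only an illustration of the general technique; vai's actual quantization scheme may differ:

```python
def quantize_int8(vector):
    """Map floats to int8 values in [-127, 127] using symmetric max-abs scaling."""
    scale = max(abs(x) for x in vector) or 1.0
    return [round(x / scale * 127) for x in vector], scale

def dequantize_int8(quantized, scale):
    """Approximately reconstruct the original floats from int8 values."""
    return [q / 127 * scale for q in quantized]

quantized, scale = quantize_int8([0.12, -0.5, 0.9])
print(quantized)  # [17, -71, 127]
```

Each int8 component takes 1 byte instead of 4 for float32, so storage drops roughly 4x at the cost of some reconstruction error.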
Step 6: Run the Pipeline with Local Embedding
vai pipeline ./docs/ --local --db myapp --collection knowledge --create-index
This keeps chunking and MongoDB storage the same, but routes the embedding step through the local nano model.
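The chunking half of the pipeline can be approximated with a fixed-size splitter with overlap. This is a sketch of the general pre-embedding technique; vai's real chunking strategy and defaults may differ:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character chunks, a common pre-embedding step."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

sample = "".join(str(i % 10) for i in range(500))
chunks = chunk_text(sample)
print(len(chunks))  # 3
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.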
Useful Nano Commands
# Show environment and cache details
vai nano info
# Remove cached model files
vai nano clear-cache
Use clear-cache when you want to reclaim disk space or force a fresh download of the local model.
Local Mode and MongoDB
Local inference removes the need for a Voyage API key during embedding, but it does not remove the need for MongoDB Atlas if you want:
- stored embeddings
- vector search
- collection-based retrieval workflows
If you only want to generate embeddings locally, you can do that without MongoDB.
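For purely local experimentation, you can compare embeddings yourself without any database. A minimal cosine-similarity sketch, assuming you have two equal-length embedding vectors from the steps above:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

This is enough for small-scale ranking of a handful of documents; MongoDB Atlas becomes useful once you need persistent storage and indexed vector search.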
Local Mode and the Voyage 4 Family
Because voyage-4-nano shares an embedding space with the rest of the Voyage 4 family, local-first workflows can evolve cleanly:
- embed locally for early experimentation
- keep the same collection structure
- move to API-backed models later when you want hosted scale
Troubleshooting
Python not found
Install Python 3.9+, then run:
vai nano setup
Setup completed but local inference still fails
Run:
vai nano status
vai nano test
These commands usually tell you whether the problem is Python, dependencies, model cache, or runtime setup.
I want a full conceptual explanation
Read Local Inference Overview for the product story, shared embedding space explanation, and Python bridge context.