Local Inference Setup and Usage
This guide walks through the practical local inference flow in vai v1.31.0.
Prerequisites
- vai installed
- Python 3.9+
- enough disk space for the model cache and Python environment
- MongoDB Atlas, only if you want to store and search embeddings
Step 1: Install vai
npm install -g voyageai-cli
Step 2: Run Nano Setup
vai nano setup
This setup flow:
- checks for a compatible Python version
- creates an isolated virtual environment
- installs the local inference dependencies
- downloads and caches voyage-4-nano
- runs a smoke test
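The compatibility check in the first bullet amounts to a version gate. A minimal sketch of that idea (illustrative only, not vai's actual setup code):

```python
import sys

MIN_VERSION = (3, 9)  # minimum Python version this guide assumes

def python_is_compatible(version_info=sys.version_info):
    """Return True when the interpreter meets the minimum major.minor version."""
    return tuple(version_info[:2]) >= MIN_VERSION

print(python_is_compatible())
```

If this returns False on your machine, installing a newer Python before running setup saves a failed attempt.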
Step 3: Check Local Status
vai nano status
This tells you whether Python, dependencies, model cache, and device detection are ready for local inference.
Step 4: Run a Smoke Test
vai nano test
Use this when you want to confirm that the bridge, model, and embedding path all work before running larger workflows.
Step 5: Embed Locally
vai embed "What is vector search?" --local
You can also choose dimensions explicitly:
vai embed "What is vector search?" --local --model voyage-4-nano --dimensions 512
And use local precision controls when you want smaller vectors:
vai embed "What is vector search?" --local --precision int8
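To see why int8 precision produces smaller vectors, here is a rough sketch of symmetric int8 quantization. This is only an illustration of the general technique; vai's actual quantization scheme may differ:

```python
def quantize_int8(vector):
    """Map floats to int8 values in [-127, 127] using symmetric max-abs scaling."""
    scale = max(abs(x) for x in vector) or 1.0
    return [round(x / scale * 127) for x in vector], scale

def dequantize_int8(quantized, scale):
    """Approximately reconstruct the original floats from int8 values."""
    return [q / 127 * scale for q in quantized]

quantized, scale = quantize_int8([0.12, -0.5, 0.9])
print(quantized)  # [17, -71, 127]
```

Each int8 component takes 1 byte instead of 4 for float32, so storage drops roughly 4x at the cost of some reconstruction error.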
Step 6: Run the Pipeline with Local Embedding
vai pipeline ./docs/ --local --db myapp --collection knowledge --create-index
This keeps chunking and MongoDB storage the same, but routes the embedding step through the local nano model.
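The chunking half of the pipeline can be approximated with a fixed-size splitter with overlap. This is a sketch of the general pre-embedding technique; vai's real chunking strategy and defaults may differ:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character chunks, a common pre-embedding step."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

sample = "".join(str(i % 10) for i in range(500))
chunks = chunk_text(sample)
print(len(chunks))  # 3
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.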
Useful Nano Commands
# Show environment and cache details
vai nano info
# Remove cached model files
vai nano clear-cache
Use clear-cache when you want to reclaim disk space or force a fresh download of the local model.
Local Mode and MongoDB
Local inference removes the need for a Voyage API key during embedding, but it does not remove the need for MongoDB Atlas if you want:
- stored embeddings
- vector search
- collection-based retrieval workflows
If you only want to generate embeddings locally, you can do that without MongoDB.
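For purely local experimentation, you can compare embeddings yourself without any database. A minimal cosine-similarity sketch, assuming you have two equal-length embedding vectors from the steps above:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

This is enough for small-scale ranking of a handful of documents; MongoDB Atlas becomes useful once you need persistent storage and indexed vector search.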
Local Mode and the Voyage 4 Family
Because voyage-4-nano shares an embedding space with the rest of the Voyage 4 family, local-first workflows can evolve cleanly:
- embed locally for early experimentation
- keep the same collection structure
- move to API-backed models later when you want hosted scale
Troubleshooting
Python not found
Install Python 3.9+, then run:
vai nano setup
Setup completed but local inference still fails
Run:
vai nano status
vai nano test
These commands usually tell you whether the problem is Python, dependencies, model cache, or runtime setup.
I want a full conceptual explanation
Read Local Inference Overview for the product story, shared embedding space explanation, and Python bridge context.