vai ingest
Bulk import documents from files into MongoDB Atlas with batched embedding, progress tracking, and error handling.
Synopsis
vai ingest --file <path> --db <database> --collection <name> --field <name> [options]
Description
vai ingest reads documents from a file (JSON, JSONL, CSV, or plain text), embeds them in batches via the Voyage AI API, and inserts them into MongoDB Atlas. It auto-detects the file format and provides a real-time progress bar during ingestion.
Supported formats:
- JSONL (
.jsonl,.ndjson): One JSON object per line with a text field - JSON (
.json): Array of objects with a text field - CSV (
.csv): Requires--text-columnto specify which column to embed - Plain text: One document per non-empty line
Options
| Flag | Description | Default |
|---|---|---|
--file <path> | Input file (required) | — |
--db <database> | Database name (required) | — |
--collection <name> | Collection name (required) | — |
--field <name> | Embedding field name (required) | — |
-m, --model <model> | Embedding model | voyage-4-large |
--input-type <type> | Input type: query or document | document |
-d, --dimensions <n> | Output dimensions | Model default |
--batch-size <n> | Documents per API batch (max 128) | 50 |
--text-column <name> | CSV column to embed (required for CSV) | — |
--text-field <name> | JSON/JSONL field containing text | text |
--dry-run | Parse file and show stats without embedding | — |
--strict | Abort on first batch error | — |
--json | Machine-readable JSON output | — |
-q, --quiet | Suppress progress, show only summary | — |
Examples
Ingest a JSONL file
vai ingest --file documents.jsonl --db myapp --collection knowledge --field embedding
Ingest CSV with a specific text column
vai ingest --file products.csv --db store --collection items --field embedding --text-column description
Dry run to check file parsing
vai ingest --file data.json --db myapp --collection docs --field embedding --dry-run
Strict mode with smaller batches
vai ingest --file corpus.jsonl --db myapp --collection docs --field embedding \
--batch-size 25 --strict
JSON output for CI pipelines
vai ingest --file data.jsonl --db myapp --collection docs --field embedding --json
Output
On completion, shows a summary with:
- Documents succeeded/failed
- Total batches processed
- Token count and model used
- Duration and throughput (docs/sec)
Tips
- The Voyage AI API limits batches to 128 texts. The default batch size of 50 balances throughput and reliability.
- Use
--dry-runfirst to validate your file format and see estimated token counts before spending API credits. - Failed batches are skipped by default; use
--strictto abort on the first error. - For simple single-document storage, use
vai storeinstead.
Related Commands
vai store— Store a single documentvai pipeline— End-to-end chunk → embed → storevai purge— Remove stored embeddingsvai refresh— Re-embed with a new model