vai ingest

Bulk import documents from files into MongoDB Atlas with batched embedding, progress tracking, and error handling.

Synopsis

vai ingest --file <path> --db <database> --collection <name> --field <name> [options]

Description

vai ingest reads documents from a file (JSON, JSONL, CSV, or plain text), embeds them in batches via the Voyage AI API, and inserts them into MongoDB Atlas. It auto-detects the file format and provides a real-time progress bar during ingestion.

Supported formats:

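  • JSONL (.jsonl, .ndjson): One JSON object per line, each with a text field (named text by default; change it with --text-field)
  • JSON (.json): A single array of objects, each with a text field (named text by default; change it with --text-field)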
  • CSV (.csv): Requires --text-column to specify which column to embed
  • Plain text: One document per non-empty line
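To make the four input shapes concrete, here is a minimal Python sketch of how a file in each format could be reduced to the list of strings that get embedded. It is not vai's actual parser: the function name load_texts, the extension-based detection, and the whitespace trimming for plain text are assumptions for illustration. The text_field and text_column parameters mirror the --text-field and --text-column flags.

```python
import csv
import json
from pathlib import Path


def load_texts(path, text_field="text", text_column=None):
    """Return the strings to embed from a JSONL, JSON, CSV, or plain-text file.

    Hypothetical helper: format detection by file extension is an assumption.
    """
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix in (".jsonl", ".ndjson"):
        # One JSON object per line; blank lines are ignored.
        lines = path.read_text(encoding="utf-8").splitlines()
        return [json.loads(line)[text_field] for line in lines if line.strip()]

    if suffix == ".json":
        # A single top-level array of objects.
        documents = json.loads(path.read_text(encoding="utf-8"))
        return [doc[text_field] for doc in documents]

    if suffix == ".csv":
        # CSV has no default text column, so the caller must name one.
        if text_column is None:
            raise ValueError("CSV input requires a text column")
        with path.open(newline="", encoding="utf-8") as handle:
            return [row[text_column] for row in csv.DictReader(handle)]

    # Anything else is treated as plain text: one document per non-empty line.
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

Running a helper like this on a small sample before ingesting is a cheap way to confirm that the text field or column you plan to pass actually exists in every record.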

Options

Flag                  Description                                  Default
--file <path>         Input file (required)
--db <database>       Database name (required)
--collection <name>   Collection name (required)
--field <name>        Embedding field name (required)
-m, --model <model>   Embedding model                              voyage-4-large
--input-type <type>   Input type: query or document                document
-d, --dimensions <n>  Output dimensions                            Model default
--batch-size <n>      Documents per API batch (max 128)            50
--text-column <name>  CSV column to embed (required for CSV)
--text-field <name>   JSON/JSONL field containing text             text
--dry-run             Parse file and show stats without embedding
--strict              Abort on first batch error
--json                Machine-readable JSON output
-q, --quiet           Suppress progress, show only summary

Examples

Ingest a JSONL file

vai ingest --file documents.jsonl --db myapp --collection knowledge --field embedding

Ingest CSV with a specific text column

vai ingest --file products.csv --db store --collection items --field embedding --text-column description

Dry run to check file parsing

vai ingest --file data.json --db myapp --collection docs --field embedding --dry-run

Strict mode with smaller batches

vai ingest --file corpus.jsonl --db myapp --collection docs --field embedding \
--batch-size 25 --strict

JSON output for CI pipelines

vai ingest --file data.jsonl --db myapp --collection docs --field embedding --json
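In a CI job, the --json output can be captured and parsed by a small wrapper. The Python sketch below assumes the command prints a single JSON document on standard output; the helper name run_json is hypothetical, and because the fields of the summary are not documented here, the sketch returns the parsed object without reading any particular key.

```python
import json
import subprocess


def run_json(command):
    """Run a command and parse its standard output as one JSON document.

    Returns (exit_code, parsed_json_or_None). Assumes the tool writes a
    single JSON value to stdout, which is an assumption about --json mode.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    try:
        return result.returncode, json.loads(result.stdout)
    except json.JSONDecodeError:
        # Output was not valid JSON, e.g. the tool failed before summarizing.
        return result.returncode, None


# Example call (requires vai on PATH, so it is left commented out):
# code, summary = run_json([
#     "vai", "ingest", "--file", "data.jsonl", "--db", "myapp",
#     "--collection", "docs", "--field", "embedding", "--json",
# ])
```

Inspect one real run's output before writing assertions against specific fields.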

Output

On completion, vai ingest prints a summary with:

  • Documents succeeded/failed
  • Total batches processed
  • Token count and model used
  • Duration and throughput (docs/sec)

Tips

  • The Voyage AI API limits batches to 128 texts. The default batch size of 50 balances throughput and reliability.
  • Use --dry-run first to validate your file format and see estimated token counts before spending API credits.
  • Failed batches are skipped by default; use --strict to abort on the first error.
  • For simple single-document storage, use vai store instead.
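The batching and error-handling behavior described in these tips can be sketched in Python as follows. This is an illustration under stated assumptions, not vai's implementation: ingest_in_batches and embed_batch are hypothetical names, embed_batch stands in for the embedding-plus-insert step, and counting a failed batch toward the batch total is a guess.

```python
def ingest_in_batches(texts, embed_batch, batch_size=50, strict=False):
    """Process texts in fixed-size batches.

    Returns (succeeded, failed, batches). embed_batch is a caller-supplied
    function (hypothetical) that raises an exception when a batch fails.
    """
    if not 1 <= batch_size <= 128:
        # 128 is the per-batch limit stated in the tips above.
        raise ValueError("batch size must be between 1 and 128")

    succeeded = failed = batches = 0
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        batches += 1
        try:
            embed_batch(batch)
            succeeded += len(batch)
        except Exception:
            if strict:
                # Strict mode: stop at the first failing batch.
                raise
            # Default mode: skip the failed batch and keep going.
            failed += len(batch)
    return succeeded, failed, batches
```

With 120 documents and the default batch size of 50, this loop makes three calls (50, 50, and 20 documents). If the second call fails without strict mode, 70 documents succeed and 50 are reported as failed, which is why a smaller batch size limits how much one error costs.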