Comparing Configurations

Use vai eval to A/B test different retrieval configurations and find the best setup for your use case.

Workflow

1. Run Baseline

vai eval --test-set queries.json --db myapp --collection docs \
--model voyage-4-lite --output baseline.json

2. Run Variant

vai eval --test-set queries.json --db myapp --collection docs \
--model voyage-4-large --rerank --output with-rerank.json

3. Compare

vai eval compare baseline.json with-rerank.json

What to Compare

Variable      How to Test
Model         Run with --model voyage-4-lite vs. --model voyage-4-large
Reranking     Run with and without --rerank
Rerank model  --rerank-model rerank-2.5 vs. --rerank-model rerank-2.5-lite
Top-K         Different --top-k values
Chunk size    Re-ingest with different chunk sizes, then evaluate
Dimensions    Re-embed with different --dimensions, then evaluate
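
Testing one variable across several values can be scripted as a small loop. The sketch below sweeps --top-k while holding everything else fixed. It uses only flags shown on this page; the function name sweep_top_k and the top-k-N.json output names are illustrative assumptions, not part of vai.

```shell
# Sketch: evaluate several --top-k values against the same test set.
# The function name and output file names are assumptions for illustration.
sweep_top_k() {
  for k in "$@"; do
    # One evaluation per value; stop at the first failing run.
    vai eval --test-set queries.json --db myapp --collection docs \
      --model voyage-4-lite --top-k "$k" --output "top-k-$k.json" || return 1
  done
}
```

Calling sweep_top_k 3 5 10 would write one result file per value, and each file can then be passed to vai eval compare alongside the baseline. Keeping the model and reranking settings constant means any change in the metrics can be attributed to top-k alone.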

Reading Comparison Output

The comparison shows each metric side by side with:

  • Current value for each configuration
  • Delta (difference)
  • Direction indicator (improved ↑ / regressed ↓)
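
To make the delta and direction columns concrete, here is a minimal sketch of how such values could be derived from two metric scores. It is not vai's implementation: the metric_delta helper is hypothetical, it assumes a higher-is-better metric, and the numbers passed to it are placeholders rather than real evaluation output.

```shell
# Sketch: derive a delta and a direction label from two metric values.
# Assumes higher is better; metric_delta is a hypothetical helper.
metric_delta() {
  awk -v base="$1" -v variant="$2" 'BEGIN {
    delta = variant - base
    dir = (delta > 0) ? "improved ↑" : ((delta < 0) ? "regressed ↓" : "unchanged")
    printf "%+.3f %s\n", delta, dir
  }'
}
```

For example, metric_delta 0.71 0.78 reports a positive delta marked as improved, while swapping the arguments reports the same magnitude marked as regressed. For a lower-is-better metric such as latency, the direction logic would need to be inverted.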

CI Integration

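# Fail CI if quality regresses.
# Prints the regressed metric names; jq -e exits non-zero when the final value is false.
vai eval --test-set queries.json --baseline baseline.json --json \
| jq -e '[.deltas | to_entries[] | select(.value.regressed) | .key] | ., (length == 0)'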

Further Reading