# Evaluation Guide
vai's evaluation tools help you measure retrieval quality, compare configurations, and track improvements over time.
## Quick Start
```shell
# Run evaluation against a test set
vai eval --test-set queries.json --db myapp --collection docs

# Save results for comparison
vai eval --test-set queries.json --rerank --output results-v1.json

# Compare configurations
vai eval compare results-v1.json results-v2.json
```
## Metrics
| Metric | What It Measures | Range |
|---|---|---|
| MRR | Mean reciprocal rank of the first relevant result | 0–1 |
| nDCG@K | How well relevant results are ranked within the top K | 0–1 |
| Recall@K | Fraction of all relevant docs found in the top K | 0–1 |
| MAP | Mean average precision across queries | 0–1 |
| Precision@K | Fraction of the top K that are relevant | 0–1 |
Higher is better for all metrics.
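The metrics above can be computed in a few lines. Below is a minimal illustration with binary relevance judgments, not vai's internal implementation; the document IDs are made up for the example:

```python
import math

def mrr(ranked, relevant):
    """Reciprocal rank of the first relevant result (0 if none appears)."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant docs that appear in the top k."""
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Normalized discounted cumulative gain, binary relevance."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, d in enumerate(ranked[:k], start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

# Retrieval output (best first) and ground-truth relevant docs
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

print(mrr(ranked, relevant))                 # 0.5 (first hit at rank 2)
print(precision_at_k(ranked, relevant, 4))   # 0.5
print(recall_at_k(ranked, relevant, 4))      # 1.0
```

Note that per-query values like these are averaged over every query in the test set to produce the reported scores.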
## Workflow
1. Create a test set with queries and known relevant documents
2. Run a baseline evaluation and save the results
3. Change the configuration (model, chunk size, reranking, etc.)
4. Re-evaluate and compare against the baseline
5. Iterate until quality meets your requirements
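For the comparison step, you can also diff two saved result files yourself. The sketch below assumes each `--output` file is a flat JSON object mapping metric names to scores; that schema is an assumption for illustration, not vai's documented format:

```python
import json

def compare_results(baseline_path, candidate_path):
    """Return per-metric score deltas (candidate minus baseline).

    Assumes each file is a flat JSON object of metric name -> score,
    e.g. {"mrr": 0.62, "ndcg@10": 0.71} -- adjust to vai's real output.
    """
    with open(baseline_path) as f:
        base = json.load(f)
    with open(candidate_path) as f:
        cand = json.load(f)
    # Only compare metrics present in both runs
    return {metric: round(cand[metric] - base[metric], 4)
            for metric in sorted(base.keys() & cand.keys())}

# Positive deltas mean the candidate configuration improved that metric:
# compare_results("results-v1.json", "results-v2.json")
```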
## Next Steps
- Test Sets — Creating evaluation test sets
- Comparing Configs — Side-by-side comparison