Benchmarking Guide
vai includes a comprehensive benchmarking suite that measures embedding performance, reranking quality, cross-model compatibility, quantization impact, and cost across different configurations.
Quick Start
```bash
# Embedding latency benchmark
vai benchmark embed

# Reranking benchmark
vai benchmark rerank

# Cross-model similarity (shared embedding space)
vai benchmark asymmetric

# Cost comparison
vai benchmark cost
```
No setup required — all benchmarks use built-in sample data.
Available Benchmarks
| Type | What It Measures | When to Use |
|---|---|---|
| embed | Latency per model | Choosing a model for latency-sensitive apps |
| cost | Cost per 1M tokens | Budget planning |
| asymmetric | Cross-model similarity | Validating shared embedding space |
| quantization | Quality impact of int8/binary | Deciding on output types |
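
As a rough illustration of how a cost-per-1M-tokens figure turns into a budget number, the sketch below multiplies a token volume by a per-million price. The model names, prices, and token volume are placeholders invented for this example (they are not vai output or real pricing), and `estimate_cost` is a hypothetical helper, not part of vai.

```python
def estimate_cost(total_tokens: int, price_per_million_usd: float) -> float:
    """Estimate spend in USD for embedding `total_tokens` tokens."""
    if total_tokens < 0 or price_per_million_usd < 0:
        raise ValueError("token count and price must be non-negative")
    return total_tokens / 1_000_000 * price_per_million_usd


# Placeholder prices in USD per 1M tokens (assumptions, not real pricing).
example_prices = {"model-a": 0.02, "model-b": 0.12}

# Assumed monthly volume for the example.
monthly_tokens = 250_000_000

for name, price in example_prices.items():
    print(f"{name}: ${estimate_cost(monthly_tokens, price):.2f} per month")
```

Substituting the per-1M-token prices reported by `vai benchmark cost` and your own expected volume gives a first-order budget estimate.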
Interpreting Results
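Benchmarks print tables with latency percentiles (p50, p95, p99), throughput, and quality metrics. p50 is the median latency; p95 and p99 are the latencies that 95% and 99% of requests stay at or under, so they describe the slow tail rather than the typical case. Use --json for machine-readable output that you can feed into dashboards or CI pipelines.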
```bash
vai benchmark embed --json > benchmark-results.json
```
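
One way a CI pipeline might consume the JSON file is to flag models whose tail latency exceeds a budget. The sketch below is only an illustration: the JSON layout it assumes (a top-level array of objects with `model` and `p95_ms` keys) is hypothetical and not taken from vai's documented output, so inspect your own `benchmark-results.json` and adjust the key names before using it. The helper names and the 500 ms budget are likewise invented for the example.

```python
import json
from pathlib import Path


def load_results(path: str) -> list:
    """Read a benchmark results file and return the parsed JSON."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def find_slow_models(results: list, p95_budget_ms: float) -> list:
    """Return the names of models whose p95 latency exceeds the budget.

    Assumes each entry has "model" and "p95_ms" keys (hypothetical schema);
    a KeyError is raised if the real output uses different names.
    """
    return [r["model"] for r in results if r["p95_ms"] > p95_budget_ms]


# In-memory sample standing in for benchmark-results.json (made-up numbers).
sample = json.loads("""
[
  {"model": "model-a", "p95_ms": 180.0},
  {"model": "model-b", "p95_ms": 640.0}
]
""")

slow = find_slow_models(sample, p95_budget_ms=500.0)
print("over budget:", slow)
```

In a CI job, the same check could run against `load_results("benchmark-results.json")` and exit with a non-zero status when the returned list is not empty.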
Further Reading
- Embedding Benchmark — Latency details
- Cost Benchmark — Cost comparison
- Asymmetric Benchmark — Cross-model tests
- Quantization Benchmark — Output type impact