
Benchmarking Guide

vai includes a comprehensive benchmarking suite that measures embedding performance, reranking quality, cross-model compatibility, quantization impact, and cost across different configurations.

Quick Start

# Embedding latency benchmark
vai benchmark embed

# Reranking benchmark
vai benchmark rerank

# Cross-model similarity (shared embedding space)
vai benchmark asymmetric

# Cost comparison
vai benchmark cost

No setup required — all benchmarks use built-in sample data.

Available Benchmarks

Type         | What It Measures              | When to Use
-------------|-------------------------------|--------------------------------------------
embed        | Latency per model             | Choosing a model for latency-sensitive apps
cost         | Cost per 1M tokens            | Budget planning
asymmetric   | Cross-model similarity        | Validating shared embedding space
quantization | Quality impact of int8/binary | Deciding on output types

Interpreting Results

Benchmark results are printed as tables of latency percentiles (p50, p95, p99), throughput, and quality metrics. Pass --json for machine-readable output that you can feed into dashboards or CI pipelines.

vai benchmark embed --json > benchmark-results.json
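In a CI job it can help to wrap that command so a failed or empty run never leaves a stale results file behind. The following is a minimal sketch under stated assumptions, not part of vai: the `save_benchmark` helper and the `OUT_DIR` variable are hypothetical names, and the only vai invocation used is the `vai benchmark <type> --json` form shown above. The structure of the JSON is not described here, so the script checks only that some output was written.

```shell
#!/bin/sh
# Sketch: run a vai benchmark with --json and keep the file only on success.
# save_benchmark and OUT_DIR are illustrative names, not part of vai.

OUT_DIR="${OUT_DIR:-benchmark-results}"

# save_benchmark TYPE
# Writes $OUT_DIR/TYPE.json. Returns non-zero (and removes the file) if the
# benchmark command fails or produces no output.
save_benchmark() {
    type_name="$1"
    out_file="$OUT_DIR/$type_name.json"

    mkdir -p "$OUT_DIR" || return 1

    if ! vai benchmark "$type_name" --json > "$out_file"; then
        echo "benchmark '$type_name' failed" >&2
        rm -f "$out_file"
        return 1
    fi

    # -s is true only when the file exists and is not empty.
    if [ ! -s "$out_file" ]; then
        echo "benchmark '$type_name' produced no output" >&2
        rm -f "$out_file"
        return 1
    fi

    echo "saved $out_file"
}

# Run the embed benchmark only when vai is on PATH.
if command -v vai > /dev/null 2>&1; then
    save_benchmark embed || exit 1
else
    echo "vai not found on PATH; nothing was run" >&2
fi
```

Other benchmark types can be saved the same way, for example `save_benchmark cost`, assuming they accept `--json` as the embed benchmark does.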

Further Reading