Benchmarking Guide

vai includes a comprehensive benchmarking suite that measures embedding performance, reranking quality, cross-model compatibility, quantization impact, and cost across different configurations.

Quick Start

# Embedding latency benchmark
vai benchmark embed

# Reranking benchmark
vai benchmark rerank

# Cross-model similarity (shared embedding space)
vai benchmark asymmetric

# Cost comparison
vai benchmark cost

No setup is required: all benchmarks use built-in sample data.

Available Benchmarks

Type	What It Measures	When to Use
`embed`	Latency per model	Choosing a model for latency-sensitive apps
`cost`	Cost per 1M tokens	Budget planning
`asymmetric`	Cross-model similarity	Validating shared embedding space
`quantization`	Quality impact of int8/binary	Deciding on output types
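
For budget planning, a per-1M-token figure like the one the `cost` benchmark reports can be turned into a workload estimate with simple arithmetic. The sketch below is illustrative only: the model names, prices, and document counts are made-up placeholders, not values taken from vai or any provider.

```python
def estimate_cost(total_tokens, price_per_million_usd):
    """Return the cost in USD of embedding total_tokens at a per-1M-token price."""
    if total_tokens < 0 or price_per_million_usd < 0:
        raise ValueError("token count and price must be non-negative")
    return total_tokens / 1_000_000 * price_per_million_usd


# Hypothetical workload: 250,000 documents averaging 400 tokens each.
total_tokens = 250_000 * 400  # 100,000,000 tokens

# Hypothetical per-1M-token prices in USD (placeholders, not real pricing).
hypothetical_prices = {"model-a": 0.02, "model-b": 0.12}

estimates = {
    name: estimate_cost(total_tokens, price)
    for name, price in hypothetical_prices.items()
}

for name, cost in estimates.items():
    print(f"{name}: ${cost:.2f}")
```

Substitute the prices printed by `vai benchmark cost` and your own token volume to get a comparable estimate per model.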

Interpreting Results

Results are printed as tables of latency percentiles (p50, p95, p99), throughput, and quality metrics. Pass --json to get machine-readable output that you can feed into dashboards or CI pipelines.

vai benchmark embed --json > benchmark-results.json
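
To make the p50, p95, and p99 columns concrete, here is a minimal sketch of how such percentiles can be derived from raw latency samples. It uses the nearest-rank method as an assumption; the exact method vai uses is not stated here, and the sample latencies are invented for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples):
    """Return the three latency percentiles shown in the benchmark tables."""
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


# Made-up request latencies in milliseconds, including one slow outlier.
latencies_ms = [112, 98, 105, 130, 101, 99, 240, 103, 97, 110]
summary = summarize(latencies_ms)
print(summary)
```

With only ten samples, the single 240 ms outlier determines both p95 and p99 while p50 stays near the typical latency, which is why tail percentiles are the ones to watch in latency-sensitive applications.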
