vai similarity
Compute cosine similarity between texts by embedding them and comparing their vectors.
Synopsis
vai similarity [textA] [textB] [options]
vai similarity [textA] --against <text1> <text2> ... [options]
Description
vai similarity embeds two or more texts and computes cosine similarity between them. It supports two modes:
- Two-text comparison: Compare exactly two texts and get a single similarity score.
- One-vs-many: Compare one text against multiple texts using --against, with results sorted by similarity (descending).
All texts are embedded in a single API call for efficiency.
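The two modes described above can be sketched in a few lines of Python. This is an illustrative sketch, not vai's actual implementation: the three-dimensional vectors are hand-made stand-ins for real embeddings, and the function names `cosine_similarity` and `rank_against` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_against(query, candidates):
    """One-vs-many: score each candidate against the query, highest first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings (assumption: real vectors come from the model).
query = [1.0, 0.0, 0.0]
candidates = {
    "close": [0.9, 0.1, 0.0],
    "unrelated": [0.0, 1.0, 0.0],
    "partial": [0.5, 0.5, 0.0],
}

ranking = rank_against(query, candidates)
for name, score in ranking:
    print(f"{score:.4f}  {name}")
```

Two-text mode corresponds to a single `cosine_similarity` call; one-vs-many mode corresponds to `rank_against`, which applies the same function to every candidate and sorts the results in descending order.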
Options
| Flag | Description | Default |
|---|---|---|
| --against <texts...> | Compare first text against multiple texts | — |
| --file1 <path> | Read text A from a file | — |
| --file2 <path> | Read text B from a file | — |
| -m, --model <model> | Embedding model | voyage-4-large |
| --dimensions <n> | Output dimensions | Model default |
| --json | Machine-readable JSON output | — |
| -q, --quiet | Suppress non-essential output (score only) | — |
Examples
Compare two texts
vai similarity "king" "queen"
Compare one text against many
vai similarity "database" --against "MongoDB is a NoSQL database" "Python is a programming language" "Vector search finds similar documents"
Compare files
vai similarity --file1 document-a.txt --file2 document-b.txt
Get just the score
vai similarity "cat" "dog" --quiet
# Output: 0.847293
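For scripting, the score printed by `--quiet` can be captured from another program. The following is a sketch under two assumptions: that the `vai` executable is on the `PATH`, and that `--quiet` prints only the numeric score on standard output, as the example above suggests. The helper names `parse_score` and `similarity` are hypothetical.

```python
import subprocess


def parse_score(output: str) -> float:
    """Convert the text printed by `vai similarity --quiet` into a float."""
    return float(output.strip())


def similarity(text_a: str, text_b: str) -> float:
    """Run `vai similarity` in quiet mode and return the score.

    Assumes the `vai` executable is installed and on PATH.
    """
    result = subprocess.run(
        ["vai", "similarity", text_a, text_b, "--quiet"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_score(result.stdout)
```

Calling `similarity("cat", "dog")` would then return the score as a float. Passing the arguments as a list rather than a shell string avoids quoting problems when the texts contain spaces or quotes.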
JSON output for scripting
vai similarity "hello" "world" --json
Using the Similarity Tab
The Similarity tab in vai playground lets you visually compare texts and see how closely related they are in embedding space.
Getting Started
- Run vai playground to start the web app
- Select the Similarity tab from the navigation
- Enter two texts in the input fields
- Click Compare to compute the cosine similarity score
Features
Side-by-side input: Enter both texts in adjacent fields for easy comparison.
Similarity score: Displays the cosine similarity as a value between 0.0 and 1.0, with a visual indicator of how similar the texts are. Higher scores mean more semantically similar.
Model selection: Choose which embedding model to use for the comparison.
Dimensions: Adjust output dimensions to see how dimensionality affects similarity scores.
Use Cases
- Testing how well a search query matches candidate documents
- Checking if two pieces of content are semantically similar (deduplication)
- Exploring how the embedding model understands different phrasings
Output
In two-text mode, the command outputs a single cosine similarity score (typically between 0.0 and 1.0). In one-vs-many mode, it lists each comparison text with its score, sorted by similarity in descending order.
Tips
- Cosine similarity ranges from -1 to 1, but for text embeddings the scores typically fall between 0 and 1. Higher means more similar.
- Use --quiet (CLI) to get just the numeric score for scripting.
- No --input-type is set since you're comparing texts directly, not doing asymmetric retrieval.
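The range mentioned in the first tip follows from the definition of cosine similarity. As a short reminder, written with generic vectors $a$ and $b$ (symbols introduced here for illustration, not taken from vai's output):

```latex
\[
\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert},
\qquad
-1 \le \cos(a, b) \le 1 .
\]
% For unit-length vectors (\lVert a \rVert = \lVert b \rVert = 1)
% the expression reduces to the dot product:
\[
\cos(a, b) = a \cdot b .
\]
```

A score of 1 means the vectors point in the same direction, 0 means they are orthogonal, and -1 means they point in opposite directions.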
Related Commands
- vai embed: Generate raw embeddings
- vai search: Similarity search against a MongoDB collection