Workflow Node Reference

Every workflow step uses a tool field that selects one of the node types listed below. Nodes are organized into eight categories: Retrieval, Embedding, Processing, Control Flow, Generation, Integration, Management, and Utility.

Each node section documents what it does, its inputs and outputs, and practical tips for using it in pipelines.

Retrieval

query

Performs a full RAG query: embeds your question with Voyage AI, runs vector search against MongoDB Atlas, and reranks the results for maximum relevance.

How it works: Takes your natural language query, converts it to a vector embedding using a Voyage AI model, then performs an approximate nearest neighbor search against the specified MongoDB Atlas collection. The initial candidates are reranked using a neural reranker to surface the most relevant documents.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | The natural language question or search text. |
| collection | string | No | MongoDB collection name. Falls back to project default if omitted. |
| db | string | No | MongoDB database name. Falls back to project default if omitted. |
| limit | number | No | Maximum number of results to return (default: 5). |
| filter | object | No | MongoDB pre-filter applied during vector search to narrow candidates. |

Output

| Field | Type | Description |
| --- | --- | --- |
| results | array | Array of matched documents, each with text, source, relevanceScore, and metadata. |
| query | string | The original query string. |
| model | string | The embedding model used. |

Example

{
  "id": "find_docs",
  "tool": "query",
  "name": "Find relevant docs",
  "inputs": {
    "query": "{{ inputs.question }}",
    "collection": "knowledge",
    "limit": 5,
    "filter": { "metadata.type": "api-doc" }
  }
}
tip

Use the filter parameter to narrow results by metadata fields before vector search runs, improving both relevance and speed. Pair with a generate node to build a complete RAG pipeline.
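
As a sketch of that pairing, a two-step RAG workflow might look like the following (the steps array follows the schema implied elsewhere in this reference; any wrapper fields beyond it may differ in your project):

```json
{
  "steps": [
    {
      "id": "find_docs",
      "tool": "query",
      "name": "Find relevant docs",
      "inputs": {
        "query": "{{ inputs.question }}",
        "limit": 5
      }
    },
    {
      "id": "answer",
      "tool": "generate",
      "name": "Generate answer",
      "inputs": {
        "prompt": "Answer the user's question using only the context provided.",
        "context": "{{ find_docs.output.results }}"
      }
    }
  ]
}
```

The generate step references the retrieval step's output by ID, so it runs only after find_docs completes.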

search

Raw vector similarity search without reranking. Faster than the query node, but results are ordered by vector similarity alone.

How it works: Embeds your query text using a Voyage AI model, then performs an approximate nearest neighbor search against the specified MongoDB Atlas vector index. Returns results ordered by cosine similarity score, without applying a neural reranker.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | The search query text. |
| collection | string | No | MongoDB collection name. |
| db | string | No | MongoDB database name. |
| limit | number | No | Maximum results to return (default: 10). |
| filter | object | No | MongoDB pre-filter for vector search. |

Output

| Field | Type | Description |
| --- | --- | --- |
| results | array | Array of matched documents with text, source, and similarity score. |
| query | string | The original query string. |

Example

{
  "id": "vec_search",
  "tool": "search",
  "name": "Vector search",
  "inputs": {
    "query": "{{ inputs.question }}",
    "limit": 20
  }
}
tip

Use search instead of query when speed matters more than precision, or when you plan to rerank separately with a downstream rerank node.

rerank

Reorders a list of documents by relevance to a query using a Voyage AI neural reranker.

How it works: Takes a query and an array of document texts, then uses a Voyage AI reranking model to score each document against the query. Returns the documents sorted by relevance score, with the most relevant first.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | The query to rank documents against. |
| documents | array | Yes | Array of document text strings to rerank. |
| model | string | No | Reranking model (default: rerank-2.5). Use rerank-2.5-lite for faster results. |

Output

| Field | Type | Description |
| --- | --- | --- |
| results | array | Reranked documents with index and relevance_score fields. |
| model | string | The reranking model used. |

Example

{
  "id": "rerank_results",
  "tool": "rerank",
  "name": "Rerank search results",
  "inputs": {
    "query": "{{ inputs.question }}",
    "documents": "{{ vec_search.output.results }}"
  }
}
tip

Feed the output of a search node into rerank for a two-stage retrieval pipeline. Reranking works best with 10 to 50 candidate documents.
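
A sketch of that two-stage pipeline, with a transform step in between to turn result objects into the plain strings rerank expects (step IDs are illustrative):

```json
{
  "steps": [
    {
      "id": "vec_search",
      "tool": "search",
      "inputs": { "query": "{{ inputs.question }}", "limit": 20 }
    },
    {
      "id": "to_text",
      "tool": "transform",
      "inputs": {
        "input": "{{ vec_search.output.results }}",
        "expression": "item.text"
      }
    },
    {
      "id": "rerank_results",
      "tool": "rerank",
      "inputs": {
        "query": "{{ inputs.question }}",
        "documents": "{{ to_text.output.items }}"
      }
    }
  ]
}
```

Fetching a wide candidate set (limit: 20) and letting the reranker pick the best few usually beats a narrow vector search on its own.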

ingest

Chunks text, embeds each chunk with Voyage AI, and stores the vectors in MongoDB Atlas.

How it works: Takes raw text content and a source identifier, splits the text into chunks using the specified strategy, generates vector embeddings for each chunk via the Voyage AI API, and inserts the embedded chunks into the target MongoDB Atlas collection.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | The text content to ingest. |
| collection | string | No | Target MongoDB collection. |
| db | string | No | Target MongoDB database. |
| source | string | No | Source identifier attached to each chunk for citation tracking. |
| chunkSize | number | No | Target chunk size in characters (default: 512). |
| chunkStrategy | string | No | Chunking strategy: fixed, sentence, paragraph, recursive, or markdown. |

Output

| Field | Type | Description |
| --- | --- | --- |
| chunksCreated | number | Number of chunks created and stored. |
| source | string | The source identifier used. |

Example

{
  "id": "store_doc",
  "tool": "ingest",
  "name": "Ingest document",
  "inputs": {
    "text": "{{ inputs.document }}",
    "source": "{{ inputs.filename }}",
    "chunkStrategy": "markdown",
    "chunkSize": 512
  }
}
tip

Use the markdown strategy for structured documents with headings to preserve section boundaries. If you need to inspect or filter chunks before embedding, use the chunk node first.

Embedding

embed

Generates a vector embedding for a piece of text using a Voyage AI embedding model.

How it works: Sends the input text to the Voyage AI embeddings API, which returns a high-dimensional vector representation. The vector captures the semantic meaning of the text and can be used for similarity comparisons, clustering, or storage.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | The text to embed. |
| model | string | No | Voyage AI embedding model (default: voyage-3-large). |
| inputType | string | No | Whether this text is a document or a query. Affects embedding optimization. |

Output

| Field | Type | Description |
| --- | --- | --- |
| embedding | array | The vector embedding as an array of floating-point numbers. |
| model | string | The model used for embedding. |
| dimensions | number | Number of dimensions in the embedding vector. |

Example

{
  "id": "get_vector",
  "tool": "embed",
  "name": "Embed the query",
  "inputs": {
    "text": "{{ inputs.question }}",
    "inputType": "query"
  }
}
tip

Set inputType to query for search queries and document for content being indexed. Embeddings from different models are not comparable: always use the same model for queries and documents.

similarity

Compares two texts semantically by embedding both and computing cosine similarity.

How it works: Embeds both input texts using the same Voyage AI model, then computes the cosine similarity between the two vectors. Returns a score from -1 (opposite meaning) to 1 (identical meaning).

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text1 | string | Yes | The first text to compare. |
| text2 | string | Yes | The second text to compare. |
| model | string | No | Voyage AI embedding model to use for both texts. |

Output

| Field | Type | Description |
| --- | --- | --- |
| similarity | number | Cosine similarity score between -1 and 1. |
| model | string | The embedding model used. |

Example

{
  "id": "check_dup",
  "tool": "similarity",
  "name": "Check for duplicate",
  "inputs": {
    "text1": "{{ inputs.new_doc }}",
    "text2": "{{ existing.output.text }}"
  }
}
tip

Scores above 0.8 generally indicate high semantic similarity. Combine with a conditional node to branch based on similarity thresholds for deduplication workflows.
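
For example, a deduplication gate might branch on that threshold (the step IDs store_doc and skip_notice are hypothetical placeholders for steps defined elsewhere in the workflow):

```json
{
  "steps": [
    {
      "id": "check_dup",
      "tool": "similarity",
      "inputs": {
        "text1": "{{ inputs.new_doc }}",
        "text2": "{{ existing.output.text }}"
      }
    },
    {
      "id": "dedup_gate",
      "tool": "conditional",
      "inputs": {
        "condition": "{{ check_dup.output.similarity < 0.8 }}",
        "then": ["store_doc"],
        "else": ["skip_notice"]
      }
    }
  ]
}
```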

Processing

chunk

Splits text into smaller chunks using configurable strategies, without embedding. Useful for inspecting or filtering chunks before storage.

How it works: Takes raw text and splits it into chunks using one of five strategies: fixed (character count), sentence, paragraph, recursive (smart splitting), or markdown (heading-aware). Returns the chunks with metadata but does not embed or store them.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | The text content to split into chunks. |
| strategy | string | No | Chunking strategy: fixed, sentence, paragraph, recursive (default), or markdown. |
| size | number | No | Target chunk size in characters (default: 512). |
| overlap | number | No | Overlap between adjacent chunks in characters (default: 50). |
| source | string | No | Source identifier attached to each chunk for tracking. |

Output

| Field | Type | Description |
| --- | --- | --- |
| chunks | array | Array of chunk objects, each with index, content, charCount, and metadata. |
| totalChunks | number | Total number of chunks produced. |
| strategy | string | The chunking strategy used. |
| avgChunkSize | number | Average character count per chunk. |

Example

{
  "id": "split",
  "tool": "chunk",
  "name": "Chunk the document",
  "inputs": {
    "text": "{{ inputs.document_text }}",
    "strategy": "markdown",
    "size": 512,
    "overlap": 50,
    "source": "architecture-overview.md"
  }
}
tip

Separating chunking from embedding (vs. using ingest) gives you more control. Combine with a filter node to remove boilerplate or short chunks before embedding.
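
A minimal sketch of that chunk-then-filter pattern, using the charCount field each chunk object carries (step IDs are illustrative):

```json
{
  "steps": [
    {
      "id": "split",
      "tool": "chunk",
      "inputs": {
        "text": "{{ inputs.document_text }}",
        "strategy": "markdown"
      }
    },
    {
      "id": "drop_short",
      "tool": "filter",
      "inputs": {
        "input": "{{ split.output.chunks }}",
        "condition": "item.charCount > 100"
      }
    }
  ]
}
```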

aggregate

Runs a MongoDB aggregation pipeline for analytics, grouping, counting, and structured data queries.

How it works: Executes a MongoDB aggregation pipeline against the specified collection. Supports all standard aggregation stages ($match, $group, $sort, $project, $limit, etc.) for flexible data analysis beyond vector search.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| pipeline | array | Yes | MongoDB aggregation pipeline stages as a JSON array. |
| collection | string | No | MongoDB collection to aggregate. |
| db | string | No | MongoDB database name. |

Output

| Field | Type | Description |
| --- | --- | --- |
| results | array | Array of aggregation result documents. |
| count | number | Number of result documents. |
| durationMs | number | Execution time in milliseconds. |

Example

{
  "id": "stats",
  "tool": "aggregate",
  "name": "Count docs by source",
  "inputs": {
    "pipeline": [
      { "$group": { "_id": "$metadata.source", "count": { "$sum": 1 } } },
      { "$sort": { "count": -1 } },
      { "$limit": 10 }
    ]
  }
}
tip

Use for analytics that vector search cannot express: document counts by source, date-range filtering, metadata grouping. The pipeline is read-only by default.

Control Flow

merge

Combines outputs from multiple workflow steps into a single array.

How it works: Takes references to outputs from previous steps and merges them into one consolidated array. Supports concatenation (append all), interleaving (alternate items), and unique (deduplicate) strategies.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| sources | array | Yes | Array of step output references to merge (e.g., ["step1.output", "step2.output"]). |
| strategy | string | No | Merge strategy: concat (default), interleave, or unique. |

Output

| Field | Type | Description |
| --- | --- | --- |
| merged | array | The combined array of items from all sources. |
| count | number | Total number of items in the merged result. |

Example

{
  "id": "combine",
  "tool": "merge",
  "name": "Combine results",
  "inputs": {
    "sources": ["{{ search_a.output.results }}", "{{ search_b.output.results }}"],
    "strategy": "unique"
  }
}
tip

Use the unique strategy to deduplicate results from multiple search queries. The interleave strategy alternates items from each source, useful for balanced sampling.

filter

Filters an array of items based on a condition expression, keeping only items that match.

How it works: Iterates over an input array and evaluates the condition expression for each item. Items where the condition evaluates to true are kept; others are discarded. The expression has access to each item via the item variable.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| input | string | Yes | Reference to an array from a previous step. |
| condition | string | Yes | Expression evaluated per item. Use item to reference the current element (e.g., item.score > 0.5). |

Output

| Field | Type | Description |
| --- | --- | --- |
| items | array | Array of items that passed the filter condition. |
| count | number | Number of items that passed. |
| removed | number | Number of items that were filtered out. |

Example

{
  "id": "quality_filter",
  "tool": "filter",
  "name": "Keep high-relevance results",
  "inputs": {
    "input": "{{ search.output.results }}",
    "condition": "item.score > 0.7"
  }
}
tip

Combine numeric and string conditions: item.metadata.type === 'api-doc' && item.score > 0.5. Use after a chunk node to remove boilerplate: item.charCount > 100.

transform

Maps each item in an array through a transformation expression, producing a new array.

How it works: Iterates over an input array and evaluates the expression for each item, collecting the results into a new array. The expression has access to each item via the item variable and can extract fields, compute values, or reshape data.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| input | string | Yes | Reference to an array from a previous step. |
| expression | string | Yes | Expression evaluated per item (e.g., item.text or { title: item.metadata.title, score: item.score }). |

Output

| Field | Type | Description |
| --- | --- | --- |
| items | array | Array of transformed items. |
| count | number | Number of items in the result. |

Example

{
  "id": "extract_text",
  "tool": "transform",
  "name": "Extract text fields",
  "inputs": {
    "input": "{{ search.output.results }}",
    "expression": "item.text"
  }
}
tip

Use before a rerank node to prepare document strings from complex objects. Reshape objects to keep only relevant fields: { text: item.text, source: item.source }.

conditional

Branches workflow execution based on a condition. Routes to different paths depending on whether the condition is true or false.

How it works: Evaluates a condition expression against the workflow context. If true, enables the steps listed in the then branch. If false, enables the steps in the else branch (if provided). Steps in the non-taken branch are skipped. Renders as a diamond shape on the canvas to indicate a decision point.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| condition | string | Yes | Template expression that resolves to a boolean. |
| then | array | Yes | Array of step IDs to enable when condition is true. |
| else | array | No | Array of step IDs to enable when condition is false. |

Output

| Field | Type | Description |
| --- | --- | --- |
| conditionResult | boolean | The evaluated condition result. |
| branchTaken | string | then or else, indicating which branch was activated. |
| enabledSteps | array | List of step IDs that were enabled. |

Example

{
  "id": "check_results",
  "tool": "conditional",
  "name": "Any results found?",
  "inputs": {
    "condition": "{{ primary_search.output.results.length > 0 }}",
    "then": ["format_results"],
    "else": ["fallback_search", "format_fallback"]
  }
}
tip

Steps referenced in then and else must exist in the workflow's steps array. The conditional does not define steps inline; it references existing steps by ID. Use to implement fallback patterns or branch on similarity thresholds.
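
A sketch of a fallback pattern with the branch targets defined as ordinary steps in the same steps array (the step definitions here are illustrative):

```json
{
  "steps": [
    {
      "id": "primary_search",
      "tool": "search",
      "inputs": {
        "query": "{{ inputs.question }}",
        "filter": { "metadata.type": "api-doc" }
      }
    },
    {
      "id": "check_results",
      "tool": "conditional",
      "inputs": {
        "condition": "{{ primary_search.output.results.length > 0 }}",
        "then": ["format_results"],
        "else": ["fallback_search"]
      }
    },
    {
      "id": "fallback_search",
      "tool": "search",
      "inputs": { "query": "{{ inputs.question }}" }
    },
    {
      "id": "format_results",
      "tool": "transform",
      "inputs": {
        "input": "{{ primary_search.output.results }}",
        "expression": "item.text"
      }
    }
  ]
}
```

Here the fallback drops the metadata filter to widen the search when the filtered query comes back empty.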

loop

Iterates over an array, executing a sub-step for each item. Collects all results into an output array.

How it works: Resolves the items expression to an array, then sequentially executes the inline sub-step for each element. Each iteration has access to the current item via the variable name specified in as. Results accumulate into an output array. A safety limit prevents runaway loops.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| items | string | Yes | Template reference resolving to an array. |
| as | string | Yes | Variable name for the current item, accessible in the sub-step. |
| step | object | Yes | Inline step definition executed per item (same schema as a regular step, minus the id). |
| maxIterations | number | No | Safety limit to prevent runaway loops (default: 100). |

Output

| Field | Type | Description |
| --- | --- | --- |
| iterations | number | Number of iterations completed. |
| results | array | Array of sub-step outputs, one per iteration. |
| errors | array | Errors from failed iterations (if continueOnError is true). |

Example

{
  "id": "process_each",
  "tool": "loop",
  "name": "Process each result",
  "inputs": {
    "items": "{{ search.output.results }}",
    "as": "doc",
    "step": {
      "tool": "similarity",
      "inputs": {
        "text1": "{{ doc.content }}",
        "text2": "{{ inputs.reference_text }}"
      }
    },
    "maxIterations": 50
  }
}
tip

Iterations run sequentially, not in parallel, to avoid API rate limits. Set maxIterations to a reasonable limit for your use case to prevent unexpected costs.

template

Composes a text string from multiple step outputs using template interpolation.

How it works: Resolves all {{ }} template references in the text against the workflow context (previous step outputs, workflow inputs). Produces a single composed text output. Useful for assembling complex prompts before a generate step.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | Template string with {{ }} references to step outputs and workflow inputs. |

Output

| Field | Type | Description |
| --- | --- | --- |
| text | string | The resolved text with all template references replaced. |
| charCount | number | Character count of the resolved text. |
| referencedSteps | array | List of step IDs referenced in the template. |

Example

{
  "id": "build_prompt",
  "tool": "template",
  "name": "Compose LLM context",
  "inputs": {
    "text": "## Search Results\n\n{{ search.output.results }}\n\n## Document Stats\n\nTotal documents: {{ stats.output.count }}\n\n## User Question\n\n{{ inputs.query }}"
  }
}
tip

Use before a generate node to assemble context from multiple sources into a single prompt. Template references use the syntax {{ stepId.output.field }}.

Generation

generate

Generates text using an LLM (Large Language Model), optionally with retrieved context for grounded responses.

How it works: Sends a prompt to the configured LLM provider (OpenAI, Anthropic, or Ollama) along with optional context text. The LLM generates a response based on the prompt and context. This is the generation step in a RAG pipeline.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | Yes | The instruction or question for the LLM. |
| context | string | No | Additional context text injected into the LLM prompt (e.g., from a search or template step). |

Output

| Field | Type | Description |
| --- | --- | --- |
| text | string | The generated response text. |
| model | string | The LLM model used. |
| provider | string | The LLM provider (openai, anthropic, ollama). |

Example

{
  "id": "answer",
  "tool": "generate",
  "name": "Generate answer",
  "inputs": {
    "prompt": "Answer the user's question based on the context provided.",
    "context": "{{ build_prompt.output.text }}"
  }
}
tip

Pair with a query or search node to build a complete RAG pipeline: retrieve context, then generate an answer. The LLM provider and model are configured in your project settings, not per-node.

Integration

http

Makes an outbound HTTP request to an external API. The extensibility node for integrating with any HTTP-accessible service.

How it works: Sends an HTTP request to the specified URL with configurable method, headers, body, and timeout. Returns the response status, headers, and body. Supports JSON and text response types. Does not follow redirects by default for security.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | The request URL. Supports template resolution for dynamic URLs. |
| method | string | No | HTTP method: GET (default), POST, PUT, PATCH, or DELETE. |
| headers | object | No | Request headers as key-value pairs. |
| body | object | No | Request body. Objects are JSON-serialized automatically. |
| timeout | number | No | Request timeout in milliseconds (default: 30000). |

Output

| Field | Type | Description |
| --- | --- | --- |
| status | number | HTTP response status code (e.g., 200, 404). |
| statusText | string | HTTP status text (e.g., "OK", "Not Found"). |
| headers | object | Response headers as key-value pairs. |
| body | object | Parsed response body (JSON or text depending on responseType). |
| durationMs | number | Request duration in milliseconds. |

Example

{
  "id": "notify",
  "tool": "http",
  "name": "Send Slack notification",
  "inputs": {
    "url": "https://hooks.slack.com/services/T00/B00/xxxxx",
    "method": "POST",
    "headers": { "Content-Type": "application/json" },
    "body": {
      "text": "Ingested {{ ingest_step.output.chunksCreated }} chunks"
    },
    "timeout": 10000
  }
}
tip

Set continueOnError: true if the HTTP call is optional and should not block the workflow. Response size is limited to 5MB.
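
For instance, an optional notification step might look like the following. The exact placement of continueOnError, shown here as a top-level step field, is an assumption; this reference names the flag but does not document the step-level schema:

```json
{
  "id": "notify",
  "tool": "http",
  "continueOnError": true,
  "inputs": {
    "url": "https://hooks.slack.com/services/T00/B00/xxxxx",
    "method": "POST",
    "body": { "text": "Workflow finished" }
  }
}
```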

Management

collections

Lists MongoDB collections in a database, showing document counts and vector index information.

How it works: Connects to the specified MongoDB database and enumerates all collections, including their document counts and any vector search indexes configured.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| db | string | No | MongoDB database name. Uses project default if omitted. |

Output

| Field | Type | Description |
| --- | --- | --- |
| collections | array | Array of collection objects with name, documentCount, and indexes. |

Example

{
  "id": "list_collections",
  "tool": "collections",
  "name": "List available collections",
  "inputs": { "db": "myapp" }
}
tip

Use at the start of a workflow to discover what data is available before running queries.

models

Lists available Voyage AI models with their capabilities, benchmarks, and pricing.

How it works: Retrieves the catalog of Voyage AI models, filtered by category. Returns model details including supported dimensions, context length, and per-token pricing.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| category | string | No | Filter by category: embedding, rerank, or all (default). |

Output

| Field | Type | Description |
| --- | --- | --- |
| models | array | Array of model objects with name, type, dimensions, maxTokens, and pricing. |

Example

{
  "id": "available_models",
  "tool": "models",
  "name": "List reranking models",
  "inputs": { "category": "rerank" }
}
tip

Use to programmatically select the best model based on your requirements.

Utility

estimate

Estimates costs for Voyage AI embedding and query operations at various scales.

How it works: Calculates projected costs based on the number of documents to embed, queries per month, and time horizon. Uses current Voyage AI pricing to provide detailed cost breakdowns.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| docs | number | Yes | Number of documents to embed. |
| queries | number | No | Number of queries per month (default: 0). |
| months | number | No | Time horizon in months (default: 12). |

Output

| Field | Type | Description |
| --- | --- | --- |
| embedding | object | Embedding cost breakdown with per-document and total costs. |
| querying | object | Query cost breakdown with per-query and monthly costs. |
| total | object | Total projected cost over the time horizon. |

Example

{
  "id": "cost_check",
  "tool": "estimate",
  "name": "Estimate ingestion cost",
  "inputs": {
    "docs": 10000,
    "queries": 500,
    "months": 6
  }
}
tip

Use before large ingestion jobs to understand the cost impact. Factor in both embedding (one-time) and querying (ongoing) costs.

explain

Provides a detailed explanation of a Voyage AI or vector search concept.

How it works: Looks up the specified topic in the built-in knowledge base and returns a structured explanation with key points, examples, and related resources.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| topic | string | Yes | The concept or topic to explain (e.g., embeddings, reranking, cosine-similarity). |

Output

| Field | Type | Description |
| --- | --- | --- |
| title | string | The topic title. |
| content | string | Detailed explanation text. |
| keyPoints | array | Key takeaways as bullet points. |

Example

{
  "id": "learn",
  "tool": "explain",
  "name": "Explain reranking",
  "inputs": { "topic": "reranking" }
}
tip

Use the topics node first to discover available topics, then explain to get the full details.

topics

Lists available educational topics that can be explored with the explain node.

How it works: Returns the catalog of available topics with summaries. Optionally filters by a search term to find relevant topics.

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| search | string | No | Optional search term to filter topics by name or description. |

Output

| Field | Type | Description |
| --- | --- | --- |
| topics | array | Array of topic objects with id, title, and summary. |

Example

{
  "id": "find_topics",
  "tool": "topics",
  "name": "List embedding topics",
  "inputs": { "search": "embedding" }
}
tip

Omit the search parameter to list all available topics. Combine with a loop node to generate explanations for multiple topics in sequence.
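
For example, a topics-then-loop workflow might generate an explanation for every matching topic, passing each topic object's id field to explain (step IDs are illustrative):

```json
{
  "steps": [
    {
      "id": "find_topics",
      "tool": "topics",
      "inputs": { "search": "embedding" }
    },
    {
      "id": "explain_each",
      "tool": "loop",
      "inputs": {
        "items": "{{ find_topics.output.topics }}",
        "as": "topic",
        "step": {
          "tool": "explain",
          "inputs": { "topic": "{{ topic.id }}" }
        },
        "maxIterations": 10
      }
    }
  ]
}
```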

Quick Reference

| Node | Category | Purpose |
| --- | --- | --- |
| query | Retrieval | Full RAG query with embedding + search + rerank |
| search | Retrieval | Raw vector similarity search |
| rerank | Retrieval | Neural reranking of document candidates |
| ingest | Retrieval | Chunk, embed, and store text |
| embed | Embedding | Generate vector embedding |
| similarity | Embedding | Compare two texts semantically |
| chunk | Processing | Split text into chunks without embedding |
| aggregate | Processing | MongoDB aggregation pipeline |
| merge | Control Flow | Combine outputs from multiple steps |
| filter | Control Flow | Keep items matching a condition |
| transform | Control Flow | Map items through an expression |
| conditional | Control Flow | Branch based on a condition |
| loop | Control Flow | Iterate over an array |
| template | Control Flow | Compose text from multiple sources |
| generate | Generation | LLM text generation |
| http | Integration | External HTTP request |
| collections | Management | List MongoDB collections |
| models | Management | List Voyage AI models |
| estimate | Utility | Cost estimation |
| explain | Utility | Topic explanation |
| topics | Utility | Browse available topics |