AI agents
AI agent steps let you integrate AI agents directly into your Windmill flows. They provide an interface to various AI providers and models, so you can process data, generate content, execute actions (Windmill scripts), and make decisions as part of your automated workflows.
Configuration
Provider selection
Choose from supported AI providers including OpenAI, Azure OpenAI, Anthropic, Mistral, DeepSeek, Google AI (Gemini), Groq, OpenRouter, Together AI, or Custom AI endpoints.
Resource configuration
Select or create an AI resource that contains your API credentials and endpoint configuration. Resources allow you to securely store and reuse AI provider credentials across multiple flows.
Model selection
Choose the specific model you want to use from your selected provider. Available models depend on your chosen provider and resource configuration.
Tools
AI Agents can be equipped with tools that extend their capabilities beyond text and image generation. Tools are Windmill scripts that the AI can call to perform specific actions or retrieve information. You can add tools from three sources:
- Inline scripts - Write custom tools directly within the flow
- Workspace scripts - Use existing scripts from your Windmill workspace
- Hub scripts - Leverage pre-built tools from the Windmill Hub
Each tool must have a name that is unique within the AI agent step and contains only letters, numbers, and underscores. The name should describe the tool's function, which helps the AI understand when to use it.
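The naming rule can be checked before tools are added to a step. The following is a minimal sketch, not Windmill's own validation: the regular expression and the `find_invalid_tool_names` helper are assumptions derived from the rule as stated (letters, numbers, underscores, unique within the step).

```python
import re

# Assumed pattern: one or more ASCII letters, digits, or underscores.
TOOL_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def find_invalid_tool_names(names: list[str]) -> list[str]:
    """Return names that break the character rule or repeat an earlier name."""
    seen: set[str] = set()
    invalid: list[str] = []
    for name in names:
        if not TOOL_NAME_PATTERN.fullmatch(name) or name in seen:
            invalid.append(name)
        seen.add(name)
    return invalid


# Hypothetical tool names: one with a hyphen, one duplicate.
problems = find_invalid_tool_names(
    ["get_weather", "send-email", "get_weather", "lookup_order_2"]
)
```

Here `problems` holds the hyphenated name and the second occurrence of the duplicated name.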
When tools are configured, the AI agent can decide when and how to use them based on the user's request. It selects the most appropriate tool by name, and issues a tool call with JSON arguments that conform to the tool’s input schema. Windmill executes the underlying script and returns a JSON result, which is surfaced back to the model as a tool response message and is included in messages.
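As an illustration of the round trip described above, here is a sketch of what a tool script might look like. It assumes the usual Windmill convention of a `main` entrypoint whose typed parameters describe the tool's inputs. The `order_id` parameter, the in-memory `_ORDERS` data, and the simulated call at the bottom are hypothetical and only stand in for what the agent and Windmill do at runtime.

```python
import json

# Hypothetical in-memory data standing in for a real lookup.
_ORDERS = {"A-17": {"status": "shipped", "items": 3}}


def main(order_id: str) -> dict:
    """Look up an order's status (hypothetical tool; data is illustrative)."""
    order = _ORDERS.get(order_id)
    if order is None:
        return {"found": False, "order_id": order_id}
    return {"found": True, "order_id": order_id, **order}


# Simulating one tool call locally: JSON arguments in, JSON result out.
raw_arguments = '{"order_id": "A-17"}'
result_json = json.dumps(main(**json.loads(raw_arguments)))
```

Returning a plain dictionary keeps the result JSON-serializable, so it can be passed back to the model as a tool response.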
Input parameters
Required parameters
user_message (string)
The main input message or prompt that will be sent to the AI model. This can include static text, dynamic content from previous flow steps, or templated strings with variable substitution.
system_prompt (string)
The system prompt that defines the AI's role, behavior, and context. This helps guide the model's responses and ensures consistent behavior across interactions.
Optional parameters
output_type (text | image)
Specifies the type of output the AI should generate:
- text - Generate text responses (default).
- image - Generate image outputs (supported by OpenAI, Google AI (Gemini), and OpenRouter). Requires an S3 object storage to be configured at the workspace level.
streaming (optional)
Whether to stream the progress of the AI agent. The stream contains JSON payloads separated by newlines. Each payload has one of the following types:
{
    "type": "token_delta", // sent every time the AI generates a new token
    "content": "string"
}
{
    "type": "tool_call", // sent when the tool call is started
    "call_id": "string",
    "function_name": "string",
    "function_arguments": "string"
}
{
    "type": "tool_call_arguments", // sent when all arguments have been received
    "call_id": "string",
    "function_name": "string",
    "function_arguments": "string"
}
{
    "type": "tool_execution", // sent when the tool job is started
    "call_id": "string",
    "function_name": "string"
}
{
    "type": "tool_result", // sent when the tool job is completed
    "call_id": "string",
    "function_name": "string",
    "result": "string",
    "success": true // whether the tool job completed successfully
}
The final result of the step will contain an additional field wm_stream with the complete stream.
You can use the SSE stream webhooks to get the stream.
The new_result_stream field of an SSE event can contain multiple payloads at a time, so make sure to split it on line breaks.
Only complete payloads are streamed, so you do not need to handle partial JSON.
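A consumer of the stream can therefore split each chunk on line breaks and decode every line independently. The sketch below shows one way to do that and to rebuild the generated text from token_delta payloads. The helper names and the sample chunk are hypothetical, and fetching the SSE events themselves is left out.

```python
import json


def parse_stream_chunk(chunk: str) -> list[dict]:
    """Split a chunk on line breaks and decode each non-empty line as JSON."""
    return [json.loads(line) for line in chunk.split("\n") if line.strip()]


def collect_text(events: list[dict]) -> str:
    """Concatenate the content of all token_delta payloads, in order."""
    return "".join(e["content"] for e in events if e.get("type") == "token_delta")


# Hypothetical chunk with three payloads delivered together.
sample_chunk = (
    '{"type": "token_delta", "content": "Hel"}\n'
    '{"type": "tool_execution", "call_id": "c1", "function_name": "get_weather"}\n'
    '{"type": "token_delta", "content": "lo"}\n'
)
events = parse_stream_chunk(sample_chunk)
text = collect_text(events)
```

Splitting on `"\n"` rather than `str.splitlines()` avoids breaking a payload at other Unicode line separators that may appear inside JSON strings.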
user_images (optional)
Allows you to pass images as input to the AI model for analysis, processing, or context. The AI can analyze the image content and respond accordingly. Requires an S3 object storage to be configured at the workspace level.
max_completion_tokens (number)
Controls the maximum number of tokens the AI can generate in its response. This helps manage costs and ensures responses stay within desired length limits.
temperature (number)
Controls randomness in text generation:
- 0.0 - Deterministic, focused responses
- 2.0 - Maximum creativity and randomness
- Default values typically range from 0.1 to 1.0
output_schema (json-schema)
Define a JSON schema that the AI agent will follow for its response format. This ensures structured, predictable outputs that can be easily processed by subsequent flow steps.
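For illustration, a schema for a ticket-triage agent might look like the sketch below. The field names, the enum values, and the `missing_required_keys` helper are all hypothetical. The helper is only a lightweight downstream sanity check, not a full JSON Schema validator.

```python
import json

# Hypothetical schema; field names and constraints are illustrative.
output_schema = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["bug", "feature", "question"]},
        "priority": {"type": "integer", "minimum": 1, "maximum": 5},
        "summary": {"type": "string"},
    },
    "required": ["category", "priority", "summary"],
}


def missing_required_keys(response: dict, schema: dict) -> list[str]:
    """Return required keys from the schema that the response lacks."""
    return [key for key in schema.get("required", []) if key not in response]


# JSON text form of the schema.
schema_json = json.dumps(output_schema, indent=2)

# Hypothetical structured output a later step might receive.
sample_output = {"category": "bug", "priority": 2, "summary": "Login fails on Safari"}
missing = missing_required_keys(sample_output, output_schema)
```

Listing every field under `required` makes it easier for later steps to rely on the keys being present.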
Output
The AI Agent step returns an object with up to three keys:
output
Contains the content of the final response from the AI agent:
- Text output:
  - When no output schema is specified: Returns the last message content, which can be a string or an array containing strings.
  - When an output schema is specified: Returns the structured output conforming to the defined JSON schema.
- Image output:
  - Returns the S3 object of the image.
This is typically what you'll use in subsequent flow steps when you need the AI's final answer or result.
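Because text output without a schema can be either a string or an array containing strings, a later step may want to normalize it first. The following is a minimal sketch. The `output_to_text` helper and its `separator` parameter are assumptions, and the right separator depends on how the provider splits its content.

```python
from typing import Union


def output_to_text(output: Union[str, list], separator: str = "") -> str:
    """Return the output as one string, joining the string parts of an array."""
    if isinstance(output, str):
        return output
    # Non-string entries are skipped rather than guessed at.
    return separator.join(part for part in output if isinstance(part, str))


# Hypothetical outputs from two different runs.
single = output_to_text("Paris is the capital of France.")
joined = output_to_text(["Paris is the capital", " of France."])
```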
messages
Present only in text output mode. Contains the complete conversation history, including:
- User input messages
- Assistant intermediate outputs
- Tool calls made by the AI
- Tool execution results
The messages array provides full visibility into the AI's reasoning process and tool usage, which can be useful for debugging, logging, or understanding how the AI reached its conclusion.
wm_stream (optional)
Present only in streaming mode. Contains the complete stream.
Debugging

Flow-level visualization
Tool calls are displayed directly on the flow graph as separate nodes connected to the AI Agent step. You can click on these tool call nodes to view detailed execution information, including inputs, outputs, and execution logs.
Detailed logging
The logging panel of the AI Agent step shows comprehensive logs:
- All input parameters passed to the AI agent
- Tool calls made by the AI, including which tools were selected and their inputs
- Individual tool execution results with full job details
- The final AI response and complete message history
This detailed view allows you to trace through the AI's decision-making process and verify that tools are being called correctly with the expected inputs and outputs.