Supported Models
All models available in Aira — built-in, cloud provider, and self-hosted — with setup instructions and recommendations.
Overview
Aira supports three categories of models for consensus cases:
- Built-in models — managed by Aira, ready to use immediately
- Cloud provider models — accessed through AWS Bedrock, Azure OpenAI, or Google Vertex AI
- Self-hosted models — your own deployments via vLLM, Ollama, or TGI
All models return structured output (decision, confidence, key factors, reasoning) regardless of provider.
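As an illustration, the shared decision schema looks roughly like this (the four field names come from the list above; the values and exact key spellings are invented for the example):

```json
{
  "decision": "approve",
  "confidence": 0.87,
  "key_factors": ["stable income", "low debt-to-income ratio"],
  "reasoning": "Applicant meets all threshold criteria."
}
```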
Built-in Models
These models are available out of the box. Use BYOK to bring your own API keys, or let Aira manage keys for you.
OpenAI
| Model ID | Display Name | Best For |
|---|---|---|
| gpt-5.4 | GPT-5.4 | Flagship model — highest accuracy, native structured outputs |
| gpt-5.2 | GPT-5.2 | Proven stable — battle-tested in production workloads |
| gpt-5-mini | GPT-5 Mini | Cost-optimized — fast responses at lower cost |
| o3 | o3 Reasoning | Reasoning-focused — excels at complex, multi-step decisions |
Anthropic
| Model ID | Display Name | Best For |
|---|---|---|
| claude-opus-4-6 | Claude Opus 4.6 | Highest capability — 1M token context, deep analysis |
| claude-sonnet-4-6 | Claude Sonnet 4.6 | Best value — 1M token context, strong reasoning at lower cost |
| claude-haiku-4-5 | Claude Haiku 4.5 | Fast and cost-effective — ideal for high-volume cases |
Google
| Model ID | Display Name | Best For |
|---|---|---|
| gemini-3.1-pro | Gemini 3.1 Pro | Latest Pro model — strong analytical capabilities |
| gemini-3.1-flash-lite | Gemini 3.1 Flash Lite | Fast and cost-effective — good for latency-sensitive cases |
Cloud Provider Models
Access models through your existing cloud provider accounts. This is useful for compliance requirements (data residency), cost management (existing enterprise agreements), or accessing models not available as built-in.
AWS Bedrock
Configure Bedrock access to use Claude, Llama, and Mistral models through your AWS account.
```json
{
  "provider": "bedrock",
  "config": {
    "aws_access_key_id": "AKIA...",
    "aws_secret_access_key": "...",
    "aws_region": "us-east-1"
  }
}
```

Available models through Bedrock include Claude (Opus, Sonnet, Haiku), Llama 3.3/4, and Mistral Large.
Azure OpenAI
Use GPT-5.x models through your Azure OpenAI deployment.
```json
{
  "provider": "azure",
  "config": {
    "azure_endpoint": "https://your-resource.openai.azure.com",
    "azure_deployment": "gpt-5-4",
    "api_version": "2026-01-01-preview",
    "api_key": "..."
  }
}
```

Google Vertex AI
Access Claude and Gemini models through Google Cloud.
```json
{
  "provider": "vertex",
  "config": {
    "project_id": "your-gcp-project",
    "region": "us-central1"
  }
}
```

Cloud provider configuration is set per API key using the BYOK endpoint. Each API key can have a different provider configuration.
Self-Hosted Models
Register self-hosted models served via vLLM, Ollama, or TGI using the Custom Models API. Self-hosted models have dedicated support with per-model system prompts and constrained decoding for guaranteed schema compliance.
Supported Self-Hosted Models
| Model | Parameters | License | Notes |
|---|---|---|---|
| Llama 3.3 70B | 70B | Llama Community License | Strong general-purpose reasoning |
| Llama 4 Scout | 17B active (MoE) | Llama Community License | Efficient MoE architecture |
| Llama 4 Maverick | 17B active (MoE) | Llama Community License | Higher quality MoE variant |
| Mistral Large 3 | 41B active (MoE) | Apache 2.0 | Strong multilingual support |
| Qwen 3.5 | 397B (MoE) | Apache 2.0 | Largest open-weight MoE |
| DeepSeek V3.2 | 37B active (MoE) | MIT | Cost-efficient reasoning |
vLLM Setup
vLLM is the recommended serving engine for self-hosted models. It supports guided_json for constrained decoding, which guarantees valid structured output.
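As a sketch, a request body using vLLM's guided_json parameter might look like the following. The schema fields mirror the decision output named in the Overview; the exact key names and the prompt are assumptions for illustration, not taken from this document:

```python
import json

# JSON Schema describing the decision object we want the model to emit.
# Field names here are illustrative; align them with your actual schema.
decision_schema = {
    "type": "object",
    "properties": {
        "decision": {"type": "string"},
        "confidence": {"type": "number"},
        "key_factors": {"type": "array", "items": {"type": "string"}},
        "reasoning": {"type": "string"},
    },
    "required": ["decision", "confidence", "key_factors", "reasoning"],
}

# Body for vLLM's OpenAI-compatible /v1/chat/completions endpoint.
# guided_json constrains decoding so the output always matches the schema.
request_body = {
    "model": "llama-4-scout",
    "messages": [{"role": "user", "content": "Should this claim be approved?"}],
    "guided_json": decision_schema,
}

# Serialize as you would for an HTTP POST.
payload = json.dumps(request_body)
```

Because decoding is constrained, the response is guaranteed to parse against the schema, with no validation retries needed.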
```bash
# Start vLLM with guided decoding support
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 4 \
  --enable-auto-tool-choice
```

Then register the model in Aira:
```bash
curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 4 Scout (self-hosted)",
    "model_id": "llama-4-scout",
    "endpoint_url": "https://your-gpu-server:8000/v1/chat/completions",
    "timeout_ms": 60000
  }'
```

Ollama Setup
For smaller deployments or development environments:
```bash
# Pull and run a model
ollama pull llama3.3:70b
ollama serve
```

Register with the Ollama-compatible endpoint:
```bash
curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 3.3 70B (Ollama)",
    "model_id": "llama-3.3-70b-ollama",
    "endpoint_url": "http://your-server:11434/v1/chat/completions",
    "timeout_ms": 60000
  }'
```

Always test your model endpoint after registering it. Models that cannot return valid structured output will produce model_error results in cases.
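One way to smoke-test a registered endpoint is to send a single chat completion and check that the reply carries the fields Aira expects. This is a sketch under assumptions (an OpenAI-compatible endpoint, and the four fields named in the Overview; the model ID and prompt are placeholders):

```python
import json
import urllib.request

REQUIRED_FIELDS = {"decision", "confidence", "key_factors", "reasoning"}

def validate_decision(payload: dict) -> bool:
    """Check that a model response carries every field Aira's schema expects."""
    return REQUIRED_FIELDS.issubset(payload)

def smoke_test(endpoint_url: str) -> bool:
    """POST a trivial case to an OpenAI-compatible endpoint and validate the reply."""
    body = json.dumps({
        "model": "llama-3.3-70b-ollama",
        "messages": [{"role": "user", "content": "Reply with a JSON decision object."}],
    }).encode()
    req = urllib.request.Request(
        endpoint_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        reply = json.load(resp)
    # OpenAI-compatible servers return the text under choices[0].message.content.
    content = reply["choices"][0]["message"]["content"]
    return validate_decision(json.loads(content))
```

If `smoke_test` returns False or raises on parsing, expect the same model to produce model_error results in real cases.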
Structured Output by Provider
Each provider handles structured output differently. Aira abstracts this so all models return the same decision schema.
| Provider | Mechanism | Reliability |
|---|---|---|
| OpenAI | Native structured outputs (response_format) | Guaranteed valid JSON |
| Anthropic | Tool-use with forced tool call | Guaranteed valid JSON |
| Google | response_mime_type + schema | Guaranteed valid JSON |
| Self-hosted (vLLM) | Constrained decoding (guided_json) | Guaranteed valid JSON |
| Self-hosted (Ollama) | JSON mode + prompt engineering | Best-effort (validated on receipt) |
| Custom endpoints | Prompt engineering + response validation | Best-effort (validated on receipt) |
For best-effort providers, Aira validates the response and retries once if the output is malformed. If the retry also fails, the model returns a model_error result.
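The validate-and-retry-once behavior described above can be sketched as follows. This is a simplification: `call_model` stands in for whatever transport the provider uses, and the error result shape is illustrative:

```python
import json

REQUIRED_FIELDS = {"decision", "confidence", "key_factors", "reasoning"}

def parse_decision(raw: str):
    """Return the decision dict if raw is valid JSON with all required fields, else None."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(payload, dict) and REQUIRED_FIELDS.issubset(payload):
        return payload
    return None

def call_with_retry(call_model):
    """Call a best-effort model, retrying once on malformed output.

    Returns the parsed decision, or a model_error result if both attempts fail.
    """
    for _ in range(2):  # initial attempt + one retry
        decision = parse_decision(call_model())
        if decision is not None:
            return decision
    return {"status": "model_error"}
```

Guaranteed-JSON providers skip this path entirely, since their output is schema-constrained at decode time.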
Model Selection Recommendations
High-Stakes Decisions (Finance, Healthcare, Legal)
Use 3-5 models from different providers for maximum independence:
```json
{
  "models": ["gpt-5.4", "claude-opus-4-6", "gemini-3.1-pro"]
}
```

Consider adding o3 when the decision involves complex multi-step reasoning.
Cost-Optimized Cases
Use faster, cheaper models for high-volume, lower-risk decisions:
```json
{
  "models": ["gpt-5-mini", "claude-haiku-4-5", "gemini-3.1-flash-lite"]
}
```

Maximum Coverage
Mix commercial and self-hosted models for provider diversity:
```json
{
  "models": ["gpt-5.4", "claude-sonnet-4-6", "custom:llama-4-scout"]
}
```

Reasoning-Heavy Queries
Include reasoning-focused models for complex analytical decisions:
```json
{
  "models": ["o3", "claude-opus-4-6", "gpt-5.4"]
}
```

Cases require a minimum of 2 models. For production use, 3 models from at least 2 different providers are recommended to ensure meaningful consensus.
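A small pre-flight check for these constraints might look like this sketch. The prefix-to-provider mapping is an assumption for illustration, inferred from the model IDs listed in this document:

```python
def provider_of(model_id: str) -> str:
    """Guess the provider from a model ID prefix (illustrative mapping only)."""
    prefixes = {
        "gpt": "openai",
        "o3": "openai",
        "claude": "anthropic",
        "gemini": "google",
        "custom": "self-hosted",
    }
    for prefix, provider in prefixes.items():
        if model_id.startswith(prefix):
            return provider
    return "unknown"

def check_selection(models: list[str]) -> list[str]:
    """Return warnings for selections that fall short of the recommendations."""
    warnings = []
    if len(models) < 2:
        warnings.append("cases require at least 2 models")
    if len({provider_of(m) for m in models}) < 2:
        warnings.append("recommend models from at least 2 providers")
    return warnings
```

For example, a selection of three same-provider models would pass the hard minimum but still draw the provider-diversity warning.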