Supported Models
All models available in Aira — built-in, BYOM with tool calling, cloud provider, and self-hosted — with capabilities matrix and recommendations.
Overview
Aira supports four categories of models:
- Built-in models — managed by Aira, ready to use immediately
- BYOM (Bring Your Own Model) — any OpenAI-compatible endpoint with tool calling support
- Cloud provider models — accessed through AWS Bedrock, Azure OpenAI, or Google Vertex AI
- Self-hosted models — your own deployments via vLLM, Ollama, or TGI
Models with tool-calling support get the same tool-calling experience in Ask Aira chat, and every model returns structured output (decision, confidence, key factors, reasoning) in consensus cases.
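For illustration, a single model's structured response in a consensus case carries the four fields above. The exact values and field spellings here are illustrative, not the precise wire format:

```json
{
  "decision": "approve",
  "confidence": 0.87,
  "key_factors": [
    "Applicant income verified against payroll records",
    "Debt-to-income ratio below policy threshold"
  ],
  "reasoning": "All underwriting criteria are met; no adverse flags were raised by the document checks."
}
```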
Built-in Models
These models are available out of the box. Use Provider Credentials to bring your own API keys, or let Aira manage keys for you.
Anthropic
| Model ID | Display Name | Tool Calling | Best For |
|---|---|---|---|
| claude-opus-4-6 | Claude Opus 4.6 | Native | Highest capability — 1M token context, deep legal analysis |
| claude-sonnet-4-6 | Claude Sonnet 4.6 | Native | Best value — strong reasoning at lower cost |
| claude-haiku-4-5 | Claude Haiku 4.5 | Native | Fast and cost-effective — high-volume cases |
OpenAI
| Model ID | Display Name | Tool Calling | Best For |
|---|---|---|---|
| gpt-5.4 | GPT-5.4 | Native | Flagship — highest accuracy, native structured outputs |
| gpt-5.2 | GPT-5.2 | Native | Battle-tested — proven in production workloads |
| o3 | o3 Reasoning | Native | Complex multi-step reasoning and analysis |
Google
| Model ID | Display Name | Tool Calling | Best For |
|---|---|---|---|
| gemini-3.1-pro | Gemini 3.1 Pro | Native | Strong analytical capabilities, large context |
| gemma-4-31b | Gemma 4 31B | Native | Open-weight (Apache 2.0), frontier intelligence per parameter |
| gemma-4-26b-moe | Gemma 4 26B MoE | Native | Open-weight (Apache 2.0), fast inference (3.8B active params) |
BYOM — Bring Your Own Model
Register any model accessible via an OpenAI-compatible /v1/chat/completions endpoint. This includes hosted API providers, self-hosted models via vLLM/Ollama/TGI, or any custom endpoint.
Aira uses the standard OpenAI function-calling format for tool use. Models that support it get the same full tool-calling experience as built-in models — including multi-step reasoning with all Ask Aira tools.
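Concretely, tool use follows the standard OpenAI chat-completions request shape. A minimal sketch of such a request — the tool name and parameters below are hypothetical, not an actual Ask Aira tool:

```json
{
  "model": "llama-3.3-70b",
  "messages": [
    { "role": "user", "content": "Summarize the open risk flags on this case." }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_case_flags",
        "description": "Fetch risk flags for a case (hypothetical example tool)",
        "parameters": {
          "type": "object",
          "properties": {
            "case_id": { "type": "string" }
          },
          "required": ["case_id"]
        }
      }
    }
  ]
}
```

A model that supports tool calling responds with a `tool_calls` array rather than plain text; the caller executes the call and feeds the result back as a `tool`-role message before the model produces its final answer.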
Verified Models
These open-weight models have been verified to work with Aira's tool calling and structured output:
| Model | Tool Calling | Structured Output | Notes |
|---|---|---|---|
| Gemma 4 31B | Full support | Excellent | Apache 2.0, native function calling, 256K context |
| Gemma 4 26B MoE | Full support | Excellent | Apache 2.0, 3.8B active params, fastest in class |
| Qwen 3.5 | Full support | Excellent | Top-tier agent benchmarks, Apache 2.0 |
| Llama 4 Maverick | Full support | Strong | MoE architecture, strong reasoning |
| Llama 3.3 70B | Full support | Strong | Widely available, very reliable |
| Mistral Large | Full support | Strong | Strong multilingual support |
| DeepSeek V3.2 | Full support | Strong | Cost-efficient, MIT license |
Any OpenAI-compatible endpoint serving these models will work — whether self-hosted or via a hosted provider.
How to Register
```bash
curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 3.3 70B",
    "model_id": "llama-3.3-70b",
    "endpoint_url": "https://your-endpoint/v1/chat/completions",
    "auth_header": "Bearer your-api-key",
    "timeout_ms": 30000
  }'
```

Or register from the dashboard: Models → Register Model → Custom Endpoint.
Models without tool calling support fall back to context-stuffing mode — Aira pre-fetches all available data and includes it in the prompt. This works but produces less precise answers than full tool calling.
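In fallback mode the request carries no `tools` array; the pre-fetched data is embedded directly in the prompt instead. A rough sketch of what such a request might look like (the prompt wording and placeholder data are purely illustrative):

```json
{
  "model": "your-model",
  "messages": [
    {
      "role": "system",
      "content": "You are evaluating a case. All relevant data has been pre-fetched and is included below."
    },
    {
      "role": "user",
      "content": "CASE DATA:\n...pre-fetched records...\n\nQUESTION: Summarize the open risk flags."
    }
  ]
}
```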
Cloud Provider Models
Access models through your existing cloud provider accounts. Useful for data residency compliance, enterprise agreements, or accessing models not available as built-in.
AWS Bedrock
```json
{
  "provider": "bedrock",
  "credentials": {
    "type": "aws",
    "access_key_id": "AKIA...",
    "secret_access_key": "...",
    "region": "us-east-1"
  }
}
```

Available: Claude (Opus, Sonnet, Haiku), Llama 3.3/4, Mistral Large.
Azure OpenAI
```json
{
  "provider": "azure",
  "credentials": {
    "type": "azure",
    "endpoint": "https://your-resource.openai.azure.com",
    "api_key": "...",
    "api_version": "2024-10-21"
  }
}
```

Google Vertex AI
```json
{
  "provider": "vertex",
  "credentials": {
    "type": "vertex",
    "project_id": "your-gcp-project",
    "region": "us-central1"
  }
}
```

Configure cloud provider credentials from Models → Providers in your dashboard, or via the Provider Credentials API.
Self-Hosted Models
Host models on your own infrastructure using vLLM, Ollama, or TGI. Register them as custom models — they get the same tool-calling support as any BYOM model.
Recommended for Self-Hosting
| Model | Parameters | License | vLLM Tool Parser |
|---|---|---|---|
| Gemma 4 31B | 31B (Dense) | Apache 2.0 | hermes |
| Gemma 4 26B MoE | 3.8B active (MoE) | Apache 2.0 | hermes |
| Llama 4 Maverick | 17B active (128E MoE) | Llama Community | llama4_pythonic |
| Llama 4 Scout | 17B active (16E MoE) | Llama Community | llama4_pythonic |
| Llama 3.3 70B | 70B | Llama Community | llama3_json |
| Qwen 3.5 | 397B (MoE) | Apache 2.0 | hermes |
| Mistral Large 3 | 41B active (MoE) | Apache 2.0 | mistral |
| DeepSeek V3.2 | 37B active (MoE) | MIT | deepseek |
vLLM Setup (Recommended)
vLLM supports OpenAI-compatible tool calling with constrained decoding for guaranteed valid JSON:
```bash
vllm serve meta-llama/Llama-4-Maverick-17B-128E-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 4 \
  --enable-auto-tool-choice \
  --tool-call-parser llama4_pythonic
```

Then register:
```bash
curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 4 Maverick (self-hosted)",
    "model_id": "llama-4-maverick",
    "endpoint_url": "https://your-gpu-server:8000/v1/chat/completions",
    "timeout_ms": 60000
  }'
```

Ollama Setup
For development and smaller deployments:
```bash
ollama pull llama3.3:70b
ollama serve
```

Register with the Ollama endpoint:
```bash
curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 3.3 70B (Ollama)",
    "model_id": "llama-3.3-70b",
    "endpoint_url": "http://your-server:11434/v1/chat/completions",
    "timeout_ms": 60000
  }'
```

Always test your model endpoint after registering. Models that cannot return valid structured output will produce model_error results in cases.
Capabilities Matrix
| Capability | Built-in | BYOM (Tool Calling) | BYOM (No Tools) | Cloud Provider |
|---|---|---|---|---|
| Ask Aira — full tool calling | Yes | Yes | Context-stuffing fallback | Yes |
| Consensus cases — structured output | Yes | Yes | Best-effort | Yes |
| Multi-step reasoning | Yes | Model-dependent | No | Yes |
| Guaranteed JSON schema | Yes | Provider-dependent | No | Yes |
| Streaming | Yes | No (planned) | No | Yes |
Structured Output by Provider
Each provider handles structured output differently. Aira abstracts this so all models return the same decision schema.
| Provider | Mechanism | Reliability |
|---|---|---|
| OpenAI | Native structured outputs (response_format) | Guaranteed valid JSON |
| Anthropic | Tool-use with forced tool call | Guaranteed valid JSON |
| Google | response_mime_type + schema | Guaranteed valid JSON |
| Self-hosted (vLLM) | Constrained decoding (guided_json) + tool calling | Guaranteed valid JSON |
| Self-hosted (Ollama) | JSON mode + prompt engineering | Best-effort (validated on receipt) |
| BYOM (with tool calling) | OpenAI function-calling format + response validation | High reliability |
| BYOM (no tool calling) | Prompt engineering + response validation | Best-effort (validated on receipt) |
For best-effort providers, Aira validates the response and retries once if the output is malformed. If the retry also fails, the model returns a model_error result.
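The validate-and-retry behavior can be approximated client-side when calling a best-effort endpoint directly. A minimal sketch, assuming the four-field decision schema described above — the function names and the error shape are illustrative, not Aira's API:

```python
import json

# The four fields of the decision schema described above.
REQUIRED_FIELDS = {"decision", "confidence", "key_factors", "reasoning"}

def parse_decision(raw: str):
    """Return the decision dict if raw is valid JSON with all required fields, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_FIELDS.issubset(data):
        return None
    return data

def call_with_retry(call_model, max_attempts: int = 2):
    """call_model() returns the model's raw text; retry once on malformed output."""
    for _ in range(max_attempts):
        result = parse_decision(call_model())
        if result is not None:
            return result
    # Mirrors Aira's behavior of surfacing a model_error result after a failed retry.
    return {"error": "model_error"}
```

The same pattern (parse, validate required fields, one retry, then fail loudly) is what makes best-effort providers usable in consensus cases.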
Model Selection Recommendations
High-Stakes Decisions (Finance, Healthcare, Legal)
Use 3-5 models from different providers for maximum independence:
```json
{
  "models": ["claude-opus-4-6", "gpt-5.4", "gemini-3.1-pro"]
}
```

Add o3 for complex multi-step reasoning decisions.
Cost-Optimized Cases
Use faster, cheaper models for high-volume, lower-risk decisions:
```json
{
  "models": ["claude-sonnet-4-6", "gpt-5.2", "claude-haiku-4-5"]
}
```

Maximum Provider Diversity
Mix commercial and open-source models:
```json
{
  "models": ["gpt-5.4", "claude-sonnet-4-6", "custom:llama-4-maverick"]
}
```

Open-Source Only (Self-Hosted)
For organizations that require full data sovereignty:
```json
{
  "models": ["custom:gemma-4-31b", "custom:llama-4-maverick", "custom:qwen-3.5"]
}
```

Cases require a minimum of 2 models. For production use, we recommend three models from at least two different providers to ensure meaningful consensus.
EU AI Act mapping
Article-by-article mapping from the EU AI Act (Regulation 2024/1689) to the Aira capability that satisfies each requirement. Includes the specific code, config, and API call for every article.
Provider-Specific Prompting
How Aira optimizes prompts and structured output enforcement for each AI provider.