Supported Models
All models available in Aira — built-in, BYOM with tool calling, cloud provider, and self-hosted — with capabilities matrix and recommendations.
Overview
Aira supports four categories of models:
- Built-in models — managed by Aira, ready to use immediately
- BYOM (Bring Your Own Model) — any OpenAI-compatible endpoint with tool calling support
- Cloud provider models — accessed through AWS Bedrock, Azure OpenAI, or Google Vertex AI
- Self-hosted models — your own deployments via vLLM, Ollama, or TGI
All models get the same tool-calling experience in Ask Aira chat and return structured output (decision, confidence, key factors, reasoning) in consensus cases.
Built-in Models
These models are available out of the box. Use Provider Credentials to bring your own API keys, or let Aira manage keys for you.
Free tier model restrictions: Claude Opus (4.8, 4.7, 4.6), GPT-5.5, and o3 are available on Pro plans and above. Free tier has access to all other models including Sonnet, Haiku, GPT-5.4, Gemini, DeepSeek, Grok, and more. See Billing for details.
Anthropic
| Model ID | Display Name | Best For |
|---|---|---|
claude-fable-5 | Claude Fable 5 | Most capable model ever released — state-of-the-art on nearly all benchmarks, exceptional autonomous agents |
claude-opus-4-8 | Claude Opus 4.8 | Strong flagship — sharp judgement, long autonomous runs |
claude-opus-4-7 | Claude Opus 4.7 | Proven flagship — agentic coding |
claude-opus-4-6 | Claude Opus 4.6 | Previous flagship — proven in production |
claude-sonnet-4-6 | Claude Sonnet 4.6 | Best value — strong reasoning at lower cost |
claude-haiku-4-5 | Claude Haiku 4.5 | Fastest and cheapest — high-volume cases |
OpenAI
| Model ID | Display Name | Best For |
|---|---|---|
gpt-5.5 | GPT-5.5 | Latest flagship — 1M context, agentic multi-step |
gpt-5.4 | GPT-5.4 | Strong all-around — native structured outputs |
gpt-5.2 | GPT-5.2 | Battle-tested — proven in production workloads |
gpt-5-mini | GPT-5 Mini | Cost-effective for simpler tasks |
o3 | OpenAI o3 | Complex multi-step reasoning and analysis |
| Model ID | Display Name | Best For |
|---|---|---|
gemini-3.5-flash | Gemini 3.5 Flash | Fastest frontier model — 4x speed, GA with SLA |
gemini-3.1-pro | Gemini 3.1 Pro | Strong analytical capabilities, large context |
gemini-3.1-flash-lite | Gemini 3.1 Flash Lite | Lightweight, cost-efficient |
gemma-4-31b | Gemma 4 31B | Open-weight (Apache 2.0), frontier per parameter |
gemma-4-26b-moe | Gemma 4 26B MoE | Open-weight (Apache 2.0), fast inference |
DeepSeek
| Model ID | Display Name | Best For |
|---|---|---|
deepseek-v4-pro | DeepSeek V4 Pro | 1.6T MoE, MIT license, lowest cost frontier model |
xAI
| Model ID | Display Name | Best For |
|---|---|---|
grok-4.3 | Grok 4.3 | 1M context, native video input, aggressive pricing |
Mistral
| Model ID | Display Name | Best For |
|---|---|---|
devstral-2 | Devstral 2 | 123B coding specialist, open source (MIT) |
Moonshot (Kimi)
| Model ID | Display Name | Best For |
|---|---|---|
kimi-k2.6 | Kimi K2.6 | 1T MoE, 262K context, open weights, agent swarm |
Alibaba (Qwen)
| Model ID | Display Name | Best For |
|---|---|---|
qwen3.7-max | Qwen 3.7 Max | Agent-first, 1M context, half the cost of Opus |
BYOM — Bring Your Own Model
Register any model accessible via an OpenAI-compatible /v1/chat/completions endpoint. This includes hosted API providers, self-hosted models via vLLM/Ollama/TGI, or any custom endpoint.
Aira uses the standard OpenAI function-calling format for tool use. Models that support it get the same full tool-calling experience as built-in models — including multi-step reasoning with all Ask Aira tools.
Verified Models
These open-weight models have been verified to work with Aira's tool calling and structured output:
| Model | Tool Calling | Structured Output | Notes |
|---|---|---|---|
| Gemma 4 31B | Full support | Excellent | Apache 2.0, native function calling, 256K context |
| Gemma 4 26B MoE | Full support | Excellent | Apache 2.0, 3.8B active params, fastest in class |
| Qwen 3.7 Max | Full support | Excellent | Agent-first, 1M context, OpenAI-compatible |
| Qwen 3.5 | Full support | Excellent | Top-tier agent benchmarks, Apache 2.0 |
| Kimi K2.6 | Full support | Excellent | 1T MoE, 262K context, open weights |
| DeepSeek V4 Pro | Full support | Excellent | 1.6T MoE, MIT license, lowest cost |
| DeepSeek V3.2 | Full support | Strong | Cost-efficient, MIT license |
| Llama 4 Maverick | Full support | Strong | MoE architecture, strong reasoning |
| Llama 3.3 70B | Full support | Strong | Widely available, very reliable |
| Devstral 2 | Full support | Strong | 123B coding specialist, open source |
| Mistral Large | Full support | Strong | Strong multilingual support |
Any OpenAI-compatible endpoint serving these models will work — whether self-hosted or via a hosted provider.
How to Register
curl -X POST https://api.airaproof.com/api/v1/models/custom \
-H "Authorization: Bearer aira_live_xxxxx" \
-H "Content-Type: application/json" \
-d '{
"name": "Llama 3.3 70B",
"model_id": "llama-3.3-70b",
"endpoint_url": "https://your-endpoint/v1/chat/completions",
"auth_header": "Bearer your-api-key",
"timeout_ms": 30000
}'Or register from the dashboard: Models → Register Model → Custom Endpoint.
Models without tool calling support fall back to context-stuffing mode — Aira pre-fetches all available data and includes it in the prompt. This works but produces less precise answers than full tool calling.
Will my custom model work?
Yes, if it meets one requirement: your endpoint accepts OpenAI-compatible POST /v1/chat/completions requests with a messages array and returns a JSON response with the content at choices[0].message.content.
This covers:
- vLLM — native OpenAI compatibility
- Ollama — OpenAI compatibility at
/v1/chat/completions - TGI — OpenAI-compatible mode
- Any hosted provider — Together AI, Fireworks, Groq, Replicate, etc.
- Your own fine-tuned model — as long as the serving layer is OpenAI-compatible
Aira sends a system prompt asking the model to return a JSON decision with decision, confidence, key_factors, and reasoning fields. Models that follow instructions well (7B+) handle this reliably. For smaller models or models that struggle with structured output, Aira validates the response and retries once.
What if my endpoint has a different format? Use the response_schema.content_path field to tell Aira where to find the response text. For example, if your endpoint returns {"result": {"text": "..."}}, set content_path to "result.text".
What if I'm fine-tuning my own model? Aira doesn't require special training. Any instruction-following model works. For best results, ensure your model can output valid JSON when asked.
Cloud Provider Models
Access models through your existing cloud provider accounts. Useful for data residency compliance, enterprise agreements, or accessing models not available as built-in.
AWS Bedrock
{
"provider": "bedrock",
"credentials": {
"type": "aws",
"access_key_id": "AKIA...",
"secret_access_key": "...",
"region": "us-east-1"
}
}Available: Claude (Opus 4.8, 4.7, 4.6, Sonnet 4.6), Llama 3.3/4, Mistral Large.
Azure OpenAI
{
"provider": "azure",
"credentials": {
"type": "azure",
"endpoint": "https://your-resource.openai.azure.com",
"api_key": "...",
"api_version": "2024-10-21"
}
}Google Vertex AI
{
"provider": "vertex",
"credentials": {
"type": "vertex",
"project_id": "your-gcp-project",
"region": "us-central1"
}
}Configure cloud provider credentials from Models → Providers in your dashboard, or via the Provider Credentials API.
Self-Hosted Models
Host models on your own infrastructure using vLLM, Ollama, or TGI. Register them as custom models — they get the same tool-calling support as any BYOM model.
Recommended for Self-Hosting
Frontier (multi-GPU)
| Model | Active Params | Total | License | vLLM Tool Parser |
|---|---|---|---|---|
| DeepSeek V4 Pro | 49B | 1.6T MoE | MIT | deepseek |
| Kimi K2.6 | 32B | 1T MoE | MIT | hermes |
| Command A+ | 25B | 218B MoE | Apache 2.0 | hermes |
| Mistral Medium 3.5 | 128B | 128B Dense | Apache 2.0 | mistral |
| Mistral Large 3 | 41B | 675B MoE | Apache 2.0 | mistral |
| Llama 4 Maverick | 17B | 400B MoE | Llama Community | llama4_pythonic |
| Qwen 3.5 | 17B | 397B MoE | Apache 2.0 | hermes |
General purpose (single A100/H100)
| Model | Parameters | License | vLLM Tool Parser |
|---|---|---|---|
| Gemma 4 31B | 31B Dense | Apache 2.0 | hermes |
| Qwen 3.6 27B | 27B Dense | Apache 2.0 | hermes |
| Gemma 4 26B MoE | 3.8B active | Apache 2.0 | hermes |
| Llama 4 Scout | 17B active | Llama Community | llama4_pythonic |
| Llama 3.3 70B | 70B | Llama Community | llama3_json |
| DeepSeek V4 Flash | 13B active | MIT | deepseek |
| Mistral Small 4 | 24B | Apache 2.0 | mistral |
Reasoning
| Model | Parameters | License | Notes |
|---|---|---|---|
| DeepSeek R1 | 37B active (671B) | MIT | Full reasoning model |
| DeepSeek R1 32B | 32B (distilled) | MIT | Fits single GPU |
| QwQ 32B | 32B | Apache 2.0 | Chain-of-thought, competitive with o1-mini |
| Phi-4 Reasoning | 14B | MIT | Best reasoning under 15B |
Code
| Model | Parameters | License | Notes |
|---|---|---|---|
| Qwen3 Coder Next | 3B active (80B) | Apache 2.0 | Top agentic coding benchmark |
| Devstral 2 | 24B | Apache 2.0 | 72% SWE-Bench Verified |
| Qwen2.5 Coder 32B | 32B | Apache 2.0 | Rivals GPT-4o on code |
Efficient (single consumer GPU)
| Model | Parameters | License | Notes |
|---|---|---|---|
| Phi-4 | 14B | MIT | Best reasoning at size |
| Gemma 4 12B | 12B | Apache 2.0 | Multimodal, laptop-friendly |
| Qwen3 8B | 8B | Apache 2.0 | Strong multilingual |
| Llama 3.1 8B | 8B | Llama Community | Workhorse baseline |
| Gemma 4 E4B | 4B | Apache 2.0 | Edge deployment |
vLLM Setup (Recommended)
vLLM supports OpenAI-compatible tool calling with constrained decoding for guaranteed valid JSON:
vllm serve meta-llama/Llama-4-Maverick-17B-128E-Instruct \
--host 0.0.0.0 \
--port 8000 \
--tensor-parallel-size 4 \
--enable-auto-tool-choice \
--tool-call-parser llama4_pythonicThen register:
curl -X POST https://api.airaproof.com/api/v1/models/custom \
-H "Authorization: Bearer aira_live_xxxxx" \
-H "Content-Type: application/json" \
-d '{
"name": "Llama 4 Maverick (self-hosted)",
"model_id": "llama-4-maverick",
"endpoint_url": "https://your-gpu-server:8000/v1/chat/completions",
"timeout_ms": 60000
}'Ollama Setup
For development and smaller deployments:
ollama pull llama3.3:70b
ollama serveRegister with the Ollama endpoint:
curl -X POST https://api.airaproof.com/api/v1/models/custom \
-H "Authorization: Bearer aira_live_xxxxx" \
-H "Content-Type: application/json" \
-d '{
"name": "Llama 3.3 70B (Ollama)",
"model_id": "llama-3.3-70b",
"endpoint_url": "http://your-server:11434/v1/chat/completions",
"timeout_ms": 60000
}'Always test your model endpoint after registering. Models that cannot return valid structured output will produce model_error results in cases.
Capabilities Matrix
| Capability | Built-in | BYOM (Tool Calling) | BYOM (No Tools) | Cloud Provider |
|---|---|---|---|---|
| Ask Aira — full tool calling | Yes | Yes | Context-stuffing fallback | Yes |
| Consensus cases — structured output | Yes | Yes | Best-effort | Yes |
| Multi-step reasoning | Yes | Model-dependent | No | Yes |
| Guaranteed JSON schema | Yes | Provider-dependent | No | Yes |
| Streaming | Yes | No (planned) | No | Yes |
Structured Output by Provider
Each provider handles structured output differently. Aira abstracts this so all models return the same decision schema.
| Provider | Mechanism | Reliability |
|---|---|---|
| OpenAI | Native structured outputs (response_format) | Guaranteed valid JSON |
| Anthropic | Tool-use with forced tool call | Guaranteed valid JSON |
response_mime_type + schema | Guaranteed valid JSON | |
| Self-hosted (vLLM) | Constrained decoding (guided_json) + tool calling | Guaranteed valid JSON |
| Self-hosted (Ollama) | JSON mode + prompt engineering | Best-effort (validated on receipt) |
| BYOM (with tool calling) | OpenAI function-calling format + response validation | High reliability |
| BYOM (no tool calling) | Prompt engineering + response validation | Best-effort (validated on receipt) |
For best-effort providers, Aira validates the response and retries once if the output is malformed. If the retry also fails, the model returns a model_error result.
Model Selection Recommendations
High-Stakes Decisions (Finance, Healthcare, Legal)
Use 3+ models from different providers for maximum independence:
{
"models": ["claude-opus-4-8", "gpt-5.5", "gemini-3.5-flash"]
}PR Code Review (Consensus)
Two frontier models for high-confidence findings:
{
"models": ["claude-sonnet-4-6", "claude-opus-4-8"]
}Or cross-provider consensus:
{
"models": ["claude-sonnet-4-6", "gpt-5.5", "deepseek-v4-pro"]
}Cost-Optimized
Cheaper models for high-volume, lower-risk decisions:
{
"models": ["claude-haiku-4-5", "gpt-5-mini", "deepseek-v4-pro"]
}Maximum Provider Diversity
Mix commercial and open-source:
{
"models": ["gpt-5.5", "claude-sonnet-4-6", "deepseek-v4-pro"]
}Open-Source Only (Self-Hosted)
Full data sovereignty:
{
"models": ["custom:gemma-4-31b", "custom:llama-4-maverick", "deepseek-v4-pro"]
}Cases and consensus policies accept 2-3 models. We recommend 3 models from at least 2 different providers for meaningful consensus.
EU AI Act mapping
Article-by-article mapping from the EU AI Act (Regulation 2024/1689) to the Aira capability that satisfies each requirement. Includes the specific code, config, and API call for every article.
Provider-Specific Prompting
How Aira optimizes prompts and structured output enforcement for each AI provider.