Supported Models

All models available in Aira — built-in, BYOM with tool calling, cloud provider, and self-hosted — with capabilities matrix and recommendations.

Overview

Aira supports four categories of models:

Built-in models — managed by Aira, ready to use immediately
BYOM (Bring Your Own Model) — any OpenAI-compatible endpoint with tool calling support
Cloud provider models — accessed through AWS Bedrock, Azure OpenAI, or Google Vertex AI
Self-hosted models — your own deployments via vLLM, Ollama, or TGI

All models get the same tool-calling experience in Ask Aira chat and return structured output (decision, confidence, key factors, reasoning) in consensus cases.

Built-in Models

These models are available out of the box. Use Provider Credentials to bring your own API keys, or let Aira manage keys for you.

Anthropic

Model ID	Display Name	Tool Calling	Best For
`claude-opus-4-6`	Claude Opus 4.6	Native	Highest capability — 1M token context, deep legal analysis
`claude-sonnet-4-6`	Claude Sonnet 4.6	Native	Best value — strong reasoning at lower cost
`claude-haiku-4-5`	Claude Haiku 4.5	Native	Fast and cost-effective — high-volume cases

OpenAI

Model ID	Display Name	Tool Calling	Best For
`gpt-5.4`	GPT-5.4	Native	Flagship — highest accuracy, native structured outputs
`gpt-5.2`	GPT-5.2	Native	Battle-tested — proven in production workloads
`o3`	o3 Reasoning	Native	Complex multi-step reasoning and analysis

Google

Model ID	Display Name	Tool Calling	Best For
`gemini-3.1-pro`	Gemini 3.1 Pro	Native	Strong analytical capabilities, large context
`gemma-4-31b`	Gemma 4 31B	Native	Open-weight (Apache 2.0), frontier intelligence per parameter
`gemma-4-26b-moe`	Gemma 4 26B MoE	Native	Open-weight (Apache 2.0), fast inference (3.8B active params)

BYOM — Bring Your Own Model

Register any model accessible via an OpenAI-compatible /v1/chat/completions endpoint. This includes hosted API providers, self-hosted models via vLLM/Ollama/TGI, or any custom endpoint.

Aira uses the standard OpenAI function-calling format for tool use. Models that support it get the same full tool-calling experience as built-in models — including multi-step reasoning with all Ask Aira tools.

Verified Models

These open-weight models have been verified to work with Aira's tool calling and structured output:

Model	Tool Calling	Structured Output	Notes
Gemma 4 31B	Full support	Excellent	Apache 2.0, native function calling, 256K context
Gemma 4 26B MoE	Full support	Excellent	Apache 2.0, 3.8B active params, fastest in class
Qwen 3.5	Full support	Excellent	Top-tier agent benchmarks, Apache 2.0
Llama 4 Maverick	Full support	Strong	MoE architecture, strong reasoning
Llama 3.3 70B	Full support	Strong	Widely available, very reliable
Mistral Large	Full support	Strong	Strong multilingual support
DeepSeek V3.2	Full support	Strong	Cost-efficient, MIT license

Any OpenAI-compatible endpoint serving these models will work — whether self-hosted or via a hosted provider.

How to Register

curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 3.3 70B",
    "model_id": "llama-3.3-70b",
    "endpoint_url": "https://your-endpoint/v1/chat/completions",
    "auth_header": "Bearer your-api-key",
    "timeout_ms": 30000
  }'

Or register from the dashboard: Models → Register Model → Custom Endpoint.

Models without tool calling support fall back to context-stuffing mode — Aira pre-fetches all available data and includes it in the prompt. This works but produces less precise answers than full tool calling.

Cloud Provider Models

Access models through your existing cloud provider accounts. Useful for data residency compliance, enterprise agreements, or accessing models not available as built-in.

AWS Bedrock

{
  "provider": "bedrock",
  "credentials": {
    "type": "aws",
    "access_key_id": "AKIA...",
    "secret_access_key": "...",
    "region": "us-east-1"
  }
}

Available: Claude (Opus, Sonnet, Haiku), Llama 3.3/4, Mistral Large.

Azure OpenAI

{
  "provider": "azure",
  "credentials": {
    "type": "azure",
    "endpoint": "https://your-resource.openai.azure.com",
    "api_key": "...",
    "api_version": "2024-10-21"
  }
}

Google Vertex AI

{
  "provider": "vertex",
  "credentials": {
    "type": "vertex",
    "project_id": "your-gcp-project",
    "region": "us-central1"
  }
}

Configure cloud provider credentials from Models → Providers in your dashboard, or via the Provider Credentials API.

Self-Hosted Models

Host models on your own infrastructure using vLLM, Ollama, or TGI. Register them as custom models — they get the same tool-calling support as any BYOM model.

Recommended for Self-Hosting

Model	Parameters	License	vLLM Tool Parser
Gemma 4 31B	31B (Dense)	Apache 2.0	`hermes`
Gemma 4 26B MoE	3.8B active (MoE)	Apache 2.0	`hermes`
Llama 4 Maverick	17B active (128E MoE)	Llama Community	`llama4_pythonic`
Llama 4 Scout	17B active (16E MoE)	Llama Community	`llama4_pythonic`
Llama 3.3 70B	70B	Llama Community	`llama3_json`
Qwen 3.5	397B (MoE)	Apache 2.0	`hermes`
Mistral Large 3	41B active (MoE)	Apache 2.0	`mistral`
DeepSeek V3.2	37B active (MoE)	MIT	`deepseek`

vLLM Setup (Recommended)

vLLM supports OpenAI-compatible tool calling with constrained decoding for guaranteed valid JSON:

vllm serve meta-llama/Llama-4-Maverick-17B-128E-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 4 \
  --enable-auto-tool-choice \
  --tool-call-parser llama4_pythonic

Then register:

curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 4 Maverick (self-hosted)",
    "model_id": "llama-4-maverick",
    "endpoint_url": "https://your-gpu-server:8000/v1/chat/completions",
    "timeout_ms": 60000
  }'

Ollama Setup

For development and smaller deployments:

ollama pull llama3.3:70b
ollama serve

curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 3.3 70B (Ollama)",
    "model_id": "llama-3.3-70b",
    "endpoint_url": "http://your-server:11434/v1/chat/completions",
    "timeout_ms": 60000
  }'

Always test your model endpoint after registering. Models that cannot return valid structured output will produce model_error results in cases.

Capabilities Matrix

Capability	Built-in	BYOM (Tool Calling)	BYOM (No Tools)	Cloud Provider
Ask Aira — full tool calling	Yes	Yes	Context-stuffing fallback	Yes
Consensus cases — structured output	Yes	Yes	Best-effort	Yes
Multi-step reasoning	Yes	Model-dependent	No	Yes
Guaranteed JSON schema	Yes	Provider-dependent	No	Yes
Streaming	Yes	No (planned)	No	Yes

Structured Output by Provider

Each provider handles structured output differently. Aira abstracts this so all models return the same decision schema.

Provider	Mechanism	Reliability
OpenAI	Native structured outputs (`response_format`)	Guaranteed valid JSON
Anthropic	Tool-use with forced tool call	Guaranteed valid JSON
Google	`response_mime_type` + schema	Guaranteed valid JSON
Self-hosted (vLLM)	Constrained decoding (`guided_json`) + tool calling	Guaranteed valid JSON
Self-hosted (Ollama)	JSON mode + prompt engineering	Best-effort (validated on receipt)
BYOM (with tool calling)	OpenAI function-calling format + response validation	High reliability
BYOM (no tool calling)	Prompt engineering + response validation	Best-effort (validated on receipt)

For best-effort providers, Aira validates the response and retries once if the output is malformed. If the retry also fails, the model returns a model_error result.

Model Selection Recommendations

High-Stakes Decisions (Finance, Healthcare, Legal)

Use 3-5 models from different providers for maximum independence:

{
  "models": ["claude-opus-4-6", "gpt-5.4", "gemini-3.1-pro"]
}

Add o3 for complex multi-step reasoning decisions.

Cost-Optimized Cases

Use faster, cheaper models for high-volume, lower-risk decisions:

{
  "models": ["claude-sonnet-4-6", "gpt-5.2", "claude-haiku-4-5"]
}

Maximum Provider Diversity

Mix commercial and open-source models:

{
  "models": ["gpt-5.4", "claude-sonnet-4-6", "custom:llama-4-maverick"]
}

Open-Source Only (Self-Hosted)

For organizations that require full data sovereignty:

{
  "models": ["custom:gemma-4-31b", "custom:llama-4-maverick", "custom:qwen-3.5"]
}

Cases require a minimum of 2 models. For production use, 3 models from at least 2 different providers is recommended to ensure meaningful consensus.

On this page