Aira

Supported Models

All models available in Aira — built-in, BYOM with tool calling, cloud provider, and self-hosted — with capabilities matrix and recommendations.

Overview

Aira supports four categories of models:

  1. Built-in models — managed by Aira, ready to use immediately
  2. BYOM (Bring Your Own Model) — any OpenAI-compatible endpoint with tool calling support
  3. Cloud provider models — accessed through AWS Bedrock, Azure OpenAI, or Google Vertex AI
  4. Self-hosted models — your own deployments via vLLM, Ollama, or TGI

All models get the same tool-calling experience in Ask Aira chat and return structured output (decision, confidence, key factors, reasoning) in consensus cases.
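The structured fields named above can be pictured as a minimal validation check. This is a sketch only: the snake_case key names, value types, and example values are assumptions for illustration, not Aira's actual schema.

```python
# Sketch of the structured output described above for consensus cases.
# The four fields come from the text; exact key names, types, and the
# example values are assumptions.
REQUIRED_FIELDS = {"decision", "confidence", "key_factors", "reasoning"}

example_result = {
    "decision": "approve",
    "confidence": 0.87,  # assumed 0.0-1.0 scale
    "key_factors": ["factor A", "factor B"],
    "reasoning": "Both factors point the same way.",
}

def is_valid_result(result: dict) -> bool:
    """Check that a model's structured output carries every required field."""
    return REQUIRED_FIELDS.issubset(result)
```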


Built-in Models

These models are available out of the box. Use Provider Credentials to bring your own API keys, or let Aira manage keys for you.

Anthropic

| Model ID | Display Name | Tool Calling | Best For |
| --- | --- | --- | --- |
| claude-opus-4-6 | Claude Opus 4.6 | Native | Highest capability — 1M token context, deep legal analysis |
| claude-sonnet-4-6 | Claude Sonnet 4.6 | Native | Best value — strong reasoning at lower cost |
| claude-haiku-4-5 | Claude Haiku 4.5 | Native | Fast and cost-effective — high-volume cases |

OpenAI

| Model ID | Display Name | Tool Calling | Best For |
| --- | --- | --- | --- |
| gpt-5.4 | GPT-5.4 | Native | Flagship — highest accuracy, native structured outputs |
| gpt-5.2 | GPT-5.2 | Native | Battle-tested — proven in production workloads |
| o3 | o3 Reasoning | Native | Complex multi-step reasoning and analysis |

Google

| Model ID | Display Name | Tool Calling | Best For |
| --- | --- | --- | --- |
| gemini-3.1-pro | Gemini 3.1 Pro | Native | Strong analytical capabilities, large context |
| gemma-4-31b | Gemma 4 31B | Native | Open-weight (Apache 2.0), frontier intelligence per parameter |
| gemma-4-26b-moe | Gemma 4 26B MoE | Native | Open-weight (Apache 2.0), fast inference (3.8B active params) |

BYOM — Bring Your Own Model

Register any model accessible via an OpenAI-compatible /v1/chat/completions endpoint. This includes hosted API providers, self-hosted models via vLLM/Ollama/TGI, or any custom endpoint.

Aira uses the standard OpenAI function-calling format for tool use. Models that support it get the same full tool-calling experience as built-in models — including multi-step reasoning with all Ask Aira tools.
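As a sketch, a request in this format uses the standard OpenAI function-calling envelope ("messages", "tools", "tool_choice"). The tool name, its parameters, and the message content below are hypothetical; only the envelope shape is the standard format.

```python
import json

# Sketch of an OpenAI function-calling request as sent to a BYOM
# endpoint. The "fetch_case" tool and its parameters are hypothetical;
# the envelope is the standard /v1/chat/completions format.
payload = {
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Summarize this case."}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "fetch_case",  # hypothetical tool
                "description": "Fetch a case record by ID.",
                "parameters": {
                    "type": "object",
                    "properties": {"case_id": {"type": "string"}},
                    "required": ["case_id"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

body = json.dumps(payload)  # JSON body POSTed to /v1/chat/completions
```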

Verified Models

These open-weight models have been verified to work with Aira's tool calling and structured output:

| Model | Tool Calling | Structured Output | Notes |
| --- | --- | --- | --- |
| Gemma 4 31B | Full support | Excellent | Apache 2.0, native function calling, 256K context |
| Gemma 4 26B MoE | Full support | Excellent | Apache 2.0, 3.8B active params, fastest in class |
| Qwen 3.5 | Full support | Excellent | Top-tier agent benchmarks, Apache 2.0 |
| Llama 4 Maverick | Full support | Strong | MoE architecture, strong reasoning |
| Llama 3.3 70B | Full support | Strong | Widely available, very reliable |
| Mistral Large | Full support | Strong | Strong multilingual support |
| DeepSeek V3.2 | Full support | Strong | Cost-efficient, MIT license |

Any OpenAI-compatible endpoint serving these models will work — whether self-hosted or via a hosted provider.

How to Register

curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 3.3 70B",
    "model_id": "llama-3.3-70b",
    "endpoint_url": "https://your-endpoint/v1/chat/completions",
    "auth_header": "Bearer your-api-key",
    "timeout_ms": 30000
  }'

Or register from the dashboard: Models → Register Model → Custom Endpoint.

Models without tool calling support fall back to context-stuffing mode — Aira pre-fetches all available data and includes it in the prompt. This works but produces less precise answers than full tool calling.
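The fallback described above can be sketched as simple prompt construction: pre-fetched data is inlined as labeled context sections instead of being retrieved through tool calls. The function name, section labels, and prompt wording here are assumptions, not Aira's implementation.

```python
# Minimal sketch of a context-stuffing fallback: all available data is
# pre-fetched and inlined into the prompt, since the model cannot call
# tools itself. Names and wording are illustrative assumptions.
def build_stuffed_prompt(question: str, prefetched: dict) -> str:
    sections = [f"## {name}\n{content}" for name, content in prefetched.items()]
    context = "\n\n".join(sections)
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )

prompt = build_stuffed_prompt(
    "Are there open issues on this case?",
    {"case_summary": "...", "history": "..."},
)
```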


Cloud Provider Models

Access models through your existing cloud provider accounts. Useful for data residency compliance, enterprise agreements, or accessing models not available as built-in.

AWS Bedrock

{
  "provider": "bedrock",
  "credentials": {
    "type": "aws",
    "access_key_id": "AKIA...",
    "secret_access_key": "...",
    "region": "us-east-1"
  }
}

Available: Claude (Opus, Sonnet, Haiku), Llama 3.3/4, Mistral Large.

Azure OpenAI

{
  "provider": "azure",
  "credentials": {
    "type": "azure",
    "endpoint": "https://your-resource.openai.azure.com",
    "api_key": "...",
    "api_version": "2024-10-21"
  }
}

Google Vertex AI

{
  "provider": "vertex",
  "credentials": {
    "type": "vertex",
    "project_id": "your-gcp-project",
    "region": "us-central1"
  }
}

Configure cloud provider credentials from Models → Providers in your dashboard, or via the Provider Credentials API.
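As a sketch of the API route, submitting one of the credential blocks above could look like the following. The /api/v1/providers path is an assumption modeled on the model-registration examples; the JSON body matches the Bedrock block above, and the request is constructed but not sent.

```python
import json
import urllib.request

# Sketch of submitting Bedrock credentials to the Provider Credentials
# API. The endpoint path is assumed (modeled on the model registration
# examples); the payload mirrors the Bedrock block above.
payload = {
    "provider": "bedrock",
    "credentials": {
        "type": "aws",
        "access_key_id": "AKIA...",
        "secret_access_key": "...",
        "region": "us-east-1",
    },
}

req = urllib.request.Request(
    "https://api.airaproof.com/api/v1/providers",  # assumed path
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer aira_live_xxxxx",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted here.
```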


Self-Hosted Models

Host models on your own infrastructure using vLLM, Ollama, or TGI. Register them as custom models — they get the same tool-calling support as any BYOM model.

| Model | Parameters | License | vLLM Tool Parser |
| --- | --- | --- | --- |
| Gemma 4 31B | 31B (Dense) | Apache 2.0 | hermes |
| Gemma 4 26B MoE | 3.8B active (MoE) | Apache 2.0 | hermes |
| Llama 4 Maverick | 17B active (128E MoE) | Llama Community | llama4_pythonic |
| Llama 4 Scout | 17B active (16E MoE) | Llama Community | llama4_pythonic |
| Llama 3.3 70B | 70B | Llama Community | llama3_json |
| Qwen 3.5 | 397B (MoE) | Apache 2.0 | hermes |
| Mistral Large 3 | 41B active (MoE) | Apache 2.0 | mistral |
| DeepSeek V3.2 | 37B active (MoE) | MIT | deepseek |

vLLM supports OpenAI-compatible tool calling with constrained decoding for guaranteed valid JSON:

vllm serve meta-llama/Llama-4-Maverick-17B-128E-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 4 \
  --enable-auto-tool-choice \
  --tool-call-parser llama4_pythonic

Then register:

curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 4 Maverick (self-hosted)",
    "model_id": "llama-4-maverick",
    "endpoint_url": "https://your-gpu-server:8000/v1/chat/completions",
    "timeout_ms": 60000
  }'

Ollama Setup

For development and smaller deployments:

ollama pull llama3.3:70b
ollama serve

Register with the Ollama endpoint:

curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 3.3 70B (Ollama)",
    "model_id": "llama-3.3-70b",
    "endpoint_url": "http://your-server:11434/v1/chat/completions",
    "timeout_ms": 60000
  }'

Always test your model endpoint after registering. Models that cannot return valid structured output will produce model_error results in cases.
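A post-registration check like the one described can be sketched as parsing the raw model response and verifying the structured-output fields are present. The "model_error" string mirrors the result name in the text; the validation logic itself is an assumption.

```python
import json

# Sketch of a post-registration smoke check: parse the raw response
# from the model endpoint and verify the structured-output fields.
# Returning "model_error" mirrors the result name described above.
REQUIRED = {"decision", "confidence", "key_factors", "reasoning"}

def check_response(raw: str):
    """Return the parsed decision, or 'model_error' if it is malformed."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return "model_error"
    if not REQUIRED.issubset(parsed):
        return "model_error"
    return parsed
```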


Capabilities Matrix

| Capability | Built-in | BYOM (Tool Calling) | BYOM (No Tools) | Cloud Provider |
| --- | --- | --- | --- | --- |
| Ask Aira — full tool calling | Yes | Yes | Context-stuffing fallback | Yes |
| Consensus cases — structured output | Yes | Yes | Best-effort | Yes |
| Multi-step reasoning | Yes | Model-dependent | No | Yes |
| Guaranteed JSON schema | Yes | Provider-dependent | No | Yes |
| Streaming | Yes | No (planned) | No | Yes |

Structured Output by Provider

Each provider handles structured output differently. Aira abstracts this so all models return the same decision schema.

| Provider | Mechanism | Reliability |
| --- | --- | --- |
| OpenAI | Native structured outputs (response_format) | Guaranteed valid JSON |
| Anthropic | Tool-use with forced tool call | Guaranteed valid JSON |
| Google | response_mime_type + schema | Guaranteed valid JSON |
| Self-hosted (vLLM) | Constrained decoding (guided_json) + tool calling | Guaranteed valid JSON |
| Self-hosted (Ollama) | JSON mode + prompt engineering | Best-effort (validated on receipt) |
| BYOM (with tool calling) | OpenAI function-calling format + response validation | High reliability |
| BYOM (no tool calling) | Prompt engineering + response validation | Best-effort (validated on receipt) |

For best-effort providers, Aira validates the response and retries once if the output is malformed. If the retry also fails, the model returns a model_error result.
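The validate-and-retry-once behavior can be sketched with a stand-in call_model function; in practice the call goes to the provider endpoint, and the exact validation rules are assumptions.

```python
import json

# Sketch of validate-and-retry-once for best-effort providers: parse
# the output, accept it if the required fields are present, retry one
# time otherwise, and fall back to model_error.
REQUIRED = {"decision", "confidence", "key_factors", "reasoning"}

def run_with_retry(call_model):
    """Call the model, retry once on malformed output, else model_error."""
    for _attempt in range(2):  # first try + one retry
        try:
            parsed = json.loads(call_model())
        except json.JSONDecodeError:
            continue
        if REQUIRED.issubset(parsed):
            return parsed
    return "model_error"
```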


Model Selection Recommendations

Use 3-5 models from different providers for maximum independence:

{
  "models": ["claude-opus-4-6", "gpt-5.4", "gemini-3.1-pro"]
}

Add o3 for complex multi-step reasoning decisions.

Cost-Optimized Cases

Use faster, cheaper models for high-volume, lower-risk decisions:

{
  "models": ["claude-sonnet-4-6", "gpt-5.2", "claude-haiku-4-5"]
}

Maximum Provider Diversity

Mix commercial and open-source models:

{
  "models": ["gpt-5.4", "claude-sonnet-4-6", "custom:llama-4-maverick"]
}

Open-Source Only (Self-Hosted)

For organizations that require full data sovereignty:

{
  "models": ["custom:gemma-4-31b", "custom:llama-4-maverick", "custom:qwen-3.5"]
}

Cases require a minimum of 2 models. For production use, 3 models from at least 2 different providers are recommended to ensure meaningful consensus.
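The rules above can be sketched as a small configuration check. Inferring the provider from the model ID prefix is an assumption for illustration; Aira presumably tracks providers explicitly.

```python
# Sketch of the model-count and provider-diversity rules stated above.
# Deriving the provider from the model ID prefix is an assumption.
def check_case_models(models: list[str]) -> list[str]:
    """Return a list of warnings for a case's model selection."""
    warnings = []
    if len(models) < 2:
        warnings.append("cases require a minimum of 2 models")
    providers = {m.split("-")[0].split(":")[0] for m in models}
    if len(models) < 3 or len(providers) < 2:
        warnings.append("recommended: 3+ models from at least 2 providers")
    return warnings
```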
