Aira

Supported Models

All models available in Aira — built-in, BYOM with tool calling, cloud provider, and self-hosted — with capabilities matrix and recommendations.

Overview

Aira supports four categories of models:

  1. Built-in models — managed by Aira, ready to use immediately
  2. BYOM (Bring Your Own Model) — any OpenAI-compatible endpoint with tool calling support
  3. Cloud provider models — accessed through AWS Bedrock, Azure OpenAI, or Google Vertex AI
  4. Self-hosted models — your own deployments via vLLM, Ollama, or TGI

All models get the same tool-calling experience in Ask Aira chat and return structured output (decision, confidence, key factors, reasoning) in consensus cases.


Built-in Models

These models are available out of the box. Use Provider Credentials to bring your own API keys, or let Aira manage keys for you.

Free tier model restrictions: Claude Opus (4.8, 4.7, 4.6), GPT-5.5, and o3 are available on Pro plans and above. Free tier has access to all other models including Sonnet, Haiku, GPT-5.4, Gemini, DeepSeek, Grok, and more. See Billing for details.

Anthropic

Model IDDisplay NameBest For
claude-fable-5Claude Fable 5Most capable model ever released — state-of-the-art on nearly all benchmarks, exceptional autonomous agents
claude-opus-4-8Claude Opus 4.8Strong flagship — sharp judgement, long autonomous runs
claude-opus-4-7Claude Opus 4.7Proven flagship — agentic coding
claude-opus-4-6Claude Opus 4.6Previous flagship — proven in production
claude-sonnet-4-6Claude Sonnet 4.6Best value — strong reasoning at lower cost
claude-haiku-4-5Claude Haiku 4.5Fastest and cheapest — high-volume cases

OpenAI

Model IDDisplay NameBest For
gpt-5.5GPT-5.5Latest flagship — 1M context, agentic multi-step
gpt-5.4GPT-5.4Strong all-around — native structured outputs
gpt-5.2GPT-5.2Battle-tested — proven in production workloads
gpt-5-miniGPT-5 MiniCost-effective for simpler tasks
o3OpenAI o3Complex multi-step reasoning and analysis

Google

Model IDDisplay NameBest For
gemini-3.5-flashGemini 3.5 FlashFastest frontier model — 4x speed, GA with SLA
gemini-3.1-proGemini 3.1 ProStrong analytical capabilities, large context
gemini-3.1-flash-liteGemini 3.1 Flash LiteLightweight, cost-efficient
gemma-4-31bGemma 4 31BOpen-weight (Apache 2.0), frontier per parameter
gemma-4-26b-moeGemma 4 26B MoEOpen-weight (Apache 2.0), fast inference

DeepSeek

Model IDDisplay NameBest For
deepseek-v4-proDeepSeek V4 Pro1.6T MoE, MIT license, lowest cost frontier model

xAI

Model IDDisplay NameBest For
grok-4.3Grok 4.31M context, native video input, aggressive pricing

Mistral

Model IDDisplay NameBest For
devstral-2Devstral 2123B coding specialist, open source (MIT)

Moonshot (Kimi)

Model IDDisplay NameBest For
kimi-k2.6Kimi K2.61T MoE, 262K context, open weights, agent swarm

Alibaba (Qwen)

Model IDDisplay NameBest For
qwen3.7-maxQwen 3.7 MaxAgent-first, 1M context, half the cost of Opus

BYOM — Bring Your Own Model

Register any model accessible via an OpenAI-compatible /v1/chat/completions endpoint. This includes hosted API providers, self-hosted models via vLLM/Ollama/TGI, or any custom endpoint.

Aira uses the standard OpenAI function-calling format for tool use. Models that support it get the same full tool-calling experience as built-in models — including multi-step reasoning with all Ask Aira tools.

Verified Models

These open-weight models have been verified to work with Aira's tool calling and structured output:

ModelTool CallingStructured OutputNotes
Gemma 4 31BFull supportExcellentApache 2.0, native function calling, 256K context
Gemma 4 26B MoEFull supportExcellentApache 2.0, 3.8B active params, fastest in class
Qwen 3.7 MaxFull supportExcellentAgent-first, 1M context, OpenAI-compatible
Qwen 3.5Full supportExcellentTop-tier agent benchmarks, Apache 2.0
Kimi K2.6Full supportExcellent1T MoE, 262K context, open weights
DeepSeek V4 ProFull supportExcellent1.6T MoE, MIT license, lowest cost
DeepSeek V3.2Full supportStrongCost-efficient, MIT license
Llama 4 MaverickFull supportStrongMoE architecture, strong reasoning
Llama 3.3 70BFull supportStrongWidely available, very reliable
Devstral 2Full supportStrong123B coding specialist, open source
Mistral LargeFull supportStrongStrong multilingual support

Any OpenAI-compatible endpoint serving these models will work — whether self-hosted or via a hosted provider.

How to Register

curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 3.3 70B",
    "model_id": "llama-3.3-70b",
    "endpoint_url": "https://your-endpoint/v1/chat/completions",
    "auth_header": "Bearer your-api-key",
    "timeout_ms": 30000
  }'

Or register from the dashboard: Models → Register Model → Custom Endpoint.

Models without tool calling support fall back to context-stuffing mode — Aira pre-fetches all available data and includes it in the prompt. This works but produces less precise answers than full tool calling.

Will my custom model work?

Yes, if it meets one requirement: your endpoint accepts OpenAI-compatible POST /v1/chat/completions requests with a messages array and returns a JSON response with the content at choices[0].message.content.

This covers:

  • vLLM — native OpenAI compatibility
  • Ollama — OpenAI compatibility at /v1/chat/completions
  • TGI — OpenAI-compatible mode
  • Any hosted provider — Together AI, Fireworks, Groq, Replicate, etc.
  • Your own fine-tuned model — as long as the serving layer is OpenAI-compatible

Aira sends a system prompt asking the model to return a JSON decision with decision, confidence, key_factors, and reasoning fields. Models that follow instructions well (7B+) handle this reliably. For smaller models or models that struggle with structured output, Aira validates the response and retries once.

What if my endpoint has a different format? Use the response_schema.content_path field to tell Aira where to find the response text. For example, if your endpoint returns {"result": {"text": "..."}}, set content_path to "result.text".

What if I'm fine-tuning my own model? Aira doesn't require special training. Any instruction-following model works. For best results, ensure your model can output valid JSON when asked.


Cloud Provider Models

Access models through your existing cloud provider accounts. Useful for data residency compliance, enterprise agreements, or accessing models not available as built-in.

AWS Bedrock

{
  "provider": "bedrock",
  "credentials": {
    "type": "aws",
    "access_key_id": "AKIA...",
    "secret_access_key": "...",
    "region": "us-east-1"
  }
}

Available: Claude (Opus 4.8, 4.7, 4.6, Sonnet 4.6), Llama 3.3/4, Mistral Large.

Azure OpenAI

{
  "provider": "azure",
  "credentials": {
    "type": "azure",
    "endpoint": "https://your-resource.openai.azure.com",
    "api_key": "...",
    "api_version": "2024-10-21"
  }
}

Google Vertex AI

{
  "provider": "vertex",
  "credentials": {
    "type": "vertex",
    "project_id": "your-gcp-project",
    "region": "us-central1"
  }
}

Configure cloud provider credentials from Models → Providers in your dashboard, or via the Provider Credentials API.


Self-Hosted Models

Host models on your own infrastructure using vLLM, Ollama, or TGI. Register them as custom models — they get the same tool-calling support as any BYOM model.

Frontier (multi-GPU)

ModelActive ParamsTotalLicensevLLM Tool Parser
DeepSeek V4 Pro49B1.6T MoEMITdeepseek
Kimi K2.632B1T MoEMIThermes
Command A+25B218B MoEApache 2.0hermes
Mistral Medium 3.5128B128B DenseApache 2.0mistral
Mistral Large 341B675B MoEApache 2.0mistral
Llama 4 Maverick17B400B MoELlama Communityllama4_pythonic
Qwen 3.517B397B MoEApache 2.0hermes

General purpose (single A100/H100)

ModelParametersLicensevLLM Tool Parser
Gemma 4 31B31B DenseApache 2.0hermes
Qwen 3.6 27B27B DenseApache 2.0hermes
Gemma 4 26B MoE3.8B activeApache 2.0hermes
Llama 4 Scout17B activeLlama Communityllama4_pythonic
Llama 3.3 70B70BLlama Communityllama3_json
DeepSeek V4 Flash13B activeMITdeepseek
Mistral Small 424BApache 2.0mistral

Reasoning

ModelParametersLicenseNotes
DeepSeek R137B active (671B)MITFull reasoning model
DeepSeek R1 32B32B (distilled)MITFits single GPU
QwQ 32B32BApache 2.0Chain-of-thought, competitive with o1-mini
Phi-4 Reasoning14BMITBest reasoning under 15B

Code

ModelParametersLicenseNotes
Qwen3 Coder Next3B active (80B)Apache 2.0Top agentic coding benchmark
Devstral 224BApache 2.072% SWE-Bench Verified
Qwen2.5 Coder 32B32BApache 2.0Rivals GPT-4o on code

Efficient (single consumer GPU)

ModelParametersLicenseNotes
Phi-414BMITBest reasoning at size
Gemma 4 12B12BApache 2.0Multimodal, laptop-friendly
Qwen3 8B8BApache 2.0Strong multilingual
Llama 3.1 8B8BLlama CommunityWorkhorse baseline
Gemma 4 E4B4BApache 2.0Edge deployment

vLLM supports OpenAI-compatible tool calling with constrained decoding for guaranteed valid JSON:

vllm serve meta-llama/Llama-4-Maverick-17B-128E-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 4 \
  --enable-auto-tool-choice \
  --tool-call-parser llama4_pythonic

Then register:

curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 4 Maverick (self-hosted)",
    "model_id": "llama-4-maverick",
    "endpoint_url": "https://your-gpu-server:8000/v1/chat/completions",
    "timeout_ms": 60000
  }'

Ollama Setup

For development and smaller deployments:

ollama pull llama3.3:70b
ollama serve

Register with the Ollama endpoint:

curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 3.3 70B (Ollama)",
    "model_id": "llama-3.3-70b",
    "endpoint_url": "http://your-server:11434/v1/chat/completions",
    "timeout_ms": 60000
  }'

Always test your model endpoint after registering. Models that cannot return valid structured output will produce model_error results in cases.


Capabilities Matrix

CapabilityBuilt-inBYOM (Tool Calling)BYOM (No Tools)Cloud Provider
Ask Aira — full tool callingYesYesContext-stuffing fallbackYes
Consensus cases — structured outputYesYesBest-effortYes
Multi-step reasoningYesModel-dependentNoYes
Guaranteed JSON schemaYesProvider-dependentNoYes
StreamingYesNo (planned)NoYes

Structured Output by Provider

Each provider handles structured output differently. Aira abstracts this so all models return the same decision schema.

ProviderMechanismReliability
OpenAINative structured outputs (response_format)Guaranteed valid JSON
AnthropicTool-use with forced tool callGuaranteed valid JSON
Googleresponse_mime_type + schemaGuaranteed valid JSON
Self-hosted (vLLM)Constrained decoding (guided_json) + tool callingGuaranteed valid JSON
Self-hosted (Ollama)JSON mode + prompt engineeringBest-effort (validated on receipt)
BYOM (with tool calling)OpenAI function-calling format + response validationHigh reliability
BYOM (no tool calling)Prompt engineering + response validationBest-effort (validated on receipt)

For best-effort providers, Aira validates the response and retries once if the output is malformed. If the retry also fails, the model returns a model_error result.


Model Selection Recommendations

Use 3+ models from different providers for maximum independence:

{
  "models": ["claude-opus-4-8", "gpt-5.5", "gemini-3.5-flash"]
}

PR Code Review (Consensus)

Two frontier models for high-confidence findings:

{
  "models": ["claude-sonnet-4-6", "claude-opus-4-8"]
}

Or cross-provider consensus:

{
  "models": ["claude-sonnet-4-6", "gpt-5.5", "deepseek-v4-pro"]
}

Cost-Optimized

Cheaper models for high-volume, lower-risk decisions:

{
  "models": ["claude-haiku-4-5", "gpt-5-mini", "deepseek-v4-pro"]
}

Maximum Provider Diversity

Mix commercial and open-source:

{
  "models": ["gpt-5.5", "claude-sonnet-4-6", "deepseek-v4-pro"]
}

Open-Source Only (Self-Hosted)

Full data sovereignty:

{
  "models": ["custom:gemma-4-31b", "custom:llama-4-maverick", "deepseek-v4-pro"]
}

Cases and consensus policies accept 2-3 models. We recommend 3 models from at least 2 different providers for meaningful consensus.

On this page