Aira

Supported Models

All models available in Aira — built-in, cloud provider, and self-hosted — with setup instructions and recommendations.

Overview

Aira supports three categories of models for consensus cases:

  1. Built-in models — managed by Aira, ready to use immediately
  2. Cloud provider models — accessed through AWS Bedrock, Azure OpenAI, or Google Vertex AI
  3. Self-hosted models — your own deployments via vLLM, Ollama, or TGI

All models return structured output (decision, confidence, key factors, reasoning) regardless of provider.
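For example, a single model's structured result might look like the following (the values and exact field spellings here are illustrative; consult the API reference for the authoritative schema):

```json
{
  "decision": "approve",
  "confidence": 0.87,
  "key_factors": ["payment history", "income verification"],
  "reasoning": "The applicant meets the stated criteria on both factors."
}
```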


Built-in Models

These models are available out of the box. Use BYOK to bring your own API keys, or let Aira manage keys for you.

OpenAI

| Model ID | Display Name | Best For |
| --- | --- | --- |
| gpt-5.4 | GPT-5.4 | Flagship model — highest accuracy, native structured outputs |
| gpt-5.2 | GPT-5.2 | Proven stable — battle-tested in production workloads |
| gpt-5-mini | GPT-5 Mini | Cost-optimized — fast responses at lower cost |
| o3 | o3 Reasoning | Reasoning-focused — excels at complex, multi-step decisions |

Anthropic

| Model ID | Display Name | Best For |
| --- | --- | --- |
| claude-opus-4-6 | Claude Opus 4.6 | Highest capability — 1M token context, deep analysis |
| claude-sonnet-4-6 | Claude Sonnet 4.6 | Best value — 1M token context, strong reasoning at lower cost |
| claude-haiku-4-5 | Claude Haiku 4.5 | Fast and cost-effective — ideal for high-volume cases |

Google

| Model ID | Display Name | Best For |
| --- | --- | --- |
| gemini-3.1-pro | Gemini 3.1 Pro | Latest Pro model — strong analytical capabilities |
| gemini-3.1-flash-lite | Gemini 3.1 Flash Lite | Fast and cost-effective — good for latency-sensitive cases |

Cloud Provider Models

Access models through your existing cloud provider accounts. This is useful for compliance requirements (data residency), cost management (existing enterprise agreements), or accessing models not available as built-in.

AWS Bedrock

Configure Bedrock access to use Claude, Llama, and Mistral models through your AWS account.

{
  "provider": "bedrock",
  "config": {
    "aws_access_key_id": "AKIA...",
    "aws_secret_access_key": "...",
    "aws_region": "us-east-1"
  }
}

Available models through Bedrock include Claude (Opus, Sonnet, Haiku), Llama 3.3/4, and Mistral Large.

Azure OpenAI

Use GPT-5.x models through your Azure OpenAI deployment.

{
  "provider": "azure",
  "config": {
    "azure_endpoint": "https://your-resource.openai.azure.com",
    "azure_deployment": "gpt-5-4",
    "api_version": "2026-01-01-preview",
    "api_key": "..."
  }
}

Google Vertex AI

Access Claude and Gemini models through Google Cloud.

{
  "provider": "vertex",
  "config": {
    "project_id": "your-gcp-project",
    "region": "us-central1"
  }
}

Cloud provider configuration is set per API key using the BYOK endpoint. Each API key can have different provider configurations.


Self-Hosted Models

Register self-hosted models served via vLLM, Ollama, or TGI using the Custom Models API. Self-hosted models have dedicated support with per-model system prompts and constrained decoding for guaranteed schema compliance.

Supported Self-Hosted Models

| Model | Parameters | License | Notes |
| --- | --- | --- | --- |
| Llama 3.3 70B | 70B | Llama Community License | Strong general-purpose reasoning |
| Llama 4 Scout | 17B active (MoE) | Llama Community License | Efficient MoE architecture |
| Llama 4 Maverick | 17B active (MoE) | Llama Community License | Higher quality MoE variant |
| Mistral Large 3 | 41B active (MoE) | Apache 2.0 | Strong multilingual support |
| Qwen 3.5 | 397B (MoE) | Apache 2.0 | Largest open-weight MoE |
| DeepSeek V3.2 | 37B active (MoE) | MIT | Cost-efficient reasoning |

vLLM Setup

vLLM is the recommended serving engine for self-hosted models. It supports guided_json for constrained decoding, which guarantees valid structured output.

# Start vLLM with guided decoding support
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 4 \
  --enable-auto-tool-choice
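With the server running, constrained decoding can be exercised directly: vLLM's OpenAI-compatible endpoint accepts a guided_json field carrying a JSON Schema in the completion request body. The schema below is an illustrative sketch of the decision shape, not Aira's exact schema:

```json
{
  "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
  "messages": [{"role": "user", "content": "Should this claim be approved?"}],
  "guided_json": {
    "type": "object",
    "properties": {
      "decision": {"type": "string"},
      "confidence": {"type": "number"},
      "key_factors": {"type": "array", "items": {"type": "string"}},
      "reasoning": {"type": "string"}
    },
    "required": ["decision", "confidence", "key_factors", "reasoning"]
  }
}
```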

Then register the model in Aira:

curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 4 Scout (self-hosted)",
    "model_id": "llama-4-scout",
    "endpoint_url": "https://your-gpu-server:8000/v1/chat/completions",
    "timeout_ms": 60000
  }'
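Once registered, the model is referenced in cases by its ID with the custom: prefix, alongside built-in model IDs:

```json
{
  "models": ["gpt-5.4", "custom:llama-4-scout"]
}
```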

Ollama Setup

For smaller deployments or development environments:

# Pull and run a model
ollama pull llama3.3:70b
ollama serve

Register with the Ollama-compatible endpoint:

curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 3.3 70B (Ollama)",
    "model_id": "llama-3.3-70b-ollama",
    "endpoint_url": "http://your-server:11434/v1/chat/completions",
    "timeout_ms": 60000
  }'

Always test your model endpoint after registering it. Models that cannot return valid structured output will produce model_error results in cases.


Structured Output by Provider

Each provider handles structured output differently. Aira abstracts this so all models return the same decision schema.

| Provider | Mechanism | Reliability |
| --- | --- | --- |
| OpenAI | Native structured outputs (response_format) | Guaranteed valid JSON |
| Anthropic | Tool-use with forced tool call | Guaranteed valid JSON |
| Google | response_mime_type + schema | Guaranteed valid JSON |
| Self-hosted (vLLM) | Constrained decoding (guided_json) | Guaranteed valid JSON |
| Self-hosted (Ollama) | JSON mode + prompt engineering | Best-effort (validated on receipt) |
| Custom endpoints | Prompt engineering + response validation | Best-effort (validated on receipt) |

For best-effort providers, Aira validates the response and retries once if the output is malformed. If the retry also fails, the model returns a model_error result.


Model Selection Recommendations

Recommended Default

Use 3-5 models from different providers for maximum independence:

{
  "models": ["gpt-5.4", "claude-opus-4-6", "gemini-3.1-pro"]
}

Consider adding o3 when the decision involves complex multi-step reasoning.

Cost-Optimized Cases

Use faster, cheaper models for high-volume, lower-risk decisions:

{
  "models": ["gpt-5-mini", "claude-haiku-4-5", "gemini-3.1-flash-lite"]
}

Maximum Coverage

Mix commercial and self-hosted models for provider diversity:

{
  "models": ["gpt-5.4", "claude-sonnet-4-6", "custom:llama-4-scout"]
}

Reasoning-Heavy Queries

Include reasoning-focused models for complex analytical decisions:

{
  "models": ["o3", "claude-opus-4-6", "gpt-5.4"]
}

Cases require a minimum of 2 models. For production use, we recommend 3 models from at least 2 different providers to ensure meaningful consensus.
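A pre-flight check for these rules might look like the sketch below. The provider prefixes are inferred from the model IDs used throughout this page; the mapping is an assumption for illustration:

```python
# Illustrative mapping from model ID prefix to provider.
PROVIDER_BY_PREFIX = {
    "gpt": "openai",
    "o3": "openai",
    "claude": "anthropic",
    "gemini": "google",
    "custom": "self-hosted",
}

def provider_of(model_id: str) -> str:
    """Map a model ID (e.g. 'gpt-5.4', 'custom:llama-4-scout') to a provider."""
    prefix = model_id.split("-")[0].split(":")[0]
    return PROVIDER_BY_PREFIX.get(prefix, "unknown")

def check_selection(models: list[str]) -> None:
    """Enforce the minimum-model rule and warn on weak provider diversity."""
    if len(models) < 2:
        raise ValueError("cases require at least 2 models")
    providers = {provider_of(m) for m in models}
    if len(models) < 3 or len(providers) < 2:
        print("warning: 3+ models from 2+ providers recommended for production")
```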
