
Custom Models

Register and manage your own model endpoints for use in consensus cases.

Overview

Custom models (BYOM — Bring Your Own Model) let you connect any OpenAI-compatible endpoint to Aira. Once registered, custom models can be used in cases and Ask Aira chat alongside built-in models.

For AWS Bedrock, Azure OpenAI, and Google Vertex AI, use Provider Credentials instead.

Endpoints must accept /v1/chat/completions requests. Models that support OpenAI function calling get full tool-calling support in Ask Aira chat — the same experience as built-in models. Models without tool calling fall back to context-stuffing mode, where context is inlined directly into the prompt instead.

See Supported Models → BYOM for verified models and setup instructions.

Self-hosted open-weight models have dedicated support with per-model system prompts and constrained decoding (vLLM guided_json); see Supported Self-Hosted Models below for the tested models.


Register Model

POST /api/v1/models/custom
Authorization: Bearer aira_live_xxxxx

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Display name (1-200 chars) |
| model_id | string | Yes | Unique model identifier (1-100 chars) |
| endpoint_url | string | Yes | HTTPS endpoint URL |
| auth_header | string | No | Authorization header value (e.g. Bearer sk-...) |
| request_schema | object | No | Request format (default: {"type": "openai_compatible"}) |
| response_schema | object | No | Response parsing (default: {"type": "openai_compatible", "content_path": "choices[0].message.content"}) |
| timeout_ms | integer | No | Request timeout in ms (1,000-120,000; default: 30,000) |
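The content_path in response_schema tells Aira where to find the model's reply inside the response body. As an illustration only — the actual parser is internal to Aira — a dotted path with array indices such as choices[0].message.content can be resolved against an OpenAI-style response like this:

```shell
# Illustration only: resolving a content_path expression such as
# "choices[0].message.content" against an OpenAI-style response body.
python3 - <<'EOF'
import json, re

response = json.loads('{"choices": [{"message": {"content": "APPROVE"}}]}')

def resolve(path, obj):
    # Walk the path: bare words are object keys, [N] is an array index.
    for part in re.findall(r'\[\d+\]|[^.\[\]]+', path):
        obj = obj[int(part[1:-1])] if part.startswith('[') else obj[part]
    return obj

print(resolve("choices[0].message.content", response))
EOF
```

For a standard OpenAI-compatible endpoint, the default content_path works unchanged; you only need a custom response_schema when your endpoint nests the reply elsewhere.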

Example Request

curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Our Fine-tuned GPT",
    "model_id": "acme/compliance-v2",
    "endpoint_url": "https://api.acme.com/v1/chat/completions",
    "auth_header": "Bearer sk-acme-abc123...",
    "timeout_ms": 45000
  }'

Response (201 Created)

{
  "id": "cm_01J8X...",
  "name": "Our Fine-tuned GPT",
  "model_id": "acme/compliance-v2",
  "endpoint_url": "https://api.acme.com/v1/chat/completions",
  "has_auth": true,
  "request_schema": { "type": "openai_compatible" },
  "response_schema": { "type": "openai_compatible", "content_path": "choices[0].message.content" },
  "timeout_ms": 45000,
  "status": "untested",
  "last_tested": null,
  "created_at": "2026-03-14T10:30:00Z",
  "request_id": "req_01J8X..."
}

Response Fields

| Field | Type | Description |
|---|---|---|
| id | string | Unique model registration ID |
| has_auth | boolean | Whether an auth header is configured (never exposes the value) |
| status | string | untested, ok, or error |
| last_tested | string or null | ISO 8601 timestamp of last test |

List Models

GET /api/v1/models/custom
Authorization: Bearer aira_live_xxxxx

Returns all custom models registered by your organization.
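For example:

```shell
curl https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx"
```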


Get Model

GET /api/v1/models/custom/{id}
Authorization: Bearer aira_live_xxxxx

Returns a single custom model by ID.
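For example, using the registration ID returned at creation time:

```shell
curl https://api.airaproof.com/api/v1/models/custom/cm_01J8X... \
  -H "Authorization: Bearer aira_live_xxxxx"
```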


Update Model

PUT /api/v1/models/custom/{id}
Authorization: Bearer aira_live_xxxxx

Request Body

All fields are optional — only include fields you want to change.

| Field | Type | Description |
|---|---|---|
| name | string | Updated display name |
| endpoint_url | string | Updated HTTPS endpoint |
| auth_header | string | Updated auth header |
| request_schema | object | Updated request format |
| response_schema | object | Updated response parsing |
| timeout_ms | integer | Updated timeout (1,000-120,000) |

Example Request

curl -X PUT https://api.airaproof.com/api/v1/models/custom/cm_01J8X... \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "endpoint_url": "https://api.acme.com/v2/chat/completions",
    "timeout_ms": 60000
  }'

Delete Model

DELETE /api/v1/models/custom/{id}
Authorization: Bearer aira_live_xxxxx

Removes the custom model. Returns 204 No Content.
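For example:

```shell
curl -X DELETE https://api.airaproof.com/api/v1/models/custom/cm_01J8X... \
  -H "Authorization: Bearer aira_live_xxxxx"
```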


Test Model

POST /api/v1/models/custom/{id}/test
Authorization: Bearer aira_live_xxxxx

Sends a test request to the model endpoint and validates the response format.
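For example:

```shell
curl -X POST https://api.airaproof.com/api/v1/models/custom/cm_01J8X.../test \
  -H "Authorization: Bearer aira_live_xxxxx"
```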

Response (200 OK)

{
  "status": "ok",
  "latency_ms": 1250,
  "response_valid": true,
  "structured_output": {
    "decision": "APPROVE",
    "confidence": 0.85,
    "key_factors": ["test factor"],
    "reasoning": "Test reasoning response"
  },
  "error": null,
  "request_id": "req_01J8X..."
}

If the endpoint is unreachable or returns invalid output:

{
  "status": "error",
  "latency_ms": 30000,
  "response_valid": false,
  "structured_output": null,
  "error": "Request timed out after 30 seconds",
  "request_id": "req_..."
}

Test Response Fields

| Field | Type | Description |
|---|---|---|
| status | string | ok if the endpoint responded correctly, error otherwise |
| latency_ms | integer | Round-trip time in milliseconds |
| response_valid | boolean | Whether the response matches the expected schema |
| structured_output | object or null | Parsed decision output (if valid) |
| error | string or null | Error message (if failed) |

Always test a model after registering or updating it. Models with error status can still be used in cases, but may cause individual model failures.
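The register-then-test steps can be scripted together. This is a sketch, not a definitive client: it assumes python3 is available locally for JSON parsing, and reuses the example payload from above.

```shell
# Register a model and capture the returned registration id.
REG_ID=$(curl -s -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Our Fine-tuned GPT",
    "model_id": "acme/compliance-v2",
    "endpoint_url": "https://api.acme.com/v1/chat/completions"
  }' | python3 -c 'import json, sys; print(json.load(sys.stdin)["id"])')

# Immediately test it, and fail the script if status is not "ok".
curl -s -X POST "https://api.airaproof.com/api/v1/models/custom/$REG_ID/test" \
  -H "Authorization: Bearer aira_live_xxxxx" \
  | python3 -c 'import json, sys; assert json.load(sys.stdin)["status"] == "ok"'
```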


Supported Self-Hosted Models

The following open-weight models have been tested and have dedicated per-model system prompts optimized for Aira's structured output format:

| Model | Parameters | License |
|---|---|---|
| Llama 3.3 70B | 70B | Llama Community License |
| Llama 4 Scout | 17B active (MoE) | Llama Community License |
| Llama 4 Maverick | 17B active (MoE) | Llama Community License |
| Mistral Large 3 | 41B active (MoE) | Apache 2.0 |
| Qwen 3.5 | 397B (MoE) | Apache 2.0 |
| DeepSeek V3.2 | 37B active (MoE) | MIT |

Other OpenAI-compatible models can still be registered as custom models but will use generic prompting.


vLLM Setup Example

vLLM is the recommended serving engine. It supports guided_json constrained decoding, which guarantees valid structured output from any model.

# Serve a model with vLLM
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 4 \
  --enable-auto-tool-choice

Then register it:

curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 4 Scout (vLLM)",
    "model_id": "llama-4-scout",
    "endpoint_url": "https://your-gpu-server:8000/v1/chat/completions",
    "timeout_ms": 60000
  }'

When using constrained decoding, ensure your vLLM version supports guided_json (v0.6.0+). Older versions may fall back to unconstrained generation.
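To confirm constrained decoding works before registering the endpoint, you can send guided_json directly to your vLLM server. guided_json is a vLLM extension parameter to the OpenAI chat completions API; the schema below is illustrative only, not Aira's internal output schema.

```shell
# Sanity-check guided_json against the vLLM endpoint directly.
curl https://your-gpu-server:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "messages": [{"role": "user", "content": "Approve or reject this request?"}],
    "guided_json": {
      "type": "object",
      "properties": {"decision": {"type": "string"}},
      "required": ["decision"]
    }
  }'
```

If the returned message content is valid JSON matching the schema, constrained decoding is active.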
