
Custom Models

Register and manage your own model endpoints for use in consensus cases.

Overview

Custom models (BYOM — Bring Your Own Model) let you connect external model endpoints to Aira. Once registered, custom models can be used in cases alongside built-in models.

By default, custom model endpoints must be OpenAI-compatible — accepting chat completion requests and returning structured decisions. The request_schema and response_schema fields control how requests are built and responses parsed.

Self-hosted open-weight models now have dedicated support with per-model system prompts and constrained decoding (vLLM guided_json). See Supported Models for the full list and setup instructions.


Register Model

POST /api/v1/models/custom
Authorization: Bearer aira_live_xxxxx

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Display name (1-200 chars) |
| model_id | string | Yes | Unique model identifier (1-100 chars) |
| endpoint_url | string | Yes | HTTPS endpoint URL |
| auth_header | string | No | Authorization header value (e.g. Bearer sk-...) |
| request_schema | object | No | Request format (default: {"type": "openai_compatible"}) |
| response_schema | object | No | Response parsing (default: {"type": "openai_compatible", "content_path": "choices[0].message.content"}) |
| timeout_ms | integer | No | Request timeout in ms (1,000-120,000; default: 30,000) |
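The default content_path uses dot-and-bracket notation to locate the model's text inside the JSON response. A minimal sketch of how such a path might be resolved — the resolve_content_path helper is illustrative, not part of the API:

```python
import re

def resolve_content_path(payload, path):
    """Walk a dotted path with optional [n] indices, e.g. 'choices[0].message.content'."""
    for part in path.split("."):
        m = re.match(r"(\w+)(?:\[(\d+)\])?$", part)
        payload = payload[m.group(1)]          # descend by key
        if m.group(2) is not None:
            payload = payload[int(m.group(2))]  # then by list index, if present
    return payload

response = {"choices": [{"message": {"content": '{"decision": "APPROVE"}'}}]}
resolve_content_path(response, "choices[0].message.content")
# → '{"decision": "APPROVE"}'
```

A custom response_schema would presumably swap in a different content_path for endpoints that nest the completion text elsewhere.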

Example Request

curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Our Fine-tuned GPT",
    "model_id": "acme/compliance-v2",
    "endpoint_url": "https://api.acme.com/v1/chat/completions",
    "auth_header": "Bearer sk-acme-abc123...",
    "timeout_ms": 45000
  }'

Response (201 Created)

{
  "id": "cm_01J8X...",
  "name": "Our Fine-tuned GPT",
  "model_id": "acme/compliance-v2",
  "endpoint_url": "https://api.acme.com/v1/chat/completions",
  "has_auth": true,
  "request_schema": { "type": "openai_compatible" },
  "response_schema": { "type": "openai_compatible", "content_path": "choices[0].message.content" },
  "timeout_ms": 45000,
  "status": "untested",
  "last_tested": null,
  "created_at": "2026-03-14T10:30:00Z",
  "request_id": "req_01J8X..."
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique model registration ID |
| has_auth | boolean | Whether an auth header is configured (never exposes the value) |
| status | string | untested, ok, or error |
| last_tested | string or null | ISO 8601 timestamp of last test |

List Models

GET /api/v1/models/custom
Authorization: Bearer aira_live_xxxxx

Returns all custom models registered by your organization.
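Since each registered model carries a status field (untested, ok, or error), a common pattern is to filter the listing client-side before wiring models into consensus cases. A hedged sketch, assuming the endpoint returns a JSON array of model objects as shown in the register response:

```python
def usable_models(models):
    """Keep only custom models whose last test succeeded (status == "ok")."""
    return [m["model_id"] for m in models if m.get("status") == "ok"]

listing = [
    {"model_id": "acme/compliance-v2", "status": "ok"},
    {"model_id": "acme/legacy", "status": "error"},
    {"model_id": "acme/experimental", "status": "untested"},
]
usable_models(listing)  # → ["acme/compliance-v2"]
```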


Get Model

GET /api/v1/models/custom/{id}
Authorization: Bearer aira_live_xxxxx

Returns a single custom model by ID.


Update Model

PUT /api/v1/models/custom/{id}
Authorization: Bearer aira_live_xxxxx

Request Body

All fields are optional — only include fields you want to change.

| Field | Type | Description |
| --- | --- | --- |
| name | string | Updated display name |
| endpoint_url | string | Updated HTTPS endpoint |
| auth_header | string | Updated auth header |
| request_schema | object | Updated request format |
| response_schema | object | Updated response parsing |
| timeout_ms | integer | Updated timeout (1,000-120,000) |

Example Request

curl -X PUT https://api.airaproof.com/api/v1/models/custom/cm_01J8X... \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "endpoint_url": "https://api.acme.com/v2/chat/completions",
    "timeout_ms": 60000
  }'

Delete Model

DELETE /api/v1/models/custom/{id}
Authorization: Bearer aira_live_xxxxx

Removes the custom model. Returns 204 No Content.


Test Model

POST /api/v1/models/custom/{id}/test
Authorization: Bearer aira_live_xxxxx

Sends a test request to the model endpoint and validates the response format.

Response (200 OK)

{
  "status": "ok",
  "latency_ms": 1250,
  "response_valid": true,
  "structured_output": {
    "decision": "APPROVE",
    "confidence": 0.85,
    "key_factors": ["test factor"],
    "reasoning": "Test reasoning response"
  },
  "error": null,
  "request_id": "req_01J8X..."
}

If the endpoint is unreachable or returns invalid output:

{
  "status": "error",
  "latency_ms": 30000,
  "response_valid": false,
  "structured_output": null,
  "error": "Request timed out after 30 seconds",
  "request_id": "req_..."
}

Test Response Fields

| Field | Type | Description |
| --- | --- | --- |
| status | string | ok if endpoint responded correctly, error otherwise |
| latency_ms | integer | Round-trip time in milliseconds |
| response_valid | boolean | Whether the response matches the expected schema |
| structured_output | object or null | Parsed decision output (if valid) |
| error | string or null | Error message (if failed) |

Always test a model after registering or updating it. Models with error status can still be used in cases, but may cause individual model failures.
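Based on the structured_output shape shown in the test response, a client-side sanity check might look like the following — the field requirements are inferred from the example above, not from a published schema:

```python
def validate_structured_output(out):
    """Check a parsed decision object against the shape shown in the test response."""
    if not isinstance(out, dict):
        return False
    return (
        isinstance(out.get("decision"), str)
        and isinstance(out.get("confidence"), (int, float))
        and 0.0 <= out["confidence"] <= 1.0
        and isinstance(out.get("key_factors"), list)
        and isinstance(out.get("reasoning"), str)
    )

ok = {"decision": "APPROVE", "confidence": 0.85,
      "key_factors": ["test factor"], "reasoning": "Test reasoning response"}
validate_structured_output(ok)    # → True
validate_structured_output(None)  # → False
```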


Supported Self-Hosted Models

The following open-weight models have been tested and have dedicated per-model system prompts optimized for Aira's structured output format:

| Model | Parameters | License |
| --- | --- | --- |
| Llama 3.3 70B | 70B | Llama Community License |
| Llama 4 Scout | 17B active (MoE) | Llama Community License |
| Llama 4 Maverick | 17B active (MoE) | Llama Community License |
| Mistral Large 3 | 41B active (MoE) | Apache 2.0 |
| Qwen 3.5 | 397B (MoE) | Apache 2.0 |
| DeepSeek V3.2 | 37B active (MoE) | MIT |

Other OpenAI-compatible models can still be registered as custom models but will use generic prompting.


vLLM Setup Example

vLLM is the recommended serving engine. It supports guided_json constrained decoding, which guarantees valid structured output from any model.

# Serve a model with vLLM
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 4 \
  --enable-auto-tool-choice

Then register it:

curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 4 Scout (vLLM)",
    "model_id": "llama-4-scout",
    "endpoint_url": "https://your-gpu-server:8000/v1/chat/completions",
    "timeout_ms": 60000
  }'

When using constrained decoding, ensure your vLLM version supports guided_json (v0.6.0+). Older versions may fall back to unconstrained generation.
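With vLLM's OpenAI-compatible server, constrained decoding is requested by including a guided_json field carrying a JSON Schema in the chat completion request (or via extra_body when using the OpenAI Python client). A sketch of such a payload — the schema itself is an assumption inferred from the structured_output fields in the test response:

```python
import json

# JSON Schema inferred from the structured_output example; adjust to your needs.
decision_schema = {
    "type": "object",
    "properties": {
        "decision": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "key_factors": {"type": "array", "items": {"type": "string"}},
        "reasoning": {"type": "string"},
    },
    "required": ["decision", "confidence", "key_factors", "reasoning"],
}

# Raw chat-completions payload for a vLLM endpoint; "guided_json" is a
# vLLM-specific extension and may be rejected by other OpenAI-compatible servers.
payload = {
    "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "messages": [{"role": "user", "content": "Review this case and decide."}],
    "guided_json": decision_schema,
}
body = json.dumps(payload)  # POST to https://your-gpu-server:8000/v1/chat/completions
```

With the schema enforced, the model can only emit JSON matching the decision shape, which is what makes response parsing reliable for self-hosted models.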
