Custom Models
Register and manage your own model endpoints for use in consensus cases.
Overview
Custom models (BYOM, Bring Your Own Model) let you connect any OpenAI-compatible endpoint to Aira. Once registered, custom models can be used in cases and Ask Aira chat alongside built-in models.
For AWS Bedrock, Azure OpenAI, and Google Vertex AI, use Provider Credentials instead.
Endpoints must accept /v1/chat/completions requests. Models that support OpenAI function calling get full tool-calling support in Ask Aira chat — the same experience as built-in models. Models without tool calling fall back to context-stuffing mode.
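In practice, that means the endpoint handles a standard chat completions request and returns the reply at choices[0].message.content. A minimal sketch, with placeholder URL, key, and model ID:

curl -X POST https://models.example.com/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model-id",
    "messages": [
      {"role": "user", "content": "Summarize this policy in one sentence."}
    ]
  }'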
See Supported Models → BYOM for verified models and setup instructions.
Self-hosted open-weight models have dedicated support with per-model system prompts and constrained decoding (vLLM guided_json); see Supported Self-Hosted Models below for the model list and a vLLM setup example.
Register Model
POST /api/v1/models/custom
Authorization: Bearer aira_live_xxxxx

Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Display name (1-200 chars) |
| model_id | string | Yes | Unique model identifier (1-100 chars) |
| endpoint_url | string | Yes | HTTPS endpoint URL |
| auth_header | string | No | Authorization header value (e.g. Bearer sk-...) |
| request_schema | object | No | Request format (default: {"type": "openai_compatible"}) |
| response_schema | object | No | Response parsing (default: {"type": "openai_compatible", "content_path": "choices[0].message.content"}) |
| timeout_ms | integer | No | Request timeout in ms (1,000-120,000; default: 30,000) |
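If your endpoint wraps its reply under a different key, response_schema lets you point content_path at the right field when registering. A hypothetical example, assuming content_path accepts other path expressions in the same style as the bracketed default ("output.text" here is illustrative, not a documented path):

curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Wrapped-Response Model",
    "model_id": "acme/wrapped-v1",
    "endpoint_url": "https://api.acme.com/v1/chat/completions",
    "response_schema": {
      "type": "openai_compatible",
      "content_path": "output.text"
    }
  }'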
Example Request
curl -X POST https://api.airaproof.com/api/v1/models/custom \
-H "Authorization: Bearer aira_live_xxxxx" \
-H "Content-Type: application/json" \
-d '{
"name": "Our Fine-tuned GPT",
"model_id": "acme/compliance-v2",
"endpoint_url": "https://api.acme.com/v1/chat/completions",
"auth_header": "Bearer sk-acme-abc123...",
"timeout_ms": 45000
}'

Response (201 Created)
{
"id": "cm_01J8X...",
"name": "Our Fine-tuned GPT",
"model_id": "acme/compliance-v2",
"endpoint_url": "https://api.acme.com/v1/chat/completions",
"has_auth": true,
"request_schema": { "type": "openai_compatible" },
"response_schema": { "type": "openai_compatible", "content_path": "choices[0].message.content" },
"timeout_ms": 45000,
"status": "untested",
"last_tested": null,
"created_at": "2026-03-14T10:30:00Z",
"request_id": "req_01J8X..."
}

Response Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique model registration ID |
| has_auth | boolean | Whether an auth header is configured (never exposes the value) |
| status | string | untested, ok, or error |
| last_tested | string \| null | ISO 8601 timestamp of last test |
List Models
GET /api/v1/models/custom
Authorization: Bearer aira_live_xxxxx

Returns all custom models registered by your organization.
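A minimal request:

curl https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx"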
Get Model
GET /api/v1/models/custom/{id}
Authorization: Bearer aira_live_xxxxx

Returns a single custom model by ID.
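For example, using the ID returned at registration:

curl https://api.airaproof.com/api/v1/models/custom/cm_01J8X... \
  -H "Authorization: Bearer aira_live_xxxxx"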
Update Model
PUT /api/v1/models/custom/{id}
Authorization: Bearer aira_live_xxxxx

Request Body
All fields are optional — only include fields you want to change.
| Field | Type | Description |
|---|---|---|
| name | string | Updated display name |
| endpoint_url | string | Updated HTTPS endpoint |
| auth_header | string | Updated auth header |
| request_schema | object | Updated request format |
| response_schema | object | Updated response parsing |
| timeout_ms | integer | Updated timeout (1,000-120,000) |
Example Request
curl -X PUT https://api.airaproof.com/api/v1/models/custom/cm_01J8X... \
-H "Authorization: Bearer aira_live_xxxxx" \
-H "Content-Type: application/json" \
-d '{
"endpoint_url": "https://api.acme.com/v2/chat/completions",
"timeout_ms": 60000
}'

Delete Model
DELETE /api/v1/models/custom/{id}
Authorization: Bearer aira_live_xxxxx

Removes the custom model. Returns 204 No Content.
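For example (-i prints the status line so you can confirm the 204):

curl -i -X DELETE https://api.airaproof.com/api/v1/models/custom/cm_01J8X... \
  -H "Authorization: Bearer aira_live_xxxxx"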
Test Model
POST /api/v1/models/custom/{id}/test
Authorization: Bearer aira_live_xxxxx

Sends a test request to the model endpoint and validates the response format.
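For example:

curl -X POST https://api.airaproof.com/api/v1/models/custom/cm_01J8X.../test \
  -H "Authorization: Bearer aira_live_xxxxx"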
Response (200 OK)
{
"status": "ok",
"latency_ms": 1250,
"response_valid": true,
"structured_output": {
"decision": "APPROVE",
"confidence": 0.85,
"key_factors": ["test factor"],
"reasoning": "Test reasoning response"
},
"error": null,
"request_id": "req_01J8X..."
}

If the endpoint is unreachable or returns invalid output:
{
"status": "error",
"latency_ms": 30000,
"response_valid": false,
"structured_output": null,
"error": "Request timed out after 30 seconds",
"request_id": "req_..."
}

Test Response Fields
| Field | Type | Description |
|---|---|---|
| status | string | ok if endpoint responded correctly, error otherwise |
| latency_ms | integer | Round-trip time in milliseconds |
| response_valid | boolean | Whether the response matches the expected schema |
| structured_output | object \| null | Parsed decision output (if valid) |
| error | string \| null | Error message (if failed) |
Always test a model after registering or updating it. Models with error status can still be used in cases, but may cause individual model failures.
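A typical register-then-verify flow, sketched in shell (assumes jq is installed for JSON parsing; the payload values are from the example above):

# Register the model and capture its ID
id=$(curl -s -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Our Fine-tuned GPT",
    "model_id": "acme/compliance-v2",
    "endpoint_url": "https://api.acme.com/v1/chat/completions"
  }' | jq -r '.id')

# Test it right away; status should come back "ok"
curl -s -X POST "https://api.airaproof.com/api/v1/models/custom/$id/test" \
  -H "Authorization: Bearer aira_live_xxxxx" | jq '{status, latency_ms}'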
Supported Self-Hosted Models
The following open-weight models have been tested and have dedicated per-model system prompts optimized for Aira's structured output format:
| Model | Parameters | License |
|---|---|---|
| Llama 3.3 70B | 70B | Llama Community License |
| Llama 4 Scout | 17B active (MoE) | Llama Community License |
| Llama 4 Maverick | 17B active (MoE) | Llama Community License |
| Mistral Large 3 | 41B active (MoE) | Apache 2.0 |
| Qwen 3.5 | 397B (MoE) | Apache 2.0 |
| DeepSeek V3.2 | 37B active (MoE) | MIT |
Other OpenAI-compatible models can still be registered as custom models but will use generic prompting.
vLLM Setup Example
vLLM is the recommended serving engine. It supports guided_json constrained decoding, which guarantees valid structured output from any model.
# Serve a model with vLLM
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
--host 0.0.0.0 \
--port 8000 \
--tensor-parallel-size 4 \
--enable-auto-tool-choice

Then register it:
curl -X POST https://api.airaproof.com/api/v1/models/custom \
-H "Authorization: Bearer aira_live_xxxxx" \
-H "Content-Type: application/json" \
-d '{
"name": "Llama 4 Scout (vLLM)",
"model_id": "llama-4-scout",
"endpoint_url": "https://your-gpu-server:8000/v1/chat/completions",
"timeout_ms": 60000
}'

When using constrained decoding, ensure your vLLM version supports guided_json (v0.6.0+). Older versions may fall back to unconstrained generation.
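To confirm support, you can send a constrained request to the vLLM server directly using its guided_json extra parameter; the schema below is a trimmed illustration, not Aira's actual decision schema:

curl -X POST https://your-gpu-server:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "messages": [{"role": "user", "content": "Approve or deny? Reply as JSON."}],
    "guided_json": {
      "type": "object",
      "properties": {
        "decision": {"type": "string", "enum": ["APPROVE", "DENY"]}
      },
      "required": ["decision"]
    }
  }'

If guided_json is active, the reply conforms to the schema; if not, you may get free-form text back.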